Distributed memory programming is the established paradigm used in high-performance computing (HPC) systems, requiring explicit communication
between nodes and devices. When FPGAs are deployed in distributed settings, communication is typically handled either by going through the host machine, sacrificing performance, or
by streaming across fixed device-to-device connections, sacrificing flexibility.
We present Streaming Message Interface (SMI), a communication model and API that unifies explicit message passing with a hardware-oriented programming
model, facilitating minimal-overhead, flexible, and productive inter-FPGA communication.
Instead of bulk transmission, messages are streamed across the network during computation, allowing communication to be seamlessly integrated into pipelined
designs. We present a high-level synthesis implementation of SMI targeting a dedicated FPGA interconnect, exposing runtime-configurable routing with support for
arbitrary network topologies, and implement a set of distributed memory benchmarks. Using SMI, programmers can implement distributed, scalable HPC programs on
reconfigurable hardware, without deviating from best practices for hardware design.
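To make the contrast between bulk transmission and streaming concrete, here is a minimal Python sketch. It is not the SMI API. A bounded queue stands in for a network channel between two FPGAs, and `CHANNEL_DEPTH`, `sender`, `receiver`, and `run_streaming` are hypothetical names chosen for illustration. The two functions model pipelined kernels that push and pop one element at a time while computation is still in progress.

```python
import queue
import threading

# Hypothetical depth of the FIFO standing in for a network link between two
# FPGAs. It is deliberately much smaller than the message, so the sender cannot
# stage the whole message before transmitting it.
CHANNEL_DEPTH = 4


def sender(channel: queue.Queue, n: int) -> None:
    """Pipelined producer: computes one element and pushes it immediately."""
    for i in range(n):
        value = i * i        # stand-in for one pipeline stage's output
        channel.put(value)   # blocks while the channel is full (back-pressure)


def receiver(channel: queue.Queue, n: int, out: list) -> None:
    """Pipelined consumer: pops elements as they arrive and reduces them."""
    acc = 0
    for _ in range(n):
        acc += channel.get()  # blocks until the next element arrives
    out.append(acc)


def run_streaming(n: int) -> int:
    """Run sender and receiver concurrently over one bounded channel."""
    channel: queue.Queue = queue.Queue(maxsize=CHANNEL_DEPTH)
    result: list = []
    tx = threading.Thread(target=sender, args=(channel, n))
    rx = threading.Thread(target=receiver, args=(channel, n, result))
    tx.start()
    rx.start()
    tx.join()
    rx.join()
    return result[0]


if __name__ == "__main__":
    print(run_streaming(10))  # prints 285 (sum of squares of 0..9)
```

The receiver starts consuming while the sender is still computing, so the channel never holds more than `CHANNEL_DEPTH` elements. A bulk transfer would instead need storage for all `n` elements before anything is sent.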
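The abstract states that routing is runtime-configurable and supports arbitrary network topologies. One plausible way to picture this is a per-node next-hop table that is computed in software from the topology graph and loaded at runtime. This is an assumption for illustration and is not taken from the paper. The hypothetical `next_hop_table` below derives such a table for one source FPGA using breadth-first search, which gives shortest paths in hop count.

```python
from collections import deque


def next_hop_table(links: dict[int, list[int]], src: int) -> dict[int, int]:
    """Return {destination: neighbor of src to forward to} via BFS shortest paths.

    Hypothetical sketch: `links` maps each FPGA to the FPGAs it is directly
    cabled to. Changing the topology only requires recomputing this table,
    not regenerating the hardware design. Unreachable nodes are omitted.
    """
    first_hop: dict[int, int] = {}
    visited = {src}
    frontier: deque = deque()
    # Direct neighbors of the source are reached through themselves.
    for nbr in links.get(src, []):
        if nbr not in visited:
            visited.add(nbr)
            first_hop[nbr] = nbr
            frontier.append(nbr)
    # Every node discovered later inherits the first hop of the node it was reached from.
    while frontier:
        node = frontier.popleft()
        for nbr in links.get(node, []):
            if nbr not in visited:
                visited.add(nbr)
                first_hop[nbr] = first_hop[node]
                frontier.append(nbr)
    return first_hop


if __name__ == "__main__":
    # Four FPGAs connected in a bidirectional ring: 0-1-2-3-0.
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(next_hop_table(ring, 0))  # prints {1: 1, 3: 3, 2: 1}
```

Because BFS visits nodes in order of hop distance, each destination's entry names the first link on one of its shortest paths. Ties, such as node 2 in the ring, are broken by neighbor order.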

@inproceedings{,
  author    = {Tiziano De Matteis and Johannes de Fine Licht and Jakub Beránek and Torsten Hoefler},
  title     = {{Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware}},
  year      = {2019},
  month     = nov,
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19)},
  source    = {http://www.unixer.de/~htor/publications/},
}