Salvatore Di Girolamo, Konstantin Taranov, Andreas Kurth, Michael Schaffner, Timo Schneider, Jakub Beránek, Maciej Besta, Luca Benini, Duncan Roweth, Torsten Hoefler:
Network-Accelerated Non-Contiguous Memory Transfers
(In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Nov. 2019)
Abstract
Applications often communicate data that is non-contiguous in the send or the
receive buffer, e.g., when exchanging a column of a matrix stored in row-major
order. While non-contiguous transfers are well supported in HPC (e.g., MPI
derived datatypes), they can still be up to 5x slower than contiguous transfers
of the same size. As we enter the era of network acceleration, we need to
investigate which tasks to offload to the NIC. In this work, we argue that
non-contiguous memory transfers can be transparently network-accelerated, truly
achieving zero-copy communications. We implement and extend sPIN, a packet
streaming processor, within a Portals 4 NIC SST model, and evaluate strategies
for NIC-offloaded processing of MPI datatypes, ranging from datatype-specific
handlers to general solutions for any MPI datatype. We demonstrate up to 8x
speedup in the unpack throughput of real applications, showing that
non-contiguous memory transfers are a first-class candidate for network
acceleration.
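To make the abstract's running example concrete (a matrix column in row-major order, i.e. a strided or "vector" layout), here is a minimal host-side sketch of packet-wise unpacking under stated assumptions. It assumes a single strided layout described in bytes; the names `VectorLayout`, `pack` and `unpack_packet` are hypothetical and are not the sPIN, Portals 4, or MPI API, nor the handlers from the paper. It only illustrates why such a layout lends itself to per-packet processing: each packet can be scattered straight to its final position from its stream offset alone, without a staging buffer.

```python
from dataclasses import dataclass
import struct


@dataclass(frozen=True)
class VectorLayout:
    """Hypothetical strided layout, loosely modelled on the arguments of
    MPI_Type_vector(count, blocklength, stride, ...) but expressed in bytes."""
    count: int     # number of blocks
    blocklen: int  # bytes per block
    stride: int    # bytes between the starts of consecutive blocks

    @property
    def size(self) -> int:
        # Bytes that actually travel on the wire (the gaps are not sent).
        return self.count * self.blocklen


def pack(layout: VectorLayout, buf: bytes) -> bytes:
    """Sender side: gather the strided blocks into one contiguous stream."""
    return b"".join(
        buf[i * layout.stride : i * layout.stride + layout.blocklen]
        for i in range(layout.count)
    )


def unpack_packet(layout: VectorLayout, recvbuf: memoryview,
                  offset: int, payload: bytes) -> None:
    """Hypothetical per-packet handler: scatter the stream bytes
    [offset, offset + len(payload)) directly to their final positions in the
    strided receive buffer. It needs only the packet's stream offset, so
    packets can be handled in any order and no staging buffer is used."""
    pos = 0
    while pos < len(payload):
        block, within = divmod(offset + pos, layout.blocklen)
        chunk = min(layout.blocklen - within, len(payload) - pos)
        dst = block * layout.stride + within
        recvbuf[dst : dst + chunk] = payload[pos : pos + chunk]
        pos += chunk


# Send column 1 of a 4x5 row-major matrix of 32-bit ints.
ROWS, COLS, ELEM = 4, 5, 4
matrix = struct.pack(f"<{ROWS * COLS}i", *range(ROWS * COLS))
column = VectorLayout(count=ROWS, blocklen=ELEM, stride=COLS * ELEM)
stream = pack(column, matrix[ELEM:])  # the column starts at element 1

MTU = 6  # deliberately tiny so that packets straddle element boundaries
packets = [(off, stream[off : off + MTU]) for off in range(0, len(stream), MTU)]

recv = bytearray(len(matrix))
for off, payload in reversed(packets):  # simulate out-of-order arrival
    unpack_packet(column, memoryview(recv)[ELEM:], off, payload)

received = struct.unpack(f"<{ROWS * COLS}i", recv)
print([received[r * COLS + 1] for r in range(ROWS)])  # [1, 6, 11, 16]
```

For a vector layout, the mapping from stream offset to buffer address is a single `divmod`, which keeps the per-packet work small and stateless. This sketch covers only that simple case; it does not attempt the general processing of arbitrary MPI datatypes that the abstract also mentions.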
BibTeX
@inproceedings{,
  author={Salvatore Di Girolamo and Konstantin Taranov and Andreas Kurth and Michael Schaffner and Timo Schneider and Jakub Beránek and Maciej Besta and Luca Benini and Duncan Roweth and Torsten Hoefler},
  title={{Network-Accelerated Non-Contiguous Memory Transfers}},
  year={2019},
  month={Nov.},
  booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19)},
  source={http://www.unixer.de/~htor/publications/},
}