Publications of Torsten Hoefler
Copyright Notice:

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

T. Hoefler:

 RDMA, Scalable MPI-3 RMA, and Next-Generation Post-RDMA Interconnects

(Presentation - Apr. 2019)
Best talk award winner at Swiss HPC Advisory Council Conference 2019

Abstract

Modern interconnects offer remote direct memory access (RDMA) features. Yet most applications still rely on explicit message passing for communication, despite its unwanted overheads. The MPI-3.0 standard defines a programming interface for exploiting RDMA networks directly. We demonstrate how to efficiently implement the specification on modern RDMA networks. Our protocols support scaling to millions of cores with negligible memory consumption while providing the highest performance and minimal overheads, comparable to or better than UPC and CAF in terms of latency, bandwidth, and message rate. We then observe that network cards contain rather powerful processors optimized for data movement, and that limiting their functionality to remote direct memory access seems unnecessarily constraining. We develop sPIN, a portable programming model for offloading simple packet-processing functions to the network card. To demonstrate the potential of the model, we design a cycle-accurate simulation environment by combining the network simulator LogGOPSim and the CPU simulator gem5. We implement offloaded message matching, datatype processing, and collective communications, and demonstrate transparent full-application speedups. Furthermore, we show how sPIN can be used to accelerate redundant in-memory filesystems and several other use cases. Our work investigates a portable packet-processing network acceleration model similar to compute acceleration with CUDA or OpenCL. We show how such network acceleration enables an ecosystem that can significantly speed up applications and system services.

BibTeX

@misc{spin-hpcac,
  author={T. Hoefler},
  title={{RDMA, Scalable MPI-3 RMA, and Next-Generation Post-RDMA Interconnects}},
  year={2019},
  month={Apr.},
  source={http://www.unixer.de/~htor/publications/},
}
