Publications of Torsten Hoefler
Copyright Notice:

The documents distributed by this server have been provided by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a noncommercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.

T. Schneider, R. Gerstenberger, T. Hoefler:

 Compiler Optimizations for Non-Contiguous Remote Data Movement

(presented in Santa Clara, CA, USA, Sep. 2013, Proceedings of the 26th International Workshop on Languages and Compilers for Parallel Computing)

Abstract

Remote Memory Access (RMA) programming is one of the core concepts behind modern parallel programming languages such as UPC and Fortran 2008 or high-performance libraries such as MPI-3 One Sided or SHMEM. Many applications have to communicate non-contiguous data due to their data layout in main memory. Previous studies showed that such non-contiguous transfers can reduce communication performance by up to an order of magnitude. In this work, we demonstrate a simple scheme for statically optimizing non-contiguous RMA transfers by combining partial packing, communication overlap, and remote access pipelining. We determine accurate performance models for the various operations to find near-optimal pipeline parameters. The proposed approach is applicable to all RMA languages and does not depend on the availability of special hardware features such as scatter-gather lists or strided copies. We show that our proposed superpipelining leads to significant improvements compared to either full packing or sending each contiguous segment individually. We outline how our approach can be used to optimize non-contiguous data transfers in PGAS programs automatically. We observed a 37% performance gain over the fastest of either packing or individual sending for a realistic application.
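
The trade-off described in the abstract can be illustrated with a toy cost model. The sketch below is not the performance model from the paper; it assumes a simple linear cost per message (a fixed overhead `alpha` plus `beta` seconds per byte) and a linear packing cost (`gamma` seconds per byte), and it assumes that packing one block can fully overlap with sending the previous one. All function names and parameter values are hypothetical and chosen only for illustration.

```python
def individual_time(n, s, alpha, beta):
    """Send each of the n contiguous segments (s bytes each) as its own message."""
    return n * (alpha + s * beta)


def full_packing_time(n, s, alpha, beta, gamma):
    """Pack all segments into one buffer, then send a single message (no overlap)."""
    return n * s * gamma + alpha + n * s * beta


def pipelined_time(n, s, k, alpha, beta, gamma):
    """Pack k segments at a time; sending a block overlaps with packing the next.

    Assumed two-stage pipeline: a block can be sent once it is packed and
    the previous send has finished.
    """
    pack_done = 0.0
    send_done = 0.0
    remaining = n
    while remaining > 0:
        segs = min(k, remaining)
        pack_done += segs * s * gamma
        send_done = max(send_done, pack_done) + alpha + segs * s * beta
        remaining -= segs
    return send_done


def best_block(n, s, alpha, beta, gamma):
    """Brute-force search for the block size k that minimizes the modeled time."""
    return min(range(1, n + 1),
               key=lambda k: pipelined_time(n, s, k, alpha, beta, gamma))


if __name__ == "__main__":
    # Hypothetical parameters: 1024 segments of 64 bytes, 1 us per-message
    # overhead, 1 ns/byte transfer cost, 0.5 ns/byte packing cost.
    n, s, alpha, beta, gamma = 1024, 64, 1e-6, 1e-9, 0.5e-9
    k = best_block(n, s, alpha, beta, gamma)
    print("individual  :", individual_time(n, s, alpha, beta))
    print("full packing:", full_packing_time(n, s, alpha, beta, gamma))
    print("pipelined   :", pipelined_time(n, s, k, alpha, beta, gamma), "with k =", k)
```

In this model, setting `k = n` reproduces full packing, so the best pipelined configuration is never slower than full packing. Real systems would need measured cost functions in place of the linear assumptions used here.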

BibTeX

@inproceedings{schneider-compopts-noncont-lcpc13,
  author={T. Schneider and R. Gerstenberger and T. Hoefler},
  title={{Compiler Optimizations for Non-Contiguous Remote Data Movement}},
  year={2013},
  month={Sep.},
  location={Santa Clara, CA, USA},
  note={Proceedings of the 26th International Workshop on Languages and Compilers for Parallel Computing},
  source={http://www.unixer.de/~htor/publications/},
}

© Torsten Hoefler