Publications of Torsten Hoefler
Torsten Hoefler:
Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks
(presented in Chemnitz, Germany, Apr. 2005) TU Chemnitz Best Student Award, 2005
Abstract: The MPI barrier collective operation, part of the MPI-1.1 standard, is extremely important for all parallel applications that use it. The latency of this operation adds to the
application run time and cannot be overlapped. Thus, overall MPI performance can be
degraded by unsatisfactory barrier latency. The main goals of this work are to lower the
barrier latency for InfiniBand™ networks by analyzing well-known barrier algorithms with
regard to their suitability for InfiniBand™ networks, to enhance the barrier operation
by using standard InfiniBand™ operations as much as possible, and to design a constant-time
barrier for InfiniBand™ with special hardware support. This partition into three main
steps is retained throughout the thesis. The first part evaluates publicly known models
and proposes a new, more accurate model (LoP) for InfiniBand™. All barrier algorithms
are evaluated within the well-known LogP model and within this new model. Two new algorithms that
promise better performance have been developed. The hardware section proposes a constant-time
barrier integrated into InfiniBand™ as well as a cheap separate barrier network.
All results have been implemented inside the Open MPI framework. This work led to three
new Open MPI collective modules. The first implements different barrier algorithms,
which are dynamically benchmarked and selected during the startup phase to maximize
performance. The second offers a special barrier implementation for InfiniBand™ with
RDMA and performs up to 40% better than the best solution published so far.
The third offers a constant-time barrier in a separate network, leveraging
commodity components, with a latency of only 2.5 µs. Each component has its own specialty
and can be used to enhance barrier performance significantly.
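For readers unfamiliar with the kind of "well-known barrier algorithm" the abstract refers to, the following is a minimal Python sketch of the communication schedule of the classic dissemination barrier. It is only an illustration: the abstract does not name the algorithms evaluated in the thesis, and the function names `dissemination_schedule` and `everyone_synchronized` are hypothetical helpers, not part of Open MPI or of the thesis code.

```python
def dissemination_schedule(num_ranks):
    """Return, for each round, the list of (sender, receiver) pairs.

    In round k every rank r notifies rank (r + 2**k) mod num_ranks.
    ceil(log2(num_ranks)) rounds are needed; (n - 1).bit_length()
    computes that value exactly with integer arithmetic.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be positive")
    rounds = (num_ranks - 1).bit_length()
    return [
        [(rank, (rank + 2 ** k) % num_ranks) for rank in range(num_ranks)]
        for k in range(rounds)
    ]


def everyone_synchronized(num_ranks):
    """Simulate the schedule and check that each rank learns of all arrivals."""
    known = [{rank} for rank in range(num_ranks)]
    for pairs in dissemination_schedule(num_ranks):
        # Messages within a round are concurrent, so read from a snapshot.
        snapshot = [set(arrivals) for arrivals in known]
        for sender, receiver in pairs:
            known[receiver] |= snapshot[sender]
    return all(len(arrivals) == num_ranks for arrivals in known)
```

The number of rounds grows logarithmically with the number of ranks, so the barrier latency of such software algorithms depends on the system size; this is the dependence that a constant-time hardware barrier, as proposed in the abstract, aims to avoid.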
Documents
BibTeX:
@masterthesis{hoefler-thesis-05,
  author={Torsten Hoefler},
  title={{Evaluation of publicly available Barrier-Algorithms and Improvement of the Barrier-Operation for large-scale Cluster-Systems with special Attention on InfiniBand Networks}},
  institution={Technical University of Chemnitz},
  year={2005},
  month={Apr.},
  location={Chemnitz, Germany},
  source={http://www.unixer.de/~htor/publications/},
}