The documents distributed by this server have been provided by the
contributing authors as a means to ensure timely dissemination of
scholarly and technical work on a noncommercial basis. Copyright and all
rights therein are maintained by the authors or by other copyright
holders, notwithstanding that they have offered their works here
electronically. It is understood that all persons copying this
information will adhere to the terms and constraints invoked by each
author's copyright. These works may not be reposted without the explicit
permission of the copyright holder.
M. Martinasso, G. Kwasniewski, S. R. Alam, T. C. Shulthess, T. Hoefler:
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers
(In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), presented in Salt Lake City, Utah, pages 63:1--63:11, IEEE Press, ISBN: 978-1-4673-8815-3, Nov. 2016)
Abstract
MeteoSwiss, the Swiss national weather forecast institute, has selected
densely populated accelerator servers as its primary system to
compute weather forecast simulations.
Servers with multiple accelerator devices that are primarily connected
by a PCI-Express (PCIe) network achieve significantly higher energy
efficiency.
Memory transfers between accelerators in such a system are subject to
PCIe arbitration policies.
In this paper, we study the impact of PCIe topology and develop a
congestion-aware performance model for PCIe communication.
We present an algorithm for computing penalty coefficients of every
communication in a congestion graph that characterises the dynamic usage
of network resources by an application.
Our validation results on two different topologies of 8 GPU devices
demonstrate that our model achieves an accuracy of over 97% within the
PCIe network.
We use the model on a weather forecast application to identify the best
algorithm for its communication patterns among GPUs.
BibTeX
@inproceedings{,
  author={M. Martinasso and G. Kwasniewski and S. R. Alam and T. C. Shulthess and T. Hoefler},
  title={{A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers}},
  year={2016},
  month={Nov.},
  pages={63:1--63:11},
  booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16)},
  location={Salt Lake City, Utah},
  publisher={IEEE Press},
  isbn={978-1-4673-8815-3},
  source={http://www.unixer.de/~htor/publications/},
}