Publications of Torsten Hoefler
Siyuan Shen, Tommaso Bonato, Zhiyi Hu, Pasquale Jordan, Tiancheng Chen, Torsten Hoefler:
ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage
(In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'25), presented in St. Louis, MO, USA, Nov. 2025) Best Student Paper Finalist
Abstract: Network simulators play a crucial role in evaluating the performance of large-scale systems. However, existing simulators rely heavily on synthetic microbenchmarks or narrowly focus on specific domains, limiting their ability to provide comprehensive performance insights. In this work, we introduce ATLAHS, a flexible, extensible, and open-source toolchain designed to trace real-world applications and accurately simulate their workloads. ATLAHS leverages the GOAL format to model communication and computation patterns in AI, HPC, and distributed storage applications. It supports multiple network simulation backends and handles multi-job and multi-tenant scenarios. Through extensive validation, we demonstrate that ATLAHS achieves high accuracy in simulating realistic workloads (consistently less than 5% error), while significantly outperforming AstraSim, the current state-of-the-art AI systems simulator, in terms of simulation runtime and trace size efficiency. We further illustrate ATLAHS's utility via detailed case studies, highlighting the impact of congestion control algorithms on the performance of distributed storage systems, as well as the influence of job-placement strategies on application runtimes.
BibTeX:
@inproceedings{shen2025atlahs,
  author={Siyuan Shen and Tommaso Bonato and Zhiyi Hu and Pasquale Jordan and Tiancheng Chen and Torsten Hoefler},
  title={{ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage}},
  year={2025},
  month={Nov.},
  booktitle={Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'25)},
  location={St. Louis, MO, USA},
  source={http://www.unixer.de/~htor/publications/},
}