The documents distributed by this server have been provided by the
contributing authors as a means to ensure timely dissemination of
scholarly and technical work on a noncommercial basis. Copyright and all
rights therein are maintained by the authors or by other copyright
holders, notwithstanding that they have offered their works here
electronically. It is understood that all persons copying this
information will adhere to the terms and constraints invoked by each
author's copyright. These works may not be reposted without the explicit
permission of the copyright holder.
[2] Siyuan Shen, Langwen Huang, Marcin Chrapek, Timo Schneider, Jai Dayal, Manisha Gajbe, Robert Wisniewski, Torsten Hoefler:
LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'24), presented in Atlanta, GA, USA, pages 1004-1021, IEEE Press, ISBN: 979-8-3503-5291-7, Nov. 2024, (acceptance rate 21.1%, 99/470) SC'24 Best Paper Award (1/99)
SC'24
[3] Patrik Okanovic, Grzegorz Kwasniewski, Paolo Sylos Labini, Maciej Besta, Flavio Vella, Torsten Hoefler:
High Performance Unstructured SpMM Computation Using Tensor Cores In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'24), presented in Atlanta, GA, USA, pages 154:1-154:14, IEEE Press, ISBN: 979-8-3503-5291-7, Nov. 2024, (acceptance rate 21.1%, 99/470)
EMNLP'24
[4] Saleh Ashkboos, Ilia Markov, Elias Frantar, Tingxuan Zhong, Xincheng Wang, Jie Ren, Torsten Hoefler, Dan Alistarh:
[6] Maciej Besta, Robert Gerstenberger, Patrick Iff, Pournima Sonawane, Juan Gómez Luna, Raghavendra Kanakagiri, Rui Min, Onur Mutlu, Torsten Hoefler, Raja Appuswamy, Aidan O'Mahony:
[7] Mikhail Khalilov, Marcin Chrapek, Siyuan Shen, Alessandro Vezzu, Thomas Benz, Salvatore Di Girolamo, Timo Schneider, Daniele De Sensi, Luca Benini, Torsten Hoefler:
[9] Maciej Besta, Ales Kubicek, Roman Niggli, Robert Gerstenberger, Lucas Weitzendorf, Mingyuan Chi, Patrick Iff, Joanna Gajda, Piotr Nyczyk, Jürgen Müller, Hubert Niewiadomski, Marcin Chrapek, Michał Podstawski, Torsten Hoefler:
[11] Kartik Lakhotia, Laura Monroe, Kelly Isham, Maciej Besta, Nils Blach, Torsten Hoefler, Fabrizio Petrini:
PolarStar: Expanding the Horizon of Diameter-3 Networks In Proceedings of the 36th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'24), presented in Nantes, France, pages 345–357, Association for Computing Machinery, ISBN: 9798400704161, Jun. 2024, (acceptance rate 29.9%, 35/117)
arXiv
[12] Maciej Besta, Lorenzo Paleari, Ales Kubicek, Piotr Nyczyk, Robert Gerstenberger, Patrick Iff, Tomasz Lehmann, Hubert Niewiadomski, Torsten Hoefler:
[13] Yves Baumann, Tal Ben-Nun, Maciej Besta, Lukas Gianinazzi, Torsten Hoefler, Piotr Luczynski:
Low-Depth Spatial Tree Algorithms In Proceedings of the 38th IEEE International Parallel and Distributed Processing Symposium (IPDPS'24), presented in San Francisco, CA, USA, pages 180-192, IEEE Press, May 2024, (acceptance rate 26.1%, 88/337)
[15] Tim Dettmers, Ruslan A. Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, Dan Alistarh:
[18] Piotr Luczynski, Lukas Gianinazzi, Patrick Iff, Leighton Wilson, Daniele De Sensi, Torsten Hoefler:
Near-Optimal Wafer-Scale Reduce In Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing (HPDC'24), presented in Pisa, Italy, Association for Computing Machinery, May 2024,
[20] Daniele De Sensi, Tommaso Bonato, David Saam, Torsten Hoefler:
Swing: Short-cutting Rings for Higher Bandwidth Allreduce In 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI '24), presented in Santa Clara, CA, USA, pages 1445-1462, USENIX Association, ISBN: 978-1-939133-39-7, Apr. 2024,
ESSD
[21] Bjorn Stevens et al.:
Earth Virtualization Engines (EVE) Earth System Science Data (ESSD). Vol 16, Nr. 4, pages 2113-2122, Copernicus Publications, Apr. 2024,
NSDI'24
[22] Nils Blach, Maciej Besta, Daniele De Sensi, Jens Domke, Hussein Harake, Shigang Li, Patrick Iff, Marek Konieczny, Kartik Lakhotia, Ales Kubicek, Marcel Ferrari, Fabrizio Petrini, Torsten Hoefler:
[23] Torsten Hoefler, Marcin Copik, Pete Beckman, Andrew Jones, Ian Foster, Manish Parashar, Daniel Reed, Matthias Troyer, Thomas Schulthess, Dan Ernst, Jack Dongarra:
[26] Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michał Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, Torsten Hoefler:
[27] Maciej Besta, Florim Memedi, Zhenyu Zhang, Robert Gerstenberger, Guangyuan Piao, Nils Blach, Piotr Nyczyk, Marcin Copik, Grzegorz Kwaśniewski, Jürgen Müller, Lukas Gianinazzi, Ales Kubicek, Hubert Niewiadomski, Aidan O'Mahony, Onur Mutlu, Torsten Hoefler:
[30] Philipp Schaad, Timo Schneider, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Alexandru Calotoiu, Torsten Hoefler:
FuzzyFlow: Leveraging Dataflow To Find and Squash Program Optimization Bugs In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'23), Association for Computing Machinery, ISBN: 979-8-400701-09-2, Nov. 2023, (acceptance rate 23.9%, 90/376)
LOG'23
[31] Maciej Besta, Afonso Claudino Catarino, Lukas Gianinazzi, Nils Blach, Piotr Nyczyk, Hubert Niewiadomski, Torsten Hoefler:
HEAR: Homomorphically Encrypted Allreduce In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'23), presented in Denver, CO, USA, Association for Computing Machinery, ISBN: 979-8-400701-09-2, Nov. 2023, (acceptance rate 23.9%, 90/376) SC23 Best student paper, SC23 Reproducibility Advancement Award
SC'23
[33] Roberto L. Castro, Andrei Ivanov, Diego Andrade, Tal Ben-Nun, Basilio B. Fraguela, Torsten Hoefler:
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'23), presented in Denver, CO, USA, Association for Computing Machinery, ISBN: 979-8-400701-09-2, Nov. 2023, (acceptance rate 23.9%, 90/376)
SC'23
[34] Maciej Besta, Robert Gerstenberger, Marc Fischer, Michał Podstawski, Nils Blach, Berke Egeli, Georgy Mitenkov, Wojciech Chlapek, Marek Michalewicz, Hubert Niewiadomski, Jürgen Müller, Torsten Hoefler:
[36] Wenqi Jiang, Shigang Li, Yu Zhu, Johannes de Fine Licht, Zhenhao He, Runbin Shi, Cedric Renggli, Shuai Zhang, Theodoros Rekatsinas, Torsten Hoefler, Gustavo Alonso:
Co-design Hardware and Algorithm for Vector Search In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'23), presented in Denver, CO, USA, Association for Computing Machinery, ISBN: 979-8-400701-09-2, Nov. 2023, (acceptance rate 23.9%, 90/376)
arXiv
[37] Patrick Iff, Benigna Bruggmann, Maciej Besta, Luca Benini, Torsten Hoefler:
[45] Torsten Hoefler, Duncan Roweth, Keith Underwood, Robert Alverson, Mark Griswold, Vahid Tabatabaee, Mohan Kalkunte, Surendra Anubolu, Siyuan Shen, Moray McLaren, Abdul Kabbani, Steve Scott:
[56] Kartik Lakhotia, Kelly Isham, Laura Monroe, Maciej Besta, Torsten Hoefler, Fabrizio Petrini:
In-network Allreduce with Multiple Spanning Trees on PolarFly In Proceedings of the 35th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'23), presented in Orlando, FL, USA, pages 165–176, Association for Computing Machinery, ISBN: 9781450395458, Jun. 2023,
ICLR'23
[57] Elias Frantar, Saleh Ashkboos, Torsten Hoefler, Dan Alistarh:
[61] Torsten Hoefler, Bjorn Stevens, Andreas F. Prein, Johanna Baehr, Thomas Schulthess, Thomas F. Stocker, John Taylor, Daniel Klocke, Pekka Manninen, Piers M. Forster, Tobias Kölling, Nicolas Gruber, Hartwig Anzt, Claudia Frauen, Florian Ziemen, Milan Klöwer, Karthik Kashinath, Christoph Schär, Oliver Fuhrer, Bryan N. Lawrence:
[70] Tal Ben-Nun, Linus Groner, Florian Deconinck, Tobias Wicky, Eddie Davis, Johann Dahm, Oliver Elbert, Rhea George, Jeremy McGibbon, Lukas Trümper, Elynn Wu, Oliver Fuhrer, Thomas Schulthess, Torsten Hoefler:
[72] Maciej Besta, Cesare Miglioli, Paolo Sylos Labini, Jakub Tětek, Patrick Iff, Raghavendra Kanakagiri, Saleh Ashkboos, Kacper Janda, Michal Podstawski, Grzegorz Kwasniewski, Niels Gleinig, Flavio Vella, Onur Mutlu, Torsten Hoefler:
[74] Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott:
HammingMesh: A Network Topology for Large-Scale Deep Learning In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'22), Nov. 2022, SC22 Reproducibility Advancement Award and Invited as CACM Research Highlight
arXiv
[75] Michael E Beverland, Prakash Murali, Matthias Troyer, Krysta M Svore, Torsten Hoefler, Vadym Kliuchnikov, Guang Hao Low, Mathias Soeken, Aarthi Sundaram, Alexander Vaschillo:
[80] Maciej Besta, Raphael Grob, Cesare Miglioli, Nicola Bernold, Grzegorz Kwasniewski, Gabriel Gjini, Raghavendra Kanakagiri, Saleh Ashkboos, Lukas Gianinazzi, Nikoli Dryden, Torsten Hoefler:
Motif Prediction with Graph Neural Networks In Proceedings of the 28th SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'22), presented in Washington DC, USA, pages 35–45, Association for Computing Machinery, ISBN: 9781450393850, Aug. 2022,
[104] Thomas Häner, Damian S. Steiger, Torsten Hoefler, Matthias Troyer:
Distributed Quantum Computing with QMPI In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), Nov. 2021, (acceptance rate 25.9%, 98/379)
SC21
[105] Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, Torsten Hoefler:
Flare: Flexible In-Network Allreduce In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), presented in St. Louis, Missouri, ACM, Nov. 2021, (acceptance rate 25.9%, 98/379)
SC21
[106] Nikoli Dryden, Roman Böhringer, Tal Ben-Nun, Torsten Hoefler:
Clairvoyant Prefetching for Distributed Machine Learning I/O In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21), presented in St. Louis, Missouri, ACM, Nov. 2021, (acceptance rate 25.9%, 98/379)
SC21
[107] Alexandros Nikolaos Ziogas, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Tiziano De Matteis, Johannes de Fine Licht, Luca Lavarini, Torsten Hoefler:
[108] Grzegorz Kwasniewski, Marko Kabić, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Jens Eirik Saethre, André Gaillard, Timo Schneider, Maciej Besta, Anton Kozhevnikov, Joost VandeVondele, Torsten Hoefler:
[109] Arjun Pitchanathan, Christian Ulmann, Michel Weber, Torsten Hoefler, Tobias Grosser:
FPL: fast Presburger arithmetic through transprecision OOPSLA '21: Proceedings of the ACM international conference on Object oriented programming systems languages and applications. ACM, Nov. 2021, OOPSLA distinguished paper award (6/71)
MICRO'21
[110] Maciej Besta, Raghavendra Kanakagiri, Grzegorz Kwasniewski, Rachata Ausavarungnirun, Jakub Beránek, Konstantinos Kanellopoulos, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Ioana Stefan, Juan Gómez Luna, Marcin Copik, Lukas Kapp-Schwoerer, Salvatore Di Girolamo, Nils Blach, Marek Konieczny, Onur Mutlu, Torsten Hoefler:
[112] Maciej Besta, Zur Vonarburg-Shmaria, Yannick Schaffner, Leonardo Schwarz, Grzegorz Kwasniewski, Lukas Gianinazzi, Jakub Beranek, Kacper Janda, Tobias Holenstein, Sebastian Leisinger, Peter Tatkowski, Esref Ozdemir, Adrian Balla, Marcin Copik, Philipp Lindenberger, Pavel Kalvoda, Marek Konieczny, Onur Mutlu, Torsten Hoefler:
[117] Grzegorz Kwasniewski, Tal Ben-Nun, Lukas Gianinazzi, Alexandru Calotoiu, Timo Schneider, Alexandros Nikolaos Ziogas, Maciej Besta, Torsten Hoefler:
[137] Tiziano De Matteis, Johannes de Fine Licht, Torsten Hoefler:
FBLAS: Streaming Linear Algebra on FPGA In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20), IEEE Press, ISBN: 9781728199986, Nov. 2020, (acceptance rate: 25.1% (95/378))
[139] Daniele De Sensi, Salvatore Di Girolamo, Kim H. McMahon, Duncan Roweth, Torsten Hoefler:
An In-Depth Analysis of the Slingshot Interconnect In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20), Nov. 2020, (acceptance rate: 25.1% (95/378))
Scientific Benchmarking of Parallel Computing Systems (Presentation) presented in virtual, Nov. 2020, Keynote talk at the 2020 BenchCouncil International Symposium on Benchmarking, Measuring and Optimizing (Bench'20)
SC20
[141] Maciej Besta, Marcel Schneider, Marek Konieczny, Karolina Cynk, Erik Henriksson, Salvatore Di Girolamo, Ankit Singla, Torsten Hoefler:
[142] Alexandru Calotoiu, Markus Geisenhofer, Florian Kummer, Marcus Ritter, Jens Weber, Torsten Hoefler, Martin Oberlack, Felix Wolf:
General in-network processing - time is ripe! (Presentation) presented in hybrid/virtual, Oct. 2020, Keynote talk at the High-performance Interconnects Forum (in conjunction with HPC China 2020)
arXiv
[148] Grzegorz Kwasniewski, Tal Ben-Nun, Alexandros Nikolaos Ziogas, Timo Schneider, Maciej Besta, Torsten Hoefler:
High-Performance Communication in Machine Learning (Presentation) presented in virtual, Jun. 2020, Keynote talk at the 2020 International Conference on High Performance Big Data and Intelligent Systems (HPBD&IS 2020)
CVPR'20
[158] Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry:
[168] Christoph Schär, Oliver Fuhrer, Andrea Arteaga, Nikolina Ban, Christophe Charpilloz, Salvatore Di Girolamo, Laureline Hentgen, Torsten Hoefler, Xavier Lapillonne, David Leutwyler, Katherine Osterried, Davide Panosetti, Stefan Rüdisühli, Linda Schlemmer, Thomas Schulthess, Michael Sprenger, Stefano Ubbiali, Heini Wernli:
[169] Peter Grönquist, Tal Ben-Nun, Nikoli Dryden, Peter Dueben, Luca Lavarini, Shigang Li, Torsten Hoefler:
Predicting Weather Uncertainty with Deep Convnets In Machine Learning and the Physical Sciences Workshop at the 33rd Conference on Neural Information Processing Systems (NeurIPS), presented in Vancouver, BC, Canada, Dec. 2019,
SC19
[170] Cedric Renggli, Dan Alistarh, Mehdi Aghagolzadeh, Torsten Hoefler:
[174] Salvatore Di Girolamo, Konstantin Taranov, Andreas Kurth, Michael Schaffner, Timo Schneider, Jakub Beránek, Maciej Besta, Luca Benini, Duncan Roweth, Torsten Hoefler:
Network-Accelerated Non-Contiguous Memory Transfers In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Nov. 2019, (acceptance rate: 22.7% (78/344))
SC19
[175] Daniele De Sensi, Salvatore Di Girolamo, Torsten Hoefler:
hlslib: Software Engineering for Hardware Design In Fifth International Workshop on Heterogeneous High-performance Reconfigurable Computing (H2RC'19), presented in Denver, CO, United States, IEEE, Nov. 2019,
Data-Centric Parallel Programming (Presentation) presented in Prague, Czech Republic, Sep. 2019, Keynote talk at the 18th International Parallel Computing conference (ParCo'19)
High-Performance Communication in Machine Learning (Presentation) presented in Bialystok, Poland, Sep. 2019, Keynote talk at the 13th International Conference on Parallel Processing and Applied Mathematics (PPAM'19)
The Green Graph500 List (June 2019) (Presentation) presented in Frankfurt, Germany, Jun. 2019, Presented at the Green Graph 500 BoF at the International Conference on Supercomputing (ISC'19)
ISC'19
[191] Torsten Hoefler, Alexandros Nikolaos Ziogas, Tal Ben-Nun, Guillermo Indalecio Fernández, Timo Schneider, Mathieu Luisier and Johannes de Fine Licht:
Data-Centric Parallel Programming (Presentation) presented in Frankfurt, Germany, Jun. 2019, invited talk at the International Conference on Supercomputing (ISC'19)
[193] Tobias Gysi, Tobias Grosser, L. Brandner, Torsten Hoefler:
A Fast Analytical Model of Fully Associative Caches In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, presented in Phoenix, AZ, USA, pages 816--829, ACM, ISBN: 978-1-4503-6712-7, Jun. 2019,
DAC'19
[194] Niels Gleinig, Frances Ann Hubis, Torsten Hoefler:
[195] Felix Thaler, Stefan Moosbrugger, Carlos Osuna, Mauro Bianco, Hannes Vogt, Anton Afanasyev, Lukas Mosimann, Oliver Fuhrer, Thomas Schulthess, Torsten Hoefler:
Porting the COSMO Weather Model to Intel KNL presented in Zurich, Switzerland, ACM, Jun. 2019, Accepted at the ACM Platform for Advanced Scientific Computing Conference (PASC19)
IPDPS'19
[196] Salvatore Di Girolamo, P. Schmid, Thomas Schulthess, Torsten Hoefler:
Extreme-Scale Graphs (Presentation) presented in Warsaw, Poland, Mar. 2019, Invited talk at Supercomputing Frontiers Europe 2019
FPGA'19
[202] Maciej Besta, Marc Fischer, Tal Ben-Nun, Johannes de Fine Licht, Torsten Hoefler:
Substream-Centric Maximum Matchings on FPGA Feb. 2019, In Proceedings of the 27th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (acceptance rate: 23%) Best Paper Finalist (4/30)
PPoPP'19
[203] Martin Kuettler, Maksym Planeta, Jan Bierbaum, Carsten Weinhold, Hermann Haertig, Amnon Barak, Torsten Hoefler:
[223] Alexandru Calotoiu, Alexander Graf, Torsten Hoefler, Daniel Lorenz, Sebastian Rinke, Felix Wolf:
Lightweight Requirements Engineering for Exascale Co-design In IEEE International Conference on Cluster Computing, CLUSTER 2018, Belfast, UK, September 10-13, 2018, presented in Belfast, UK, IEEE, ISBN: 978-1-5386-8319-4, Sep. 2018, (28% (44/154))
Cluster'18
[224] Y. Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka:
Accelerating Deep Learning Frameworks with Micro-batches In IEEE International Conference on Cluster Computing, CLUSTER 2018, Belfast, UK, September 10-13, 2018, presented in Belfast, UK, IEEE, ISBN: 978-1-5386-8319-4, Sep. 2018, (28% (44/154))
[228] Oliver Fuhrer, T. Chadha, Torsten Hoefler, Grzegorz Kwasniewski, X. Lapillonne, D. Leutwyler, D. Luethi, Carlos Osuna, C. Schaer, Thomas Schulthess, Hannes Vogt:
[240] Lukas Gianinazzi, Pavel Kalvoda, Alessandro De Palma, Maciej Besta, Torsten Hoefler:
Communication-Avoiding Parallel Minimum Cuts and Connected Components In Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, presented in Vienna, Austria, pages 219-232, ACM, ISBN: 978-1-4503-4982-6, Feb. 2018, (acceptance rate: 20% (28/138))
HPCDC
[241] Torsten Hoefler, Sabela Ramos, Carlos Osuna, Felix Thaler, S. Moosbrugger, Oliver Fuhrer:
[242] Torsten Hoefler, Salvatore Di Girolamo, Konstantin Taranov, R. E. Grant, Ron Brightwell:
sPIN: High-performance streaming Processing in the Network In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC17), Nov. 2017, (acceptance rate: 18% (61/327)) Best Paper Finalist at SC17 (5/61)
SC17
[243] Edgar Solomonik, Maciej Besta, F. Vella, Torsten Hoefler:
[244] Didem Unat, Anshu Dubey, Torsten Hoefler, John Shalf, Mark Abraham, Mauro Bianco, Bradford L. Chamberlain, Romain Cledat, H. Carter Edwards, Hal Finkel, Karl Fuerlinger, Frank Hannig, Emmanuel Jeannot, Amir Kamil, Jeff Keasler, Paul H J Kelly, Vitus Leung, Hatem Ltaief, Naoya Maruyama, Chris J. Newburn, Miquel Pericas:
Transparent Caching for RMA Systems In Proceedings of the 31st IEEE International Parallel & Distributed Processing Symposium (IPDPS'17), presented in Orlando, FL, USA, IEEE, May 2017, (acceptance rate: 22%, 116/516)
[274] M. Martinasso, Grzegorz Kwasniewski, S. R. Alam, Thomas Schulthess, Torsten Hoefler:
A PCIe Congestion-Aware Performance Model for Densely Populated Accelerator Servers In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), presented in Salt Lake City, Utah, pages 63:1--63:11, IEEE Press, ISBN: 978-1-4673-8815-3, Nov. 2016, (acceptance rate: 18% (82/446))
SC16
[275] W. Tang, B. Wang, S. Ethier, Grzegorz Kwasniewski, Torsten Hoefler, K. Z. Ibrahim, K. Madduri, S. Williams, Leonid Oliker, C. Rosales-Fernandez, T. Williams:
Extreme Scale Plasma Turbulence Simulations on Top Supercomputers Worldwide In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), presented in Salt Lake City, Utah, pages 43:1--43:12, IEEE Press, ISBN: 978-1-4673-8815-3, Nov. 2016, (acceptance rate: 18% (82/446))
Scheduling-Aware Routing for Supercomputers Nov. 2016, Accepted at The International Conference for High Performance Computing, Networking, Storage and Analysis (SC'16) (acceptance rate: 18% (82/446))
dCUDA: Hardware Supported Overlap of Computation and Communication In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC16), presented in Salt Lake City, Utah, pages 52:1--52:12, IEEE Press, ISBN: 978-1-4673-8815-3, Nov. 2016, (acceptance rate: 18% (82/446))
OOPSLA'16
[278] Andrei Marian Dan, Patrick Lam, Torsten Hoefler, Martin Vechev:
Modeling and Analysis of Remote Memory Access Programming In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, presented in Amsterdam, Netherlands, pages 129--144, ACM, ISBN: 978-1-4503-4444-9, Nov. 2016, Outstanding Paper Award at OOPSLA'16 (4/52)
High-Performance Distributed RMA Locks In Proceedings of the 25th Symposium on High-Performance Parallel and Distributed Computing (HPDC'16), Jun. 2016, (acceptance rate: 16% (20/129)) Karsten Schwan Best Paper Award at HPDC'16 (1/20)
[308] G. Kathareios, C. Minkenberg, B. Prisacari, G. Rodriguez, Torsten Hoefler:
Cost-Effective Diameter-Two Topologies: Analysis and Evaluation presented in Austin, TX, USA, ACM, ISBN: 978-1-4503-3723-6, Nov. 2015, In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC15) (acceptance rate: 22%, 79/358)
Scientific Benchmarking of Parallel Computing Systems presented in Austin, TX, USA, pages 73:1--73:12, ACM, ISBN: 978-1-4503-3723-6, Nov. 2015, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC15) (acceptance rate: 22%, 79/358)
[312] Salvatore Di Girolamo, P. Jolivet, K. D. Underwood, Torsten Hoefler:
Exploiting Offload Enabled Network Interfaces In Proceedings of the 23rd Annual Symposium on High-Performance Interconnects (HOTI'15), presented in Oracle Santa Clara Campus, CA, USA, IEEE, Aug. 2015, Best Student Paper at HOTI'15
Cache Line Aware Optimizations for ccNUMA Systems In Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'15) (short paper), presented in Portland, OR, USA, pages 85--88, ACM, ISBN: 978-1-4503-3550-8, Jun. 2015,
DARE: High-Performance State Machine Replication on RDMA Networks In Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC'15), presented in Portland, OR, USA, pages 107--118, ACM, ISBN: 978-1-4503-3550-8, Jun. 2015, (acceptance rate: 16% (19/116))
Distributing the Data Plane for Remote Storage Access presented in Kartause Ittingen, Switzerland, USENIX, May 2015, Proceedings of the 15th Workshop on Hot Topics in Operating Systems (acceptance rate: 32% (29/90))
[330] T. Lee, C. Pappas, C. Basescu, J. Han, Torsten Hoefler, A. Perrig:
Source-Based Path Selection: The Data Plane Perspective In Proceedings of the 10th International Conference on Future Internet, presented in Seoul, Republic of Korea, pages 41--45, ACM, ISBN: 978-1-4503-3564-5, May 2015,
Slim Fly: A Cost Effective Low-Diameter Network Topology presented in New Orleans, LA, USA, Nov. 2014, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC14) (acceptance rate: 21%, 82/394) SC14 Best Student Paper (1/82)
Adv MPI
[339] William Gropp, Torsten Hoefler, Rajeev Thakur, E. Lusk:
A case for runtime recompilation in HPC (Presentation) presented in New Orleans, Louisiana, USA, Nov. 2014, The LLVM Compiler Infrastructure in HPC, Keynote Presentation
Fault Tolerance for Remote Memory Access Programming Models In Proceedings of the 23rd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC'14), presented in Vancouver, Canada, ACM, Jun. 2014, (acceptance rate: 16%, 21/130) Best Paper Nominee at HPDC'14 (3/21)
Automatic Complexity Analysis of Explicitly Parallel Programs In Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'14), presented in Prague, Czech Republic, ACM, Jun. 2014, (acceptance rate: 25%, 30/122)
HPDC'14
[350] B. Prisacari, G. Rodriguez, P. Heidelberger, D. Chen, C. Minkenberg, Torsten Hoefler:
Efficient Task Placement and Routing in Dragonfly Networks In Proceedings of the 23rd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC'14), presented in Vancouver, Canada, ACM, Jun. 2014, (acceptance rate: 16%, 21/130)
Designing Bit-Reproducible Portable High-Performance Applications In Proceedings of the 28th IEEE International Parallel and Distributed Processing Symposium (IPDPS), presented in Phoenix, AZ, USA, IEEE Computer Society, Apr. 2014, (acceptance rate: 21.1%, 114/541)
[357] Felix Wolf, Christian Bischof, Torsten Hoefler, Bernd Mohr, Gabriel Wittum, Alexandru Calotoiu, Christian Iwainsky, Alexandre Strube, Andreas Vogel:
[358] B. Prisacari, G. Rodriguez, C. Minkenberg, Torsten Hoefler:
Fast Pattern-Specific Routing for Fat Tree Networks ACM Transactions on Architecture and Code Optimization. Vol 10, Nr. 4, presented in New York, NY, USA, pages 36:1--36:25, ACM, ISSN: 1544-3566, Dec. 2013, (acceptance rate: 24% (2011))
Hybrid MPI: Efficient Message Passing for Multi-core Systems In IEEE/ACM International Conference on High Performance Computing, Networking, Storage and Analysis (SC13), presented in Denver, Colorado, USA, pages 18:1--18:11, ISBN: 978-1-4503-2378-9, Nov. 2013, (acceptance rate: 20%, 92/457)
Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, presented in Denver, Colorado, USA, pages 53:1--53:12, ACM, ISBN: 978-1-4503-2378-9, Nov. 2013, (acceptance rate: 20%, 92/457) Best Student Paper Finalist (8/92) and SC13 Best Paper (1/92)
SC13
[363] Alexandru Calotoiu, Torsten Hoefler, Marius Poke, Felix Wolf:
Using Automated Performance Modeling to Find Scalability Bugs in Complex Codes In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC13), presented in Denver, Colorado, USA, pages 45:1--45:12, ACM, ISBN: 978-1-4503-2378-9, Nov. 2013, (acceptance rate: 20%, 92/457)
MPI Datatype Processing using Runtime Compilation In Proceedings of the 20th European MPI Users' Group Meeting, presented in Madrid, Spain, pages 19--24, ACM, ISBN: 978-1-4503-1903-4, Sep. 2013, Best Paper Award at EuroMPI'13 (1/25)
NUMA-Aware Shared Memory Collective Communication for MPI In Proceedings of the 22nd international symposium on High-performance parallel and distributed computing, presented in New York City, NY, USA, pages 85--96, ACM, ISBN: 978-1-4503-1910-2, Jun. 2013, (acceptance rate: 15%, 20/131) Nominated for Best Paper Award at HPDC'13 (3/20)
[370] B. Prisacari, G. Rodriguez, C. Minkenberg and Torsten Hoefler:
Bandwidth-optimal All-to-all Exchanges in Fat Tree Networks In Proceedings of the 27th International ACM Conference on International Conference on Supercomputing, presented in Eugene, OR, USA, pages 139--148, ACM, ISBN: 978-1-4503-2130-3, Jun. 2013, (acceptance rate: 21%, 41/198)
Optimization Principles for Collective Neighborhood Communications In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, presented in Salt Lake City, Utah, USA, pages 98:1--98:10, IEEE Computer Society Press, ISBN: 978-1-4673-0804-5, Nov. 2012, (acceptance rate: 21%, 100/472)
Exact Dependence Analysis for Increased Communication Overlap In Recent Advances in the Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, presented in Vienna, Austria, Springer, ISBN: 978-3-642-33517-4, Sep. 2012,
[380] Timo Schneider, Robert Gerstenberger, Torsten Hoefler:
Micro-Applications for Communication Data Access Patterns and MPI Datatypes Vol 7490, In Recent Advances in the Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, presented in Vienna, Austria, pages 121-131, Springer, ISBN: 978-3-642-33517-4, Sep. 2012, Invited to a journal special issue on top picks from EuroMPI'12.
On the Effects of CPU Caches on MPI Point-to-Point Communications In Proceedings of the 2012 IEEE International Conference on Cluster Computing, presented in Beijing, China, pages 495--503, IEEE Computer Society, ISBN: 978-0-7695-4807-4, Sep. 2012, (acceptance rate: 28.9%, 58/200)
Runtime Detection and Optimization of Collective Communication Patterns In Proceedings of the 21st international conference on Parallel Architectures and Compilation Techniques (PACT), presented in Minneapolis, MN, USA, pages 263--272, ACM, ISBN: 978-1-4503-1182-3, Sep. 2012, (acceptance rate: 18.9%, 39/207)
EuroMPI'12
[383] Torsten Hoefler, J. Dinan, D. Buntinas, P. Balaji, Brian Barrett, Ron Brightwell, William Gropp, V. Kale, Rajeev Thakur:
Leveraging MPI's One-Sided Communication Interface for Shared-Memory Programming Vol 7490, In Recent Advances in the Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, presented in Vienna, Austria, Springer, ISBN: 978-3-642-33517-4, Sep. 2012, Invited to journal special issue on top picks from EuroMPI'12.
Automatic Datatype Generation and Optimization In Proceedings of the 17th ACM symposium on Principles and practice of parallel programming, Feb. 2012, (poster paper) (acceptance rate (posters): 17%, 32/185)
Assessing HPC Failure Detectors for MPI Jobs In Proceedings of the 2012 20th Euromicro International Conference on Parallel, Distributed and Network-based Processing, presented in Munich, Germany, pages 81--88, IEEE Computer Society, ISBN: 978-0-7695-4633-9, Feb. 2012,
Energy-aware Software Development for Massive-Scale Systems (Presentation) presented in Hamburg, Germany, Sep. 2011, Keynote at the International Conference on Energy-Aware High Performance Computing (EnA-HPC'11) EnA-HPC'11 Keynote Presentation
Writing Parallel Libraries with MPI - Common Practice, Issues, and Extensions Vol 6960, In Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, Santorini, Greece, September 18-21, 2011. Proceedings, presented in Santorini, Greece, pages 345--355, Springer, ISBN: 978-3-642-24448-3, Sep. 2011, Keynote paper at IMUDI/EuroMPI 2011.
[406] Jeremiah Willcock, Torsten Hoefler, Nicholas Edmonds and Andrew Lumsdaine:
Active Pebbles: Parallel Programming for Data-Driven Applications In Proceedings of the 2011 ACM International Conference on Supercomputing (ICS'11), presented in Tucson, AZ, pages 235--245, ACM, ISBN: 978-1-4503-0102-2, Jun. 2011, (acceptance rate 21.7%, 35/161)
Deadlock-Free Oblivious Routing for Arbitrary Topologies In Proceedings of the 25th IEEE International Parallel & Distributed Processing Symposium (IPDPS), presented in Anchorage, AK, USA, pages 613--624, IEEE Computer Society, ISBN: 0-7695-4385-7, May 2011, (acceptance rate: 19.6%, 112/571)
Kanor - A Declarative Language for Explicit Communication In Proceedings of the 13th international conference on Practical aspects of declarative languages, presented in Austin, TX, USA, pages 190-204, Springer-Verlag, ISBN: 978-3-642-18377-5, Jan. 2011,
HiPC'10
[415] Nicholas Edmonds, Torsten Hoefler and Andrew Lumsdaine:
[424] Jeremiah Willcock, Torsten Hoefler, Nicholas Edmonds and Andrew Lumsdaine:
AM++: A Generalized Active Message Framework In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, presented in Vienna, Austria, pages 401-410, ACM, ISBN: 978-1-4503-0178-7, Sep. 2010, (acceptance rate: 17%, 46/266)
LogGOPSim - Simulating Large-Scale Applications in the LogGOPS Model In Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing, presented in Chicago, Illinois, pages 597-604, ACM, ISBN: 978-1-60558-942-8, Jun. 2010, LSAP'10 Best Paper Award
[434] Torsten Hoefler, Christian Siebert and Andrew Lumsdaine:
Scalable Communication Protocols for Dynamic Sparse Data Exchange In Proceedings of the 2010 ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP'10), presented in Bangalore, India, pages 159-168, ACM, ISBN: 978-1-60558-708-0, Jan. 2010, (acceptance rate 16.8%, 29/173)
HiPC'09
[435] P. Kambadur, A. Gupta, Torsten Hoefler and Andrew Lumsdaine:
Selected MPI-2.2 and MPI-3 Features(Presentation) presented in Portland, OR, USA, Nov. 2009, MPICH Birds of a Feather Supercomputing 2009 (SC09), host: Darius Buntinas
Improving Parallel Computing Platforms(Presentation) presented in Munich, Germany, Oct. 2009, Presentation at the Technical University of Munich, Host: Prof. M. Gerndt
Towards Efficient MapReduce Using MPI In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 16th European PVM/MPI Users' Group Meeting, presented in Helsinki, Finland, Springer, Sep. 2009,
PPL
[442] Torsten Hoefler, Timo Schneider and Andrew Lumsdaine:
Sparse Collective Operations for MPI In Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium, HIPS'09 Workshop, presented in Rome, Italy, ISSN: 1530-2075, ISBN: 978-1-4244-3750-4, May 2009,
The Impact of Network Noise at Large-Scale Communication Performance In Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium, LSPP'09 Workshop, presented in Rome, Italy, ISSN: 1530-2075, ISBN: 978-1-4244-3750-4, May 2009, Invited to a journal special issue on top picks from LSPP'09.
[449] J. Mueller, Timo Schneider, Jens Domke, R. Geyer, M. Haesing, Torsten Hoefler, S. Hoehlig, G. Juckeland, Andrew Lumsdaine, M. Mueller and W. Nagel:
Communication Optimization for Medical Image Reconstruction Algorithms Vol LNCS 5205, In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users' Group Meeting, presented in Dublin, Ireland, pages 75-83, Springer, ISSN: 0302-9743, ISBN: 978-3-540-87474-4, Sep. 2008,
Adaptive Routing Strategies for Modern High Performance Networks In 16th Annual IEEE Symposium on High Performance Interconnects, HOTI'08, presented in Stanford, CA, USA, pages 165-172, IEEE Computer Society, ISBN: 978-0-7695-3380-3, Aug. 2008, (acceptance rate 30%, 14/47)
MPI Forum
[464] Torsten Hoefler, Jesper Larsson Träff, Christian Siebert and Andrew Lumsdaine:
Accurately Measuring Collective Operations at Massive Scale In Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium, PMEO'08 Workshop, presented in Miami, FL, ISSN: 1530-2075, ISBN: 978-1-4244-1694-3, Apr. 2008, Invited to a journal special issue on top picks from PMEO'08.
Optimizing non-blocking Collective Operations for InfiniBand In Proceedings of the 22nd IEEE International Parallel & Distributed Processing Symposium, CAC'08 Workshop, presented in Miami, FL, ISSN: 1530-2075, ISBN: 978-1-4244-1694-3, Apr. 2008,
MPI-3 Collective Working Group - April'08 Meeting(Presentation) Indiana University. presented in Chicago, IL, USA, Apr. 2008, Slides with proposals to the MPI-3 collective WG, all preliminary, published on request
MPI-3 Collective Working Group - January'08 Meeting(Presentation) Indiana University. presented in Chicago, IL, USA, Mar. 2008, Slides with proposals to the MPI-3 collective WG, all preliminary, published on request
[475] Timo Schneider, Torsten Hoefler, Simon Wunderlich, Torsten Mehlan and Wolfgang Rehm:
An optimized ZGEMM implementation for the Cell BE In Proceedings of the 9th Workshop on Parallel Systems and Algorithms (PASA), presented in Dresden, Germany, ISSN: 1617-5468, ISBN: 978-3-88579-218-5, Feb. 2008,
[482] Torsten Hoefler, P. Kambadur, R. L. Graham, G. Shipman and Andrew Lumsdaine:
A Case for Standard Non-Blocking Collective Operations Vol 4757, In Recent Advances in Parallel Virtual Machine and Message Passing Interface, EuroPVM/MPI 2007, presented in Paris, France, pages 125-134, Springer, ISSN: 0302-9743, ISBN: 978-3-540-75415-2, Oct. 2007,
CASCON'07
[483] Timo Schneider, Simon Wunderlich, Wolfgang Rehm, Torsten Hoefler and H. Schick:
Non-blocking Collectives for MPI-2(Presentation) Dresden University of Technology, Center for Information Services and High Performance Computing (ZIH). presented in Dresden, Germany, Oct. 2007,
HPCC'07
[485] Torsten Hoefler, Torsten Mehlan, Andrew Lumsdaine and Wolfgang Rehm:
Netgauge: A Network Performance Measurement Framework Vol 4782, In Proceedings of High Performance Computing and Communications, HPCC'07, presented in Houston, USA, pages 659-671, Springer, ISBN: 978-3-540-75443-5, Sep. 2007,
[492] Frank Mietke, D. Dunger, Torsten Mehlan, Torsten Hoefler and Wolfgang Rehm:
A native InfiniBand Transporter for MySQL Cluster TU Chemnitz. In Proceedings of the 2nd Workshop 'Kommunikation in Clusterrechnern und Clusterverbundsystemen' (KiCC'07), presented in Chemnitz, Germany, Feb. 2007,
Non-Blocking Collectives for MPI-2(Presentation) Commissariat a l'Energie Atomique - Direction des applications militaires (CEA-DAM). presented in Bruyeres-le-chatel, France, Jan. 2007,
A Case for Non-Blocking Collective Operations Vol 4331/2006, In Frontiers of High Performance Computing and Networking - ISPA'06 Workshops, presented in Sorrento, Italy, pages 155-164, Springer Berlin / Heidelberg, ISBN: 978-3-540-49860-5, Dec. 2006,
IRS - A portable Interface for Reconfigurable Systems In Proceedings of IEEE International Conference on Parallel Computing in Electrical Engineering (PARELEC'06), presented in Bialystok, Poland, pages 187-191, IEEE Computer Society, ISBN: 0-7695-2554-7, Sep. 2006,
Optimizing a Conjugate Gradient Solver with Non-Blocking Collective Operations In Recent Advances in Parallel Virtual Machine and Message Passing Interface. 13th European PVM/MPI User's Group Meeting, Proceedings, LNCS 4192, presented in Bonn, Germany, pages 374-382, Springer, ISSN: 0302-9743, ISBN: 3-540-39110-X, Sep. 2006, Invited to a journal special issue on top picks from EuroMPI'06.
PARELEC'06
[501] Torsten Hoefler, C. Viertel, Torsten Mehlan, Frank Mietke, Wolfgang Rehm:
[508] Torsten Hoefler, Torsten Mehlan, Frank Mietke and Wolfgang Rehm:
Fast Barrier Synchronization for InfiniBand In Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS), CAC'06 Workshop, presented in Rhodes, Greece, ISBN: 1-4244-0054-6, Apr. 2006,
Open MPI - Collv2 Design(Presentation) Cisco Systems. presented in San Jose, CA, USA, Apr. 2006,
PMEO'06
[510] Torsten Hoefler, Torsten Mehlan, Frank Mietke and Wolfgang Rehm:
LogfP - A Model for small Messages in InfiniBand In Proceedings of the 20th IEEE International Parallel & Distributed Processing Symposium (IPDPS), PMEO-PDS'06 Workshop, presented in Rhodes, Greece, ISBN: 1-4244-0054-6, Apr. 2006,
ARCS'06
[511] Torsten Hoefler, Torsten Mehlan, Frank Mietke and Wolfgang Rehm:
A Performance Analysis of ABINIT on a Cluster System TU Chemnitz. In Parallel Algorithms and Cluster Computing, presented in Chemnitz, Germany, pages 37-51, Springer, Lecture Notes in Computational Science and Engineering, ISBN: 3-540-33539-0, Dec. 2005,
Improving the parallel scaling of ABINIT CINECA Consorzio Interuniversitario. In Science and Supercomputing in Europe - Report 2005, presented in Casalecchio di Reno, Italy, pages 551-559, CINECA Consorzio Interuniversitario, ISBN: 88-86037-17-1, Dec. 2005,