ANACIN-X: Analysis and Modeling of Nondeterminism and Associated Costs in eXtreme Scale Applications

Funded by the National Science Foundation (NSF) under grant numbers 1900888 and 1900765.


Nondeterminism is a growing and often overlooked challenge in HPC execution. To address this problem, this project has four main goals:
Model nondeterministic executions by identifying sources of nondeterminism in HPC applications, and detailing the characteristics of these sources in a given communication topology.
Improve understanding of nondeterministic features across various HPC applications by combining record-and-replay techniques with our graph modeling.
Provide lightweight, portable methods for identifying application nondeterminism, allowing for efficient analysis and debugging of arbitrary HPC codes.
Foster an understanding of nondeterminism in the next generation of HPC experts via collaboration with the SIGHPC Resource Constrained Environments Chapter to reach out to edge communities, and to engage students in research opportunities.

Selected Publications

D. Chapp, D. Rorabaugh (#), K. Sato, D. Ahn, and M. Taufer. A Three-phase Workflow for General and Expressive Representations of Nondeterminism in HPC Applications. International Journal of High-Performance Computing Applications (IJHPCA), 1175-1184 (2019). [link]
D. Chapp, K. Sato, D. Ahn, and M. Taufer. Record-and-Replay Techniques for HPC Systems: A Survey. Journal of Supercomputing Frontiers and Innovations, 5(1):11-30 (2018). [link]
D. Chapp, T. Johnston, and M. Taufer. On the Need for Reproducible Numerical Accuracy through Intelligent Runtime Selection of Reduction Algorithms at the Extreme Scale. In Proceedings of IEEE Cluster Conference, pp. 166-175. Chicago, Illinois, USA. September 8-11, 2015. [link]

Lohith Kumar

Graduate Student, University of North Texas

Dylan Chapp