ANACIN-X: Analysis and Modeling of Nondeterminism and Associated Costs in eXtreme Scale Applications

Funded by the National Science Foundation (NSF) under grants number 1900888 and 1900765

About ANACIN-X

Nondeterminism is a growing and often overlooked challenge in HPC execution. To address this problem, the project has five main goals:
Model non-deterministic executions by identifying sources of non-determinism in HPC applications, and detailing the characteristics of these sources in a given communication topology.
Improve understanding of non-deterministic features across various HPC applications via tracing techniques combined with modeling and comparison of event graphs.
Study the types and sources of communication non-determinism via both application of existing tools for graph comparison and development of new tools for graph alignment.
Provide lightweight, portable methods for modeling and identifying application non-determinism, allowing for efficient analysis and debugging of arbitrary HPC codes.
Foster an understanding of non-determinism in the next generation of HPC experts via the development and release of interactive, educational software and media about non-determinism; collaboration with computer science and HPC organizations, such as the SIGHPC Resource Constrained Environments Chapter, to reach out to edge communities; and engagement of students in research opportunities.

Selected Publications

Bell P, Suarez K, Fossum B, Chapp D, Bhowmick S, Taufer M. A Research-Based Course Module to Study Non-determinism in High Performance Applications. IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). (2022). doi: 10.1109/IPDPSW55747.2022.00067. [link forthcoming]
D. Chapp, N. Tan, S. Bhowmick and M. Taufer. Identifying Degree and Sources of Non-Determinism in MPI Applications Via Graph Kernels, in IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 12, pp. 2936-2952, 1 Dec. 2021, doi:10.1109/TPDS.2021.3081530. [link]
Bell P, Suarez K, Chapp D, Tan N, Bhowmick S, Taufer M. ANACIN-X: A Software Framework for Studying Non-determinism in MPI Applications. Software Impacts. 9 Oct. 2021, doi: 10.1016/j.simpa.2021.100151. [link]
D. Chapp, D. Rorabaugh (#), K. Sato, D. Ahn, and M. Taufer. A Three-phase Workflow for General and Expressive Representations of Nondeterminism in HPC Applications. International Journal of High-Performance Computing Applications (IJHPCA), 1175-1184 (2019). [link]
D. Chapp, K. Sato, D. Ahn, and M. Taufer. Record-and-Replay Techniques for HPC Systems: A survey. Journal of Supercomputing Frontiers and Innovations, 5(1):11-30 (2018). [link]
D. Chapp, T. Johnston, and M. Taufer. On the Need for Reproducible Numerical Accuracy through Intelligent Runtime Selection of Reduction Algorithms at the Extreme Scale. In Proceedings of IEEE Cluster Conference, pp. 166 – 175. Chicago, Illinois, USA. September 8 – 11, 2015. [link]

Ali Khan

Graduate Student, University of North Texas

Cole Johnston

Undergraduate Student, University of Tennessee, Knoxville

Dr. Dylan Chapp

Collaborator