ANACIN-X: Analysis and Modeling of Nondeterminism and Associated Costs in eXtreme Scale Applications
Funded by the National Science Foundation (NSF) under grant numbers 1900888 and 1900765
About ANACIN-X
Nondeterminism is a growing and often overlooked challenge for HPC execution. To address this problem, this
project has five main goals:
Model non-deterministic executions by identifying sources of non-determinism in HPC applications, and detailing the characteristics of these sources in a given communication topology.
Improve understanding of non-deterministic features across various HPC applications via tracing techniques mixed with modeling and comparison of event graphs.
Study the types and sources of communication non-determinism via both application of existing tools for graph comparison and development of new tools for graph alignment.
Provide lightweight, portable methods for modeling and identifying application non-determinism, allowing for efficient analysis and debugging of arbitrary HPC codes.
Foster an understanding of non-determinism in the next generation of HPC experts via development and release of interactive and educational software and media about non-determinism, collaboration with computer science and HPC organizations such as the SIGHPC Resource Constrained Environments Chapter to reach out to edge communities, and engagement of students in research opportunities.
Selected Publications
Bell P, Suarez K, Fossum B, Chapp D, Bhowmick S, Taufer M.
A Research-Based Course Module to Study Non-determinism in High Performance Applications.
IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (2022). doi: 10.1109/IPDPSW55747.2022.00067.
D. Chapp, N. Tan, S. Bhowmick and M. Taufer.
Identifying Degree and Sources of Non-Determinism in MPI Applications Via Graph Kernels,
in IEEE Transactions on Parallel and Distributed Systems, vol. 32, no. 12, pp. 2936-2952, 1 Dec. 2021, doi: 10.1109/TPDS.2021.3081530.
Bell P, Suarez K, Chapp D, Tan N, Bhowmick S, Taufer M.
ANACIN-X: A Software Framework for Studying Non-determinism in MPI Applications.
Software Impacts. 9 Oct. 2021, doi: 10.1016/j.simpa.2021.100151.
D. Chapp, D. Rorabaugh (#), K. Sato, D. Ahn, and M. Taufer.
A Three-phase Workflow for General and Expressive Representations of Nondeterminism in HPC Applications.
International Journal of High-Performance Computing Applications (IJHPCA), 1175-1184 (2019).
D. Chapp, K. Sato, D. Ahn, and M. Taufer.
Record-and-Replay Techniques for HPC Systems: A survey.
Journal of Supercomputing Frontiers and Innovations, 5(1):11-30 (2018).
D. Chapp, T. Johnston, and M. Taufer.
On the Need for Reproducible Numerical Accuracy through Intelligent Runtime Selection of Reduction Algorithms at the Extreme Scale.
In Proceedings of IEEE Cluster Conference, pp. 166-175. Chicago, Illinois, USA. September 8-11, 2015.
Meet the Team
Dr. Michela Taufer
Professor, University of Tennessee Knoxville
Dr. Sanjukta Bhowmick
Associate Professor, University of North Texas
Dr. Jack Marquez
Postdoctoral Researcher, University of Tennessee Knoxville
Barbara Fossum
Outreach Coordinator, University of Tennessee Knoxville
Nigel Tan
Graduate Student, University of Tennessee Knoxville
Ali Khan
Graduate Student, University of North Texas
Befikir Bogale
Undergraduate Student, University of Tennessee Knoxville
Former Members
Nick Bell
Research Scientist, University of Tennessee Knoxville
Cole Johnston
Undergraduate Student, University of Tennessee Knoxville
Dr. Dylan Chapp
Collaborator
Krishna Sai Ujwal
Graduate Student, University of North Texas
Kae Suarez
Graduate Student, University of Tennessee Knoxville