Questions
- How does non-determinism arise in MPI applications?
- What are the impacts of non-determinism in scientific executions?
Objectives
- Learn about different communication methods in MPI.
- Learn how non-determinism impacts scientific applications, with real examples.
Tracking Scientific Incorrectness with ANACIN-X
Scenario
Consider a scientist who, while conducting a meticulous experiment, initiates two separate runs of the identical codebase. The outcomes of these runs diverge significantly, raising questions about the underlying factors influencing the code's behavior.
[Figure: ../images/results_interpretation_images/scientist1.png]
Different outputs from separate runs of the same code
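One common way such divergence can arise is that floating-point addition is not associative, so a sum accumulated in message-arrival order can change when messages arrive in a different order. The sketch below is not MPI code. It only simulates arrival orders in plain Python, and the contribution values and function names are hypothetical, chosen to make the effect visible.

```python
from itertools import permutations

def reduce_in_arrival_order(arrival_order):
    """Accumulate contributions in the order they 'arrive' at the receiver."""
    total = 0.0
    for value in arrival_order:
        total += value
    return total

# Hypothetical per-rank contributions with very different magnitudes.
# Mathematically, they always sum to exactly 2.
contributions = [1e16, 1.0, -1e16, 1.0]

# Each permutation stands in for one possible message arrival order.
distinct_totals = {
    reduce_in_arrival_order(order) for order in permutations(contributions)
}

print(sorted(distinct_totals))
```

Every arrival order adds the same four numbers, yet the set contains more than one total, because small values are rounded away whenever they are added to a much larger partial sum.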
Tracking Scientific Incorrectness
The scientist uses ANACIN-X to pinpoint discrepancies between runs by generating Event Graphs for each run and plotting the KDTS (Kernel Distance Time Series).
[Figure: ../images/results_interpretation_images/Scintist2.png]
Different outputs from separate runs of the same code
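To make the idea of a kernel distance time series concrete, here is a minimal sketch that compares two runs slice by slice. It is not ANACIN-X's API or its actual graph kernel. As an assumption for brevity, each event graph slice is reduced to a list of hypothetical `(sender, receiver, receive_index)` labels, and a simple linear kernel over label counts stands in for a real graph kernel. The distance itself uses the standard kernel-induced form, sqrt(k(a,a) + k(b,b) - 2k(a,b)).

```python
import math
from collections import Counter

def kernel(features_a, features_b):
    """Toy linear kernel: dot product of two label-count vectors."""
    return sum(count * features_b[label] for label, count in features_a.items())

def kernel_distance(slice_a, slice_b):
    """Kernel-induced distance between two slices of message labels."""
    fa, fb = Counter(slice_a), Counter(slice_b)
    squared = kernel(fa, fa) + kernel(fb, fb) - 2 * kernel(fa, fb)
    return math.sqrt(max(squared, 0))  # guard against tiny negative round-off

def kernel_distance_time_series(run_a, run_b):
    """One distance per pair of corresponding slices."""
    return [kernel_distance(a, b) for a, b in zip(run_a, run_b)]

# Hypothetical runs: (sender, receiver, receive_index) per matched message.
# In the second slice, rank 0 receives from ranks 1 and 2 in swapped order.
run_1 = [[(0, 1, 0), (1, 2, 0)], [(2, 0, 0), (1, 0, 1)], [(0, 2, 1)]]
run_2 = [[(0, 1, 0), (1, 2, 0)], [(1, 0, 0), (2, 0, 1)], [(0, 2, 1)]]

kdts = kernel_distance_time_series(run_1, run_2)
print(kdts)  # [0.0, 2.0, 0.0]
```

Slices where both runs matched messages identically give a distance of zero, while the slice with the swapped receive order stands out in the series.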
Identifying Relevant Source Code Location
ANACIN-X automatically identifies abrupt spikes in kernel distance. These spikes imply sudden changes in the underlying communication non-determinism across executions.
This information helps extract the relevant source-code locations via callstack vertex labels.
Now, the scientist knows where to look for the root cause of discrepant results.
[Figure: ../images/results_interpretation_images/scientist3.png]
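The spike-then-callstack workflow described above can be sketched roughly as follows. This is an illustrative assumption, not how ANACIN-X implements it: the jump threshold, the function names, the distance values, and the callstack label strings are all hypothetical.

```python
from collections import Counter

def find_spikes(kdts, min_jump):
    """Indices of slices whose distance rises by more than min_jump."""
    return [i for i in range(1, len(kdts)) if kdts[i] - kdts[i - 1] > min_jump]

def callstacks_at(slice_labels, spike_indices, top_n=1):
    """Most frequent callstack labels among the vertices of each spiking slice."""
    return {
        i: [label for label, _ in Counter(slice_labels[i]).most_common(top_n)]
        for i in spike_indices
    }

# Hypothetical kernel distance time series, one value per slice.
kdts = [0.0, 0.1, 0.1, 2.4, 2.5, 0.2]

# Hypothetical callstack labels attached to the vertices of each slice.
slice_labels = [
    ["main>init"],
    ["main>solve>halo_exchange"],
    ["main>solve>halo_exchange"],
    ["main>solve>reduce_residual", "main>solve>reduce_residual",
     "main>solve>halo_exchange"],
    ["main>solve>reduce_residual"],
    ["main>finalize"],
]

spikes = find_spikes(kdts, min_jump=1.0)
suspects = callstacks_at(slice_labels, spikes)
print(spikes, suspects)
```

Here the only abrupt rise is at slice 3, and the dominant callstack label in that slice points at the hypothetical `reduce_residual` call as the place to start looking.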