Questions

  • How does non-determinism arise in MPI applications?
  • What are the impacts of non-determinism in scientific executions?

Objectives

  • Learn about different communication methods in MPI.

  • Learn how non-determinism affects scientific applications, using real examples.

Tracking Scientific Incorrectness with ANACIN-X

Scenario

Consider a scientist who, while conducting a meticulous experiment, initiates two separate runs of an identical codebase. The outcomes of these runs diverge significantly, raising questions about the underlying factors that influence the code's behavior.

../images/results_interpretation_images/scientist1.png

Different outputs from separate runs of the same code
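One common way such a divergence can arise is when partial results are combined in whatever order their messages happen to arrive, for example with a receive posted using MPI_ANY_SOURCE. Floating-point addition is not associative, so a different arrival order can yield a different sum. The sketch below does not use MPI. It only imitates the effect in plain Python, and the function name `sum_in_arrival_order` and the sample values are assumptions made for illustration.

```python
def sum_in_arrival_order(contributions, arrival_order):
    """Accumulate per-rank contributions in the order their messages 'arrive'.

    `arrival_order` is a list of rank indices; in a real MPI run this order
    would be decided by message timing, not by the program.
    """
    total = 0.0
    for rank in arrival_order:
        total += contributions[rank]
    return total


# Hypothetical partial results from four ranks (values chosen to expose rounding).
contributions = [1e16, 1.0, -1e16, 1.0]

# Two runs of the "same code" that differ only in message arrival order.
run_1 = sum_in_arrival_order(contributions, [0, 1, 2, 3])
run_2 = sum_in_arrival_order(contributions, [0, 2, 1, 3])

print("run 1:", run_1)
print("run 2:", run_2)
print("identical results?", run_1 == run_2)
```

In the first order, the small value is absorbed by rounding when it is added to the large one. In the second, the large values cancel first and nothing is lost. The inputs and the code are the same in both runs, yet the results differ.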

Tracking Scientific Incorrectness

The scientist uses ANACIN-X to pinpoint discrepancies between runs by generating Event Graphs for each run and plotting the KDTS (Kernel Distance Time Series).

../images/results_interpretation_images/Scintist2.png

Different outputs from separate runs of the same code
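To make the idea of a kernel distance time series more concrete, here is a minimal sketch. It assumes each run's event graph has already been cut into time slices and that each slice is reduced to a list of vertex labels. The kernel used here is a simple label-histogram kernel chosen for brevity; it is a stand-in, not the graph kernel that ANACIN-X itself applies. All function names and labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def label_kernel(labels_a, labels_b):
    """Histogram kernel: dot product of the two label-count vectors."""
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    return sum(counts_a[label] * counts_b[label] for label in counts_a)


def kernel_distance(labels_a, labels_b):
    """Distance induced by a kernel k: sqrt(k(a,a) + k(b,b) - 2*k(a,b))."""
    squared = (label_kernel(labels_a, labels_a)
               + label_kernel(labels_b, labels_b)
               - 2 * label_kernel(labels_a, labels_b))
    return sqrt(max(squared, 0.0))  # guard against tiny negative rounding


def kernel_distance_time_series(run_a, run_b):
    """One distance per pair of corresponding time slices."""
    return [kernel_distance(a, b) for a, b in zip(run_a, run_b)]


# Hypothetical slices: labels record the event type and the matched sender.
run_a = [["send", "recv<-1", "recv<-2"], ["recv<-1", "recv<-2", "send"]]
run_b = [["send", "recv<-1", "recv<-2"], ["recv<-2", "recv<-2", "send"]]

kdts = kernel_distance_time_series(run_a, run_b)
print(kdts)
```

A distance of zero means the two runs communicated the same way in that slice under this simplified kernel. A larger value means the runs matched messages differently there.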

Identifying Relevant Source Code Location

ANACIN-X automatically identifies abrupt spikes in kernel distance. These spikes indicate sudden changes in communication non-determinism across executions. This information helps extract the relevant source-code locations via callstack vertex labels.

Now, the scientist knows where to look for the root cause of discrepant results.

../images/results_interpretation_images/scientist3.png
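The two steps described in this section, spotting an abrupt spike and then looking up the callstacks recorded there, can be sketched as follows. This is only an illustration under stated assumptions: the jump threshold, the function names, and the callstack strings are all hypothetical, and ANACIN-X's own detection logic may differ.

```python
from collections import Counter


def find_spikes(kdts, min_jump):
    """Return slice indices where the distance rises by at least `min_jump`
    relative to the previous slice."""
    return [i for i in range(1, len(kdts))
            if kdts[i] - kdts[i - 1] >= min_jump]


def callstacks_at_spikes(spike_indices, callstacks_per_slice):
    """Count the callstack labels attached to vertices in the spiking slices,
    most frequent first."""
    counts = Counter()
    for i in spike_indices:
        counts.update(callstacks_per_slice[i])
    return counts.most_common()


# Hypothetical kernel distance time series with one abrupt jump at slice 3.
kdts = [0.0, 0.1, 0.1, 2.4, 2.5]

# Hypothetical callstack vertex labels recorded in each slice.
callstacks_per_slice = [
    ["main>init>MPI_Send"],
    ["main>init>MPI_Recv"],
    ["main>step>MPI_Send"],
    ["main>exchange_halo>MPI_Recv",
     "main>exchange_halo>MPI_Recv",
     "main>reduce_energy>MPI_Recv"],
    ["main>step>MPI_Send"],
]

spikes = find_spikes(kdts, min_jump=1.0)
suspects = callstacks_at_spikes(spikes, callstacks_per_slice)
print("spiking slices:", spikes)
print("callstacks to inspect:", suspects)
```

The callstack that appears most often in the spiking slices is a reasonable first place to look in the source code.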