Questions

How does non-determinism arise in MPI applications?
What are the impacts of non-determinism in scientific executions?

Objectives

Learn about different communication methods in MPI.
Learn how non-determinism affects scientific applications, with real examples.

Benchmarks

ANACIN-X provides users with four application benchmarks for testing its ability to characterize non-determinism. Each benchmark application employs a unique combination of a non-determinism source and a non-determinism type. These communication patterns are common in scientific applications, which is why ANACIN-X uses them as test benchmarks.

Communication patterns available in the ANACIN-X application benchmarks. A "✔" indicates that a communication pattern expressing the given type of non-determinism is present, and a "✘" indicates that this type of non-determinism is absent. In the table, "Recv" stands for receiver-side non-determinism, "Send" for sender-side non-determinism, and "Topology" for process topology non-determinism; "RNG" stands for random number generation used to randomize process neighborhoods.
Communication Pattern	Recv	Send	Topology	Source
Message Race	✔	✘	✘	MPI_ANY_SOURCE
AMG2013	✔	✔	✘	MPI_ANY_SOURCE
MCB Grid	✔	✔	✘	MPI_Testsome
Unstructured Mesh	✔	✔	✔	MPI_Waitany + RNG

For a given benchmark application, a user may generate an input set of trace data by defining the following parameters:

MPI Processes: number of MPI processes (integer) used for the benchmark execution

Compute Nodes: number of compute nodes (integer) used for the benchmark execution

Message Size (bytes): size of the messages being passed (integer value)

Pattern Iterations: number of times a communication pattern iterates within an application execution (integer value)

Message Non-determinism Percentage: peak amount of message non-determinism enabled in an application by its non-determinism source (percentage); due to messages that can be received or sent by a process in a variable order

Topological Non-determinism Percentage: amount of topological non-determinism (percentage); due to applications where the set of processes a given process communicates with can vary from run to run — this is unique to the Unstructured Mesh benchmark

Application Executions: number of times to run the application (integer value of 2 or larger)

Message Race 

A message race is a situation in which multiple processes send messages to a common recipient and the order in which these messages are received is not guaranteed. This lack of ordering makes the behaviour of the system unpredictable. It is a simple but ubiquitous cause of non-determinism in MPI applications.

The pseudocode below shows how the parameters num_DET and num_ND control the non-determinism percentage: rank 0 posts num_DET receives from named sources, followed by num_ND wildcard receives.

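  if (rank == 0) {
      /* Deterministic receives: each one names its source rank. */
      for (i = 0; i < num_DET; ++i) {
          MPI_Recv(... , i , ...);
      }
      /* Non-deterministic receives: the wildcard matches whichever
         message arrives first. */
      for (i = 0; i < num_ND; ++i) {
          MPI_Recv(... , MPI_ANY_SOURCE , ...);
      }
  } else {
      MPI_Send(... , 0 , ...);
  }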

  

Figure: Event graph of Message Race

The figure above shows the non-determinism caused by a message race. Process 0 receives messages from 4 fixed sources, followed by 4 wildcard receives. This is a controlled message race in a 50% non-determinism (ND) configuration.

AMG2013 

Algebraic Multigrid 2013 (AMG2013)¹ is selected as a benchmark because it demonstrates both receiver-side and sender-side non-determinism, as highlighted in ². It achieves non-determinism through the use of non-blocking wildcard probes.
When a probe matches a message, a receive is posted, and upon completion of the receive, a send is issued to the process whose message was matched. Because the wildcard probe is non-deterministic, the value of the source parameter in the receive call is non-deterministic as well. This propagation of non-determinism from the probe to the receive and then to the send is not immediately apparent from a simple inspection of the code.

The pseudocode below shows the communication pattern in AMG2013.

     /* Post receives for the replies, which carry tag_2. */
     for (i = 0; i < num_neighbors; ++i) {
        MPI_Irecv(... , tag_2 , ...);
     }

     /* Send the requests, which carry tag_1. */
     for (i = 0; i < num_neighbors; ++i) {
        MPI_Isend(... , tag_1 , ...);
     }
     ...
     while (...) {
        MPI_Iprobe(MPI_ANY_SOURCE, tag_1, ..., &flag, &status);
        if (flag) {
           neighbor = status.MPI_SOURCE;
           // Non-determinism from probe
           // propagates to receive ...
           MPI_Recv(x, ..., neighbor, tag_1, ...);
           // ... and then propagates
           // to send
           MPI_Send(x, ..., neighbor, tag_2, ...);
        }
     }
    

The figure above shows the event graph produced by the non-deterministic order of message passing in AMG2013.

MCB Grid 

MCB ³ is a proxy application for Monte Carlo simulations based on three main types of particle exchange patterns: a particle exchange on a 2-dimensional grid, a non-blocking gather of particles, and a non-blocking scatter of particles. The latter two serve as bookkeeping for the particle exchange.

MCB primarily uses MPI’s non-blocking point-to-point communication primitives, heavily relying on functions like MPI_Testsome during particle exchange.

Communication buffers are shared between neighbor processes. Varying the buffer size affects communication rates, which in turn changes the level of non-determinism. Users are often unaware of the buffer size set at compile time, which makes it difficult to link observed numerical non-determinism to buffer size.

ANACIN-X quantifies how changes in buffer size influence non-determinism in MCB communication patterns. It dissects executions, establishing the relationship between application configuration parameters, especially buffer size, and non-determinism in communication patterns. This non-determinism can cause numerical output differences between runs.

Unstructured Mesh Patterns 

The communication pattern, derived from the Chatterbug Communication Pattern Suite ⁴, displays non-determinism due to a randomized process topology. This randomness leads to variations in which processes communicate with each other from run to run, setting it apart from the other three patterns where only the message order differs between runs. Additionally, there are three distinct forms of non-determinism observed in the described communication patterns:

Receiver-side non-determinism (Recv): Messages can be received by a process in variable orders.
Sender-side non-determinism (Send): Processes can send messages in variable orders.
Process topology non-determinism (Topo): Applications may have varying sets of processes that a given process communicates with in different runs.

The first two types are termed message non-determinism, while the third type is referred to as topology non-determinism. The MPI features contributing to non-determinism in these patterns are MPI_ANY_SOURCE (in Message Race and AMG2013), MPI_Testsome (in MCB Grid), and MPI_Waitany (in Unstructured Mesh). The specifics of these communication patterns are summarized in the table in the Benchmarks section above.

References

¹ High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems.
² On Communication Determinism in Parallel HPC Applications.
³ N. Gentile and B. Miller, “Monte carlo benchmark (MCB),” Lawrence Livermore Nat. Lab., Livermore, CA, USA, Tech. Rep. TR-LLNL-CODE-507091, 2010.
⁴ N. Jain and A. Bhatele, “Chatterbug communication proxy applications suite,” Lawrence Livermore Nat. Lab., Livermore, CA, USA, Tech. Rep. LLNL-CODE-756471, 2018.
