Questions

  • How does non-determinism arise in MPI applications?
  • What are the impacts of non-determinism in scientific executions?

Objectives

  • Learn about different communication methods in MPI.

  • Learn how non-determinism impacts scientific applications, using real examples.

Benchmarks

ANACIN-X provides users with four application benchmarks for testing its ability to characterize non-determinism. Each benchmark employs a unique combination of non-determinism source and non-determinism type. The communication patterns in these benchmarks are common in scientific applications, which is why ANACIN-X uses them as test cases.

Communication patterns available in the ANACIN-X application benchmarks. A "✔" indicates that a communication pattern exhibits non-determinism of the given type, and a "✘" indicates that this type of non-determinism is absent. In the table, "Recv" stands for receiver-side non-determinism, "Send" for sender-side non-determinism, and "Topology" ("Topo") for process topology non-determinism; "RNG" stands for the random number generation used to randomize process neighborhoods.

Communication Pattern   Recv   Send   Topology   Source
Message Race            ✔      ✘      ✘          MPI_ANY_SOURCE
AMG2013                 ✔      ✔      ✘          MPI_ANY_SOURCE
MCB Grid                              ✘          MPI_Testsome
Unstructured Mesh                     ✔          MPI_Waitany + RNG

For a given benchmark application, a user may generate an input set of trace data by defining the following parameters:

  • MPI Processes: number of MPI processes used for the benchmark execution (integer value)
  • Compute Nodes: number of compute nodes used for the benchmark execution (integer value)
  • Message Size: size in bytes of the messages being passed (integer value)
  • Pattern Iterations: number of times a communication pattern iterates within an application execution (integer value)
  • Message Non-determinism Percentage: peak amount of message non-determinism enabled in an application by its non-determinism source (percentage); message non-determinism is due to messages that can be received or sent by a process in a variable order
  • Topological Non-determinism Percentage: amount of topological non-determinism (percentage); it arises in applications where the set of processes a given process communicates with can vary from run to run, and it is unique to the Unstructured Mesh benchmark
  • Application Executions: number of times to run the application (integer value of 2 or larger)
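
As a rough illustration of how the parameters above fit together, the sketch below groups them in a C struct and checks their ranges. The struct, its field names, and the function `benchmark_config_is_valid` are hypothetical and are not part of ANACIN-X. Only the "2 or larger" bound on application executions comes from the list above; the other bounds are assumptions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the benchmark parameters listed above.
   Field names are illustrative; they are not ANACIN-X identifiers. */
typedef struct {
    int    num_processes;       /* MPI Processes */
    int    num_nodes;           /* Compute Nodes */
    size_t message_size_bytes;  /* Message Size (bytes) */
    int    pattern_iterations;  /* Pattern Iterations */
    double msg_nd_percent;      /* Message Non-determinism Percentage */
    double topo_nd_percent;     /* Topological Non-determinism Percentage */
    int    num_executions;      /* Application Executions */
} benchmark_config;

/* Returns true when every parameter lies in a plausible range.
   Only the "at least 2 executions" rule is stated in the text;
   the remaining bounds are assumptions made for this sketch. */
bool benchmark_config_is_valid(const benchmark_config *cfg) {
    if (cfg == NULL) return false;
    if (cfg->num_processes < 1 || cfg->num_nodes < 1) return false;
    if (cfg->pattern_iterations < 1) return false;
    if (cfg->msg_nd_percent < 0.0 || cfg->msg_nd_percent > 100.0) return false;
    if (cfg->topo_nd_percent < 0.0 || cfg->topo_nd_percent > 100.0) return false;
    if (cfg->num_executions < 2) return false;
    return true;
}
```

At least two executions are required because non-determinism only becomes visible when runs are compared with each other.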

Message Race

A message race occurs when multiple processes send messages to a common recipient and the order in which these messages are received is not guaranteed. This lack of ordering makes the behaviour of the system unpredictable. It is a simple but ubiquitous cause of non-determinism in MPI applications.

The pseudocode below shows how the non-determinism percentage is controlled by the parameters num_DET and num_ND, the numbers of deterministic and wildcard receives:

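  if (rank == 0) {
      // Deterministic receives: the source rank is fixed
      for (i = 0; i < num_DET; ++i) {
          MPI_Recv(... , i , ...);
      }

      // Non-deterministic receives: any sender may match
      for (i = 0; i < num_ND; ++i) {
          MPI_Recv(... , MPI_ANY_SOURCE , ...);
      }
  } else {
      MPI_Send(... , 0 , ...);
  }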

  
[Figure: Event graph of Message Race]

The figure above shows the non-determinism caused by a message race. Here, process 0 receives messages from 4 fixed sources, followed by 4 wildcard receives. This is a controlled message race in a 50% non-determinism (ND) configuration.
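
To make the 50% configuration concrete, the sketch below splits a total number of receives into fixed-source and wildcard receives for a given non-determinism percentage. The function `split_receives` and its rounding rule are assumptions made for illustration; ANACIN-X may derive num_DET and num_ND differently.

```c
/* Splits `total_recvs` receives into deterministic (fixed-source) and
   non-deterministic (wildcard) receives. Hypothetical helper: the
   rounding rule (nearest integer) is an assumption of this sketch. */
void split_receives(int total_recvs, double nd_percent,
                    int *num_det, int *num_nd) {
    int nd = (int)(total_recvs * nd_percent / 100.0 + 0.5);
    if (nd < 0) nd = 0;                      /* clamp to a valid range */
    if (nd > total_recvs) nd = total_recvs;
    *num_nd = nd;
    *num_det = total_recvs - nd;
}
```

With 8 receives and a 50% setting this yields 4 fixed-source receives and 4 wildcard receives, matching the configuration described above.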

AMG2013

Algebraic Multigrid 2013 (AMG2013) [1] is selected as a benchmark because it exhibits both receiver-side and sender-side non-determinism, as highlighted in [2]. It achieves non-determinism through the use of non-blocking wildcard probes.
When a probe matches a message, a receive is posted, and once the receive completes, a send is issued to the process whose message was matched. Because the probe uses a wildcard, the value of the source parameter in the receive call is non-deterministic, and the way this non-determinism propagates from probe to receive to send is not immediately apparent from a simple inspection of the code.

The pseudocode below shows the communication pattern in AMG2013.


     for (i = 0; i < num_neighbors; ++i) {
         MPI_Irecv(... , tag_2 , ...);
     }

     for (i = 0; i < num_neighbors; ++i) {
         MPI_Irecv(... , tag_1 , ...);
     }
     ...
     while (...) {
         // Wildcard probe: any neighbor's tag_1 message may match
         MPI_Iprobe(MPI_ANY_SOURCE, tag_1, ... , &flag, &status);
         if (flag) {
             // The matched sender is only known at run time
             neighbor = status.MPI_SOURCE;
             // Non-determinism from probe
             // propagates to receive ...
             MPI_Recv(x, ..., neighbor, tag_1, ...);
             // ... and then propagates
             // to send
             MPI_Send(x, ..., neighbor, tag_2, ...);
         }
     }
    
[Figure: Event graph of AMG2013]

The figure above shows the event graph generated from the non-deterministic order of message passing in AMG2013.
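
The propagation from probe to receive to send can be illustrated without MPI. The sketch below is a simplified model under stated assumptions: the `arrivals` array stands for the order in which messages become visible to the wildcard probe in one run, and the names `message`, `TAG_1`, and `reply_order` are hypothetical. It is not AMG2013 code.

```c
#include <stddef.h>

#define TAG_1 1  /* hypothetical tag of the probed messages */
#define TAG_2 2  /* hypothetical tag of the replies */

typedef struct {
    int source;  /* rank that sent the message */
    int tag;     /* message tag */
} message;

/* Walks the messages in arrival order, as a wildcard probe on TAG_1
   would see them, and records the destination of each reply.
   Returns the number of replies written to `reply_dest`. */
size_t reply_order(const message *arrivals, size_t n, int *reply_dest) {
    size_t replies = 0;
    for (size_t i = 0; i < n; ++i) {
        if (arrivals[i].tag != TAG_1) continue;  /* probe matches TAG_1 only */
        int neighbor = arrivals[i].source;       /* value learned from the probe */
        /* the receive would use `neighbor` as its source here ... */
        reply_dest[replies++] = neighbor;        /* ... and the TAG_2 reply goes back to it */
    }
    return replies;
}
```

Feeding the same three messages in two different arrival orders produces two different reply orders, which is how receiver-side non-determinism turns into sender-side non-determinism.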

MCB Grid

MCB [3] is a proxy application for Monte Carlo simulations based on three main types of particle exchange patterns: a particle exchange on a 2-dimensional grid, a non-blocking gather of particles, and a non-blocking scatter of particles. The latter two serve as bookkeeping for the particle exchange.

[Figure: Particle exchange patterns in MCB]

MCB primarily uses MPI’s non-blocking point-to-point communication primitives, heavily relying on functions like MPI_Testsome during particle exchange.

Communication buffers are shared between neighbor processes. Varying buffer size affects communication rates, leading to changes in non-determinism levels. Users often lack awareness of the buffer size used during compilation, making it challenging to link observed numerical non-determinism to buffer size.

ANACIN-X quantifies how changes in buffer size influence non-determinism in MCB communication patterns. It dissects executions, establishing the relationship between application configuration parameters, especially buffer size, and non-determinism in communication patterns. This non-determinism can cause numerical output differences between runs.
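
One way to see why buffer size can affect communication rates is simple arithmetic. The sketch below assumes that a buffer is sent whenever it is full, which is an assumption made for illustration and not a statement about how MCB manages its buffers; the function `messages_needed` is hypothetical.

```c
#include <stddef.h>

/* Number of messages needed to ship `particles` particles when each
   buffer holds at most `buffer_capacity` of them (ceiling division).
   Returns 0 for an invalid capacity of 0. Hypothetical helper. */
size_t messages_needed(size_t particles, size_t buffer_capacity) {
    if (buffer_capacity == 0) return 0;
    return (particles + buffer_capacity - 1) / buffer_capacity;
}
```

Under this assumption, 1000 particles need 10 messages with a capacity of 100 and 16 messages with a capacity of 64. More messages in flight give the runtime more opportunities to complete them in a different order from run to run.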

Unstructured Mesh Patterns

The Unstructured Mesh communication pattern, derived from the Chatterbug Communication Pattern Suite [4], displays non-determinism due to a randomized process topology. This randomness changes which processes communicate with each other from run to run, setting it apart from the other three patterns, where only the message order differs between runs. Across the four communication patterns described here, three distinct forms of non-determinism are observed:

  • Receiver-side non-determinism (Recv): messages can be received by a process in variable orders.
  • Sender-side non-determinism (Send): processes can send messages in variable orders.
  • Process topology non-determinism (Topo): the set of processes that a given process communicates with may vary between runs.

The first two types are termed message non-determinism, while the third type is referred to as topology non-determinism. The MPI features contributing to non-determinism in these patterns are MPI_ANY_SOURCE (in Message Race and AMG2013), MPI_Testsome (in MCB Grid), and MPI_Waitany (in Unstructured Mesh). The specifics of these communication patterns are summarized in the table above.
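
Topology non-determinism can be sketched as a neighbor list in which some entries are drawn at random. The code below is a minimal illustration under stated assumptions: the ring-like baseline, the per-slot replacement rule, the small linear congruential generator, and the names `lcg_next` and `choose_neighbors` are all hypothetical and are not taken from the Chatterbug suite or from ANACIN-X.

```c
#include <stdint.h>

/* Small deterministic generator so the sketch does not depend on the
   platform's rand(). Constants are the common 32-bit LCG pair. */
static uint32_t lcg_next(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Fills `neighbors` with `degree` destination ranks for `rank`.
   Baseline (assumed): rank+1, rank+2, ... modulo nprocs.
   Each slot is replaced by a random other rank with probability
   topo_nd_percent / 100. Assumes nprocs >= 2; a random draw may
   repeat a rank that is already in the list. */
void choose_neighbors(int rank, int nprocs, int degree,
                      double topo_nd_percent, uint32_t seed,
                      int *neighbors) {
    uint32_t state = seed ^ ((uint32_t)rank * 2654435761u);
    for (int k = 0; k < degree; ++k) {
        int fixed = (rank + 1 + k) % nprocs;
        /* 24 random bits scaled to the interval [0, 100) */
        double draw = (lcg_next(&state) >> 8) / 16777216.0 * 100.0;
        if (draw < topo_nd_percent && nprocs > 1) {
            /* offset in 1 .. nprocs-1, so the result is never `rank` */
            int offset = 1 + (int)((lcg_next(&state) >> 8) % (uint32_t)(nprocs - 1));
            neighbors[k] = (rank + offset) % nprocs;
        } else {
            neighbors[k] = fixed;
        }
    }
}
```

At 0% every run produces the same neighbor list regardless of the seed, so only message order can differ. At higher percentages a different seed per run changes who talks to whom, which is the topology non-determinism described above.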

References

  1. High-performance algebraic multigrid solver optimized for multi-core based distributed parallel systems.
  2. On Communication Determinism in Parallel HPC Applications.
  3. N. Gentile and B. Miller, “Monte Carlo benchmark (MCB),” Lawrence Livermore Nat. Lab., Livermore, CA, USA, Tech. Rep. TR-LLNL-CODE-507091, 2010.
  4. N. Jain and A. Bhatele, “Chatterbug communication proxy applications suite,” Lawrence Livermore Nat. Lab., Livermore, CA, USA, Tech. Rep. LLNL-CODE-756471, 2018.