DDTBench: Micro-Applications for Communication Data Access Patterns and MPI Datatypes

DDTBench is a suite of micro-applications (micro-apps) that captures how parallel scientific applications from many different fields access the data they send and receive between processes. MPI Derived Datatypes (DDTs) make it possible to specify those access patterns so that no explicit copy operation is needed, in contrast to the pack-unpack loops found in many codes. In DDTBench we compare the packing overhead incurred by such loops to that of MPI DDTs. This is done by performing a ping-pong benchmark twice: once using MPI DDTs to specify how data should be packed, and once using the pack-unpack loops that we found in the applications. The measurement loop of the benchmark is shown below:

benchmark_expl.png
Measurement loop of DDTBench. Measurements are taken on process 0; no global clock is required.
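To make the term "pack-unpack loop" concrete, here is a minimal sketch of such a loop for a strided access pattern. It is plain Python rather than MPI code and is not taken from DDTBench; the function names pack_vector and unpack_vector and the 4x4 matrix are hypothetical. The three layout parameters (count, blocklength, stride) are the same ones MPI_Type_vector takes, so a vector datatype describes this layout to the MPI library instead of copying it by hand.

```python
def pack_vector(buf, count, blocklength, stride):
    """Copy `count` blocks of `blocklength` elements, spaced `stride`
    elements apart, from `buf` into a new contiguous list."""
    packed = []
    for i in range(count):
        start = i * stride
        packed.extend(buf[start:start + blocklength])
    return packed


def unpack_vector(packed, buf, count, blocklength, stride):
    """Scatter a contiguous `packed` list back into the strided layout of `buf`."""
    for i in range(count):
        start = i * stride
        buf[start:start + blocklength] = packed[i * blocklength:(i + 1) * blocklength]


# Hypothetical example: in a 4x4 row-major matrix, the first column is a
# strided pattern of 4 blocks with 1 element each, 4 elements apart.
matrix = list(range(16))
column = pack_vector(matrix, count=4, blocklength=1, stride=4)

# The receiving side scatters the contiguous message back into place.
received = [0] * 16
unpack_vector(column, received, count=4, blocklength=1, stride=4)
print(column)
```

The explicit copy in pack_vector and unpack_vector is the work whose cost the benchmark compares against the implicit repacking done by an MPI DDT implementation.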

Using the time it takes to perform each operation (each colored block in the picture above), we can calculate the overhead of packing and unpacking data with both methods. We cannot measure this overhead directly when MPI DDTs are used, because the data repacking is implicit. We can, however, calculate the time used for transferring packed data, t_net, by subtracting the time required for manual packing and unpacking from the round-trip time of the ping-pong with manual packing. The relative data-repacking overhead of each method is then obtained by subtracting t_net from that method's ping-pong round-trip time and dividing the result by the same round-trip time. We did this for some of the micro-apps in the graph shown below:

packing_costs.png
Packing costs for different test cases
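The overhead calculation described above can be sketched in a few lines. The function name packing_overheads, its parameter names, and the timing values are hypothetical and chosen only to illustrate the arithmetic; they are not measurements from DDTBench.

```python
def packing_overheads(rtt_manual, t_manual_packing, rtt_ddt):
    """Return (t_net, overhead_manual, overhead_ddt).

    rtt_manual       -- round-trip time of the ping-pong with manual packing
    t_manual_packing -- time spent in the manual pack and unpack steps
    rtt_ddt          -- round-trip time of the ping-pong using MPI DDTs
    All times must use the same unit.
    """
    # Time spent purely on transferring the already packed data.
    t_net = rtt_manual - t_manual_packing
    # Fraction of each round trip that is not pure data transfer.
    overhead_manual = (rtt_manual - t_net) / rtt_manual
    overhead_ddt = (rtt_ddt - t_net) / rtt_ddt
    return t_net, overhead_manual, overhead_ddt


# Hypothetical timings in microseconds.
t_net, ov_manual, ov_ddt = packing_overheads(
    rtt_manual=100.0, t_manual_packing=40.0, rtt_ddt=75.0
)
print(f"t_net = {t_net:.1f} us")
print(f"manual packing overhead = {ov_manual:.0%}")
print(f"MPI DDT overhead        = {ov_ddt:.0%}")
```

With these made-up inputs, t_net is 60 us, the manual packing overhead is 40%, and the MPI DDT overhead is 20%.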

It can be seen that MPI DDTs can reduce the overhead associated with data packing (e.g., from 40% to 15% in the case of NAS_LU_x, where a contiguous array is needlessly copied by the original code). The large difference between the performance delivered by Open MPI's DDT engine and that of MVAPICH shows that there is still work to be done in improving MPI DDT implementations. We hope that DDTBench can serve implementers as a guideline on which access patterns deserve special attention. A list of the micro-apps included in DDTBench can be found in the table below.

| Application Class | Testname | Access Pattern |
|---|---|---|
| Atmospheric Science | WRF_x_vec, WRF_y_vec, WRF_x_sa, WRF_y_sa | struct of 2D/3D/4D face exchanges in different directions (x,y), using different (semantically equivalent) datatypes: nested vectors (_vec) and subarrays (_sa) |
| Quantum Chromodynamics | MILC_su3_zd | 4D face exchange, z direction, nested vectors |
| Fluid Dynamics | NAS_MG_x, NAS_MG_y, NAS_MG_z | 3D face exchange in each direction (x,y,z) with vectors (y,z) and nested vectors (x) |
| Fluid Dynamics | NAS_LU_x, NAS_LU_y | 2D face exchange in x direction (contiguous) and y direction (vector) |
| Matrix Transpose | FFT | 2D FFT, different vector types on send/recv side |
| Matrix Transpose | SPECFEM3D_mt | 3D matrix transpose |
| Molecular Dynamics | LAMMPS_full, LAMMPS_atomic | unstructured exchange of different particle types (full/atomic), indexed datatypes |
| Geophysical Science | SPECFEM3D_oc, SPECFEM3D_cm | unstructured exchange of acceleration data for different earth layers, indexed datatypes |

DDTBench can be downloaded as ddtbench-1.2.1.tar.gz (366.19 kb). It can be compiled with "make"; when the resulting binary is executed, it writes an output file "timings_test". The result file has five columns. The first column is the test name, as given in the table above. The second column specifies the type of benchmark (manual packing, DDT send/recv, packing with MPI DDTs, or reference ping-pong without packing). The third column specifies how many bytes are transferred for that particular test configuration (i.e., the MPI_Type_size() of the datatype used). The fourth column identifies the step in the benchmark (cf. the figure at the top) to which the time in the fifth column corresponds. Note that the benchmark itself performs no statistical aggregation; every measured value is reported to the user. The DDTBench tarball contains an R script that can produce pictures like the one shown above.
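Because the benchmark reports raw measurements only, some aggregation is needed before results can be compared. The following is a minimal sketch of one way to do that, assuming the five columns are separated by whitespace with one measurement per line and no spaces inside a column; the exact file layout is not specified here, so check it against a real "timings_test" file. The function name aggregate_timings, the method labels ("manual", "mpi_ddt"), the step labels ("pack", "communication"), and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import median


def aggregate_timings(lines):
    """Group measurements by (testname, method, bytes, step) and
    return the median time of each group."""
    samples = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        testname, method, nbytes, step, time_str = fields
        try:
            key = (testname, method, int(nbytes), step)
            samples[key].append(float(time_str))
        except ValueError:
            continue  # skip lines with non-numeric size or time, e.g. a header
    return {key: median(values) for key, values in samples.items()}


# Hypothetical sample input in the assumed layout.
example = """\
WRF_x_vec manual 1024 pack 2.0
WRF_x_vec manual 1024 pack 4.0
WRF_x_vec manual 1024 pack 3.0
WRF_x_vec mpi_ddt 1024 communication 10.0
WRF_x_vec mpi_ddt 1024 communication 12.0
"""

medians = aggregate_timings(example.splitlines())
for key, value in sorted(medians.items()):
    print(key, value)
```

To process a real result file under the same assumptions, pass the open file object instead of the sample lines, for example `with open("timings_test") as f: medians = aggregate_timings(f)`. The median is used here only as one robust choice; the R script shipped in the tarball may aggregate differently.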

References

EuroMPI'12
[1] Timo Schneider, Robert Gerstenberger, Torsten Hoefler:
 Micro-Applications for Communication Data Access Patterns and MPI Datatypes. In Recent Advances in the Message Passing Interface - 19th European MPI Users' Group Meeting, EuroMPI 2012, Vienna, Austria, September 23-26, 2012. Proceedings, Vol. 7490, pages 121-131, Springer, ISBN: 978-3-642-33517-4, Sep. 2012. Invited to a journal special issue on top picks from EuroMPI'12.




© Torsten Hoefler