DDTBench: Micro-Applications for Communication Data Access Patterns and MPI Datatypes

DDTBench is a suite of micro-apps that captures how parallel scientific applications from many different fields access the data that they send and receive between processes. MPI Derived Datatypes (DDTs) make it possible to specify those access patterns so that no explicit copy operation is needed, in contrast to the pack-unpack loops found in many codes. In DDTBench we compare the packing overhead incurred by such loops to that of MPI DDTs. To do so, we run a ping-pong benchmark twice: once using MPI DDTs to specify how data should be packed, and once using the pack-unpack loops that we found in the applications. The measurement loop of the benchmark is shown in the figure below.
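To make the contrast between a pack-unpack loop and a datatype description concrete, here is a minimal sketch in plain Python. It does not call MPI and is not taken from DDTBench. The function names, the 3x4 matrix, and the column access pattern are assumptions chosen for illustration. The triple (count, blocklength, stride) mirrors the arguments of the standard MPI_Type_vector call, which an MPI code would use to describe the same pattern without copying.

```python
def pack_column(matrix, rows, cols, col):
    """Manual pack loop: copy one column of a row-major matrix
    into a contiguous send buffer."""
    buf = []
    for r in range(rows):
        buf.append(matrix[r * cols + col])
    return buf


def unpack_column(buf, matrix, rows, cols, col):
    """Manual unpack loop: scatter a contiguous receive buffer
    back into one column of a row-major matrix."""
    for r in range(rows):
        matrix[r * cols + col] = buf[r]


def vector_offsets(count, blocklength, stride):
    """Element offsets described by a vector-style datatype:
    `count` blocks of `blocklength` elements, with block starts
    `stride` elements apart (same meaning as in MPI_Type_vector)."""
    return [b * stride + i for b in range(count) for i in range(blocklength)]


# Hypothetical 3x4 row-major matrix; column 1 is the data to send.
matrix = list(range(12))
packed = pack_column(matrix, rows=3, cols=4, col=1)

# The same elements, located through the datatype description alone.
# An MPI library would walk these offsets itself instead of the
# application copying the data first.
gathered = [matrix[1 + off] for off in vector_offsets(count=3, blocklength=1, stride=4)]

# Receiving side of the manual variant.
target = [0] * 12
unpack_column(packed, target, rows=3, cols=4, col=1)
```

In the manual variant the application owns the copy loop. In the datatype variant only the description is handed over, which is why the repacking cost becomes implicit.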
Figure: Measurement loop of DDTBench. Measurements are taken on process 0, so no global clock is required.

Using the time it takes to perform each operation (a colored block in the figure above), we can calculate the overhead for packing and unpacking data with both methods. We cannot measure this overhead directly when MPI DDTs are used, because the data repacking is implicit. We can, however, calculate the time used for transferring packed data, t_net, by subtracting the time required for manual packing and unpacking from the round-trip time of the ping-pong with manual packing. The data-repacking overhead for each case is then obtained by subtracting t_net from that case's ping-pong round-trip time and dividing the result by the same round-trip time. The graph below shows the result for some of the micro-apps.
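The overhead calculation described above can be sketched in a few lines of Python. This is an illustration, not code from DDTBench. The function name, the argument names, and the timing values are hypothetical, and the sketch assumes that the manual packing and unpacking time is available as a single sum.

```python
def packing_overheads(rtt_manual, pack_unpack_manual, rtt_ddt):
    """Return (t_net, overhead_manual, overhead_ddt).

    rtt_manual         -- round-trip time of the ping-pong with manual packing
    pack_unpack_manual -- total time spent in the manual pack and unpack loops
    rtt_ddt            -- round-trip time of the ping-pong using MPI DDTs
    Overheads are fractions of the respective round-trip time.
    """
    # Time spent transferring already packed data.
    t_net = rtt_manual - pack_unpack_manual
    # Whatever is not network time counts as repacking overhead.
    overhead_manual = (rtt_manual - t_net) / rtt_manual
    overhead_ddt = (rtt_ddt - t_net) / rtt_ddt
    return t_net, overhead_manual, overhead_ddt


# Hypothetical timings (arbitrary time unit), not measured values.
t_net, ov_manual, ov_ddt = packing_overheads(
    rtt_manual=200.0, pack_unpack_manual=50.0, rtt_ddt=160.0
)
print(f"t_net={t_net}, manual={ov_manual:.2%}, ddt={ov_ddt:.2%}")
```

Note that t_net comes from the manual run and is reused for the DDT run, so the DDT overhead is only as accurate as the assumption that both runs spend the same time on the network.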
Figure: Packing costs for different test cases.

MPI DDTs can reduce the overhead associated with data packing (e.g., from 40% to 15% in the case of NAS_LU_x, where a contiguous array is needlessly copied by the original code). The large difference between the performance delivered by Open MPI's DDT engine and that of MVAPICH shows that there is still work to be done in improving MPI DDT implementations. We hope that DDTBench can serve implementers as a guideline on which access patterns deserve special attention. A list of the micro-apps included in DDTBench can be found in the table below.
DDTBench can be downloaded as ddtbench-1.2.1.tar.gz (366.19 kb). It can be compiled with "make". When the resulting binary is executed, it writes an output file named "timings_test". The result file has 5 columns. The first column is the test name, as given in the table above. The second column specifies the type of benchmark (i.e., manual packing, DDT send/recv, packing with MPI DDTs, reference ping-pong without packing). The third column specifies how many bytes are transferred for that particular test configuration (i.e., the MPI_Type_size() of the datatype used). The fourth column identifies the step in the benchmark (cf. the figure at the top) to which the time in the fifth column corresponds. Note that the benchmark itself performs no statistical aggregation; every measured value is reported to the user. The DDTBench tarball contains an R script that can produce plots like the one shown above.
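Because the benchmark reports raw values, any aggregation is left to the user. The sketch below shows one way to do it in Python as an alternative to the bundled R script. It assumes that the five columns are separated by whitespace, which the description above does not state. The method and step labels and all values in the sample are invented for illustration and may not match what DDTBench actually writes.

```python
import statistics
from collections import defaultdict


def parse_timings(lines):
    """Group measured times by (testname, method, bytes, step)."""
    samples = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        testname, method, nbytes, step, time = fields
        try:
            key = (testname, method, int(nbytes), step)
            samples[key].append(float(time))
        except ValueError:
            continue  # skip lines with non-numeric fields, e.g. a header
    return samples


def medians(samples):
    """Reduce each group of raw measurements to its median."""
    return {key: statistics.median(values) for key, values in samples.items()}


# Hypothetical sample lines; labels and numbers are made up.
sample = """\
NAS_LU_x manual 4000 pack 1.5
NAS_LU_x manual 4000 pack 2.5
NAS_LU_x manual 4000 pack 2.0
NAS_LU_x mpi_ddt 4000 communication 9.0
""".splitlines()

result = medians(parse_timings(sample))
for key, value in sorted(result.items()):
    print(key, value)
```

To process a real run, the same functions can be fed the file directly, for example `with open("timings_test") as f: result = medians(parse_timings(f))`. The median is used here only because it is robust to outliers; another statistic can be swapped in.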
© Torsten Hoefler