Analyzing an MPI application with EZTrace
Installing EZTrace
First, configure EZTrace as described on the support page:
$ ./configure --with-mpi-include=<PATH_TO_MPI.H> --prefix=<INSTALLATION_DIR>
Check that MPI was detected by configure:
$ grep -r ac_cv_header_mpi_h config.log
ac_cv_header_mpi_h=yes
If everything is OK, compile and install EZTrace:
$ make && make install
Check the installation of EZTrace:
$ eztrace_avail
3  stdio    Module for stdio functions (read, write, select, poll, etc.)
2  pthread  Module for PThread synchronization functions (mutex, semaphore, spinlock, etc.)
1  omp      Module for OpenMP parallel regions
4  mpi      Module for MPI functions
5  memory   Module for memory functions (malloc, free, etc.)
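The second column lists the module names; these are the names that can be passed to eztrace through its -t option, as done below with -t mpi.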
Running an application with EZTrace
Let's analyze this MPI stencil application.
First, compile it:
$ tar xzvf stencil_mpi.tgz
[...]
$ mpicc -o stencil_mpi stencil_mpi.c
You can run the application with:
$ mpirun -np 4 ./stencil_mpi 1024
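Before tracing it, it helps to know roughly what the application does with MPI. The listing below is a simplified, hypothetical sketch of the halo exchange you would find in such a 1-D-decomposed stencil code (it is not the actual stencil_mpi.c, and the function and variable names are made up): each process posts non-blocking sends of its boundary rows, receives its neighbours' rows with blocking receives, waits for the sends to complete, and synchronizes once per iteration. These are the kinds of calls (MPI_Isend, MPI_Recv, MPI_Wait, MPI_Barrier) that the mpi module of EZTrace intercepts.

/* Hypothetical sketch of a 1-D-decomposed stencil halo exchange.
 * This is NOT the actual stencil_mpi.c; it only illustrates the MPI calls
 * (MPI_Isend, MPI_Recv, MPI_Wait, MPI_Barrier) that EZTrace intercepts. */
#include <mpi.h>
#include <stdlib.h>

static void exchange_halos(double *first_row, double *last_row,
                           double *halo_up, double *halo_down,
                           int n, int rank, int size)
{
    MPI_Request req[2];
    int nreq = 0;

    /* Post non-blocking sends of our boundary rows to the neighbours. */
    if (rank > 0)
        MPI_Isend(first_row, n, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD, &req[nreq++]);
    if (rank < size - 1)
        MPI_Isend(last_row, n, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD, &req[nreq++]);

    /* Receive the neighbours' boundary rows with blocking receives. */
    if (rank > 0)
        MPI_Recv(halo_up, n, MPI_DOUBLE, rank - 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    if (rank < size - 1)
        MPI_Recv(halo_down, n, MPI_DOUBLE, rank + 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Wait for the sends to complete before the buffers are reused. */
    for (int i = 0; i < nreq; i++)
        MPI_Wait(&req[i], MPI_STATUS_IGNORE);
}

int main(int argc, char **argv)
{
    int rank, size;
    int n = (argc > 1) ? atoi(argv[1]) : 1024;   /* row length: 1024 doubles = 8192 bytes */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *first_row = calloc(n, sizeof(double));
    double *last_row  = calloc(n, sizeof(double));
    double *halo_up   = calloc(n, sizeof(double));
    double *halo_down = calloc(n, sizeof(double));

    for (int iter = 0; iter < 10; iter++) {
        exchange_halos(first_row, last_row, halo_up, halo_down, n, rank, size);
        /* ... the actual stencil computation on the local block goes here ... */
        MPI_Barrier(MPI_COMM_WORLD);
    }

    free(first_row); free(last_row); free(halo_up); free(halo_down);
    MPI_Finalize();
    return 0;
}

With rows of 1024 doubles, each message occupies 8192 bytes, which is consistent with the message size reported by eztrace_stats later in this tutorial.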
To analyze it with EZTrace, simply add eztrace -t mpi to the command line:
$ mpirun -np 4 eztrace -t mpi ./stencil_mpi 1024
Each MPI process will generate a trace file in
/tmp/<username>_eztrace_log_rank_<MPI_rank>.
If you run the program on several nodes, you can retrieve the trace files using scp.
In order to avoid retrieving the trace files after the execution, you can set EZTRACE_TRACE_DIR to a directory that is shared among the nodes of your cluster. For instance:
$ export EZTRACE_TRACE_DIR=$HOME/traces
$ mpirun -np 4 eztrace ./stencil_mpi 1024
[...]
Stopping EZTrace... saving trace
/home/${USER}/traces/${USER}_eztrace_log_rank_0
Stopping EZTrace... saving trace
/home/${USER}/traces/${USER}_eztrace_log_rank_2
Stopping EZTrace... saving trace
/home/${USER}/traces/${USER}_eztrace_log_rank_1
Stopping EZTrace... saving trace
/home/${USER}/traces/${USER}_eztrace_log_rank_3
Analyzing the execution of the application
Once the execution traces are generated, we need to analyze them.
EZTrace allows you to:
- generate a PAJE/OTF trace file that can be visualized with tools such as ViTE or Vampir
- compute statistics
Visualizing an execution trace
In order to generate a PAJE/OTF trace file that can be visualized with ViTE or Vampir, you need to run eztrace_convert:
$ eztrace_convert -t PAJE $HOME/traces/${USER}_eztrace_log_rank_*
module stdio loaded
module pthread loaded
module omp loaded
module mpi loaded
module memory loaded
5 modules loaded
no more block for trace #2
no more block for trace #3
no more block for trace #1
no more block for trace #0
1570 events handled
This generates a file named paje.trace that you can visualize with ViTE:
$ vite paje.trace
Computing statistics
EZTrace can compute statistics on MPI messages:
$ eztrace_stats $HOME/traces/${USER}_eztrace_log_rank_*
MPI:
---
MPI_RECV    :300 calls
MPI_ISEND   :300 calls
MPI_WAIT    :300 calls
MPI_BARRIER :8 calls

CT_Process #0 -- 50 messages sent
  Size of messages (byte): min: 8192 max: 8192 average: 8192 total: 409600
  Time spent sending messages (ms): min: 25.654522 max: 37.878700 average: 33.762951 total: 1688.147534
  Time spent computing while sending messages (ms): min: 25.651473 max: 37.876794 average: 33.760435 total: 1688.021762
  Time spent in MPI_Send or waiting for a Isend to complete (ms): min: 0.001450 max: 0.005251 average: 0.002515 total: 0.125772
  Time spent receiving messages (ms): min: 0.010901 max: 11.508515 average: 3.747645 total: 187.382259
  Time spent computing while receiving messages (ms): min: 0.000000 max: 0.000000 average: 0.000000 total: 0.000000
  Time spent in MPI_Recv or waiting for a Irecv to complete (ms): min: 0.010901 max: 11.508515 average: 3.747645 total: 187.382259

CT_Process #1 -- 100 messages sent
  Size of messages (byte): min: 8192 max: 8192 average: 8192 total: 819200
  Time spent sending messages (ms): min: 0.014730 max: 42.560793 average: 18.752866 total: 1875.286636
  Time spent computing while sending messages (ms): min: 0.013260 max: 42.557880 average: 18.750474 total: 1875.047353
  Time spent in MPI_Send or waiting for a Isend to complete (ms): min: 0.001031 max: 0.008158 average: 0.002393 total: 0.239283
  Time spent receiving messages (ms): min: 0.004429 max: 11.215598 average: 1.219087 total: 121.908687
  Time spent computing while receiving messages (ms): min: 0.000000 max: 0.000000 average: 0.000000 total: 0.000000
  Time spent in MPI_Recv or waiting for a Irecv to complete (ms): min: 0.004429 max: 11.215598 average: 1.219087 total: 121.908687

CT_Process #2 -- 100 messages sent
  Size of messages (byte): min: 8192 max: 8192 average: 8192 total: 819200
  Time spent sending messages (ms): min: 0.018938 max: 42.488664 average: 18.103344 total: 1810.334430
  Time spent computing while sending messages (ms): min: 0.018449 max: 42.486583 average: 18.101278 total: 1810.127777
  Time spent in MPI_Send or waiting for a Isend to complete (ms): min: 0.000315 max: 0.036811 average: 0.002067 total: 0.206653
  Time spent receiving messages (ms): min: 0.005040 max: 14.437164 average: 2.907976 total: 290.797554
  Time spent computing while receiving messages (ms): min: 0.000000 max: 0.000000 average: 0.000000 total: 0.000000
  Time spent in MPI_Recv or waiting for a Irecv to complete (ms): min: 0.005040 max: 14.437164 average: 2.907976 total: 290.797554

CT_Process #3 -- 50 messages sent
  Size of messages (byte): min: 8192 max: 8192 average: 8192 total: 409600
  Time spent sending messages (ms): min: 0.018112 max: 14.455367 average: 5.824193 total: 291.209661
  Time spent computing while sending messages (ms): min: 0.017621 max: 14.454577 average: 5.823531 total: 291.176530
  Time spent in MPI_Send or waiting for a Isend to complete (ms): min: 0.000404 max: 0.001063 average: 0.000663 total: 0.033131
  Time spent receiving messages (ms): min: 0.009640 max: 14.142448 average: 4.225155 total: 211.257759
  Time spent computing while receiving messages (ms): min: 0.000000 max: 0.000000 average: 0.000000 total: 0.000000
  Time spent in MPI_Recv or waiting for a Irecv to complete (ms): min: 0.009640 max: 14.142448 average: 4.225155 total: 211.257759
/tmp/eztrace_stats_${USER}/message_size.gp was created
eztrace_stats prints:
- the number of calls to each MPI function
- for each process: information on messages (min/max/average size, number of messages) as well as the time spent sending or receiving messages
- the distribution of message sizes (in /tmp/eztrace_stats_<username>)
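As a quick sanity check, process #0 above sent 50 messages of 8192 bytes each, which matches the reported total of 409600 bytes. The message_size.gp file is a gnuplot script; assuming gnuplot is installed, running it from within /tmp/eztrace_stats_<username> should plot the distribution of message sizes.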
Dumping the messages
Additionally, you can get the list of all messages that were exchanged during the execution by setting EZTRACE_MPI_DUMP_MESSAGES:
$ export EZTRACE_MPI_DUMP_MESSAGES=1
$ eztrace_stats $HOME/traces/${USER}_eztrace_log_rank_*
[...]
MPI messages dumped in
/tmp/eztrace_stats_${USER}/${USER}_eztrace_message_dump
[...]
$ cat /tmp/eztrace_stats_${USER}/${USER}_eztrace_message_dump
# src dest len tag Isend_tick SWait_tick Wait_done_tick Irecv_tick RWait_tick Wait_done_tick Sender_request Receiver_request
1 0 8192 0 1685106493 1712167873 1712170464 1687482772 1687482772 1687497702 0x890960 (nil)
1 0 8192 0 1652218487 1685066389 1685068287 1657773287 1657773287 1657788052 0x890920 (nil)
1 0 8192 0 1619270156 1652179190 1652180986 1628295337 1628295337 1628310886 0x8908e0 (nil)
1 0 8192 0 1586632048 1619230569 1619232188 1598695549 1598695549 1598710239 0x8908a0 (nil)
1 0 8192 0 1548711295 1586588089 1586589995 1568940582 1568940582 1568955344 0x890860 (nil)
[...]
This file contains the list of messages that were exchanged during the execution of the application. For each message, you have access to:
- src: the MPI rank of the process that sent the message
- dest: the MPI rank of the process that received the message
- len: the length of the message
- tag: the tag of the message (in hexadecimal)
- Isend_tick: when MPI_Isend occurred, or when MPI_Send began
- SWait_tick: when the sending process entered MPI_Wait or MPI_Send
- Wait_done_tick: when the sending process left MPI_Wait or MPI_Send
- Irecv_tick: when MPI_Irecv occurred, or when MPI_Recv began
- RWait_tick: when the receiving process entered MPI_Wait or MPI_Recv
- Wait_done_tick: when the receiving process left MPI_Wait or MPI_Recv
- Sender_request: the MPI_Request associated with the message in the sending process (nil if the process did not use MPI_Isend)
- Receiver_request: the MPI_Request associated with the message in the receiving process (nil if the process did not use MPI_Irecv)
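As an example of how the dump can be post-processed, here is a minimal sketch (a hypothetical helper, not part of EZTrace) that reads such a file and prints, for each message, how long the sender waited for its Isend to complete and how long the receiver waited, computed as differences between the tick columns. Note that the ticks are raw timestamps, so only differences between them are meaningful; converting them to seconds would require the tick frequency of the machine.

/* Hypothetical post-processing of an EZTrace message dump: print, for each
 * message, the time the sender spent waiting for its Isend to complete
 * (Wait_done_tick - SWait_tick) and the time the receiver spent waiting
 * (Wait_done_tick - RWait_tick). Values are expressed in raw ticks. */
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <message_dump>\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "r");
    if (!f) {
        perror("fopen");
        return 1;
    }

    char line[1024];
    while (fgets(line, sizeof(line), f)) {
        if (line[0] == '#')
            continue;                       /* skip the header line */

        int src, dest, len;
        unsigned int tag;
        long long isend, swait, swait_done, irecv, rwait, rwait_done;
        char sreq[64], rreq[64];

        if (sscanf(line, "%d %d %d %x %lld %lld %lld %lld %lld %lld %63s %63s",
                   &src, &dest, &len, &tag, &isend, &swait, &swait_done,
                   &irecv, &rwait, &rwait_done, sreq, rreq) != 12)
            continue;                       /* skip malformed lines */

        printf("%d -> %d (%d bytes): sender wait = %lld ticks, receiver wait = %lld ticks\n",
               src, dest, len, swait_done - swait, rwait_done - rwait);
    }
    fclose(f);
    return 0;
}

Compile it with any C compiler and run it on the dump file; MPI is not needed for this post-processing step.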
This tutorial was designed for EZTrace 1.0 or higher. If you encounter any problems, feel free to contact the EZTrace developers.