Tuesday, 20 July 2010

Debugging Fortran+MPI codes with "The Portland Group" debugger (pgdbg)

I have to admit that I didn't get the Fox algorithm in my previous post right the first time, so I had to do some parallel debugging. In the past I have used gdb and the Intel Debugger, and now it was time to try pgdbg (The Portland Group debugger). Next in line is TotalView.

At our institution the installed version of the Portland compilers is 10.5, but it had some issues with the Linux distribution on my workstation (Ubuntu 10.04, 64-bit), so I installed the latest version, 10.6. Installation was very simple with their install script, and once the license file was in place it was time to try it.

The code can be compiled against the included MPICH1 with the following command (-g adds the debugging information the debugger needs):

angelv@vaso:~/fox$ pgf90 -o fox -Mmpi=mpich1 -g fox.f90

Since we will be using ssh to connect to the other processors (actually just a number of processes all running on my local workstation), we first need to get the ssh security sorted out (tips from the "PGI Tools Guide" documentation, p. 90). We generate the ssh keys with a passphrase and copy the public key to the authorized keys:

$ ssh-keygen -t dsa

$ cd $HOME/.ssh
$ cp id_dsa.pub authorized_keys
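One caveat about the cp step: it replaces any authorized_keys file that already exists. Below is a minimal sketch of a gentler variant that appends the key only if it is not already there. The function name authorize_key and its optional second argument are my own, chosen for illustration; they do not come from the PGI guide.

```shell
#!/bin/sh
# Hypothetical helper (not from the PGI documentation): add a public key to
# authorized_keys without overwriting entries that are already there.
# Usage: authorize_key PUBKEY_FILE [SSH_DIR]   (SSH_DIR defaults to ~/.ssh)
authorize_key() {
    pubkey=$1
    ssh_dir=${2:-$HOME/.ssh}
    auth=$ssh_dir/authorized_keys

    [ -r "$pubkey" ] || { echo "cannot read $pubkey" >&2; return 1; }

    # sshd commonly ignores key files with loose permissions, so tighten them.
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
    touch "$auth" && chmod 600 "$auth" || return 1

    # Append only if this exact key line is not already present.
    if ! grep -qxF -e "$(cat "$pubkey")" "$auth"; then
        cat "$pubkey" >> "$auth"
    fi
}
```

With the key generated above, this would be called as authorize_key "$HOME/.ssh/id_dsa.pub". Calling it twice leaves a single copy of the key in the file.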

Then, in a new terminal, we start an ssh-agent and add the key to it. The passphrase is entered just once, and all subsequent ssh connections from that terminal will be passwordless:

$ eval `ssh-agent -s`
$ ssh-add
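Before launching anything with mpirun, it can save time to confirm that the agent really holds the key. The sketch below relies on the exit status of ssh-add -l and on the BatchMode option of ssh. The two function names are hypothetical and only there for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report what ssh-add can see, based on its exit status
# (0 = keys loaded, 1 = agent reachable but empty, anything else = no agent).
agent_status() {
    ssh-add -l >/dev/null 2>&1
    case $? in
        0) echo "agent has keys" ;;
        1) echo "agent has no keys" ;;
        *) echo "no agent reachable" ;;
    esac
}

# Hypothetical helper: try a non-interactive login to this machine.
# BatchMode=yes makes ssh fail instead of prompting for a passphrase or password.
can_ssh_locally() {
    ssh -o BatchMode=yes localhost true
}
```

In the terminal where ssh-agent was started, agent_status should report that the agent has keys, and can_ssh_locally should return success. The second check may also fail if localhost has never been accepted into known_hosts, so one interactive ssh localhost beforehand helps.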

With this in place, we can run our code with the included MPICH1 version:

angelv@vaso:~/fox$ mpirun -stdin fox.in -np 4 ./fox

In order to run it with the debugger, we just add the option -dbg=pgdbg:

angelv@vaso:~/fox$ mpirun -stdin fox.in -dbg=pgdbg -np 4 ./fox
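Since the normal run and the debugging run differ only in the -dbg option, the two command lines can be generated by one small wrapper. This is just a convenience sketch: fox_cmd is a hypothetical name, and it only prints the command built from the options shown above.

```shell
#!/bin/sh
# Hypothetical wrapper: print the mpirun command line used in this post.
# Usage: fox_cmd [NPROCS] [DEBUGGER]
#   NPROCS   number of processes (default 4)
#   DEBUGGER value for -dbg=, e.g. pgdbg; empty means a normal run
fox_cmd() {
    np=${1:-4}
    dbg=${2:-}
    if [ -n "$dbg" ]; then
        echo "mpirun -stdin fox.in -dbg=$dbg -np $np ./fox"
    else
        echo "mpirun -stdin fox.in -np $np ./fox"
    fi
}
```

For example, fox_cmd 4 pgdbg prints the debugging command line, and eval "$(fox_cmd 4 pgdbg)" would run it.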

The following image shows a moment during the debugging session: 4 processes have been created, and we are at the end of the first stage of the Fox algorithm. The window at the bottom shows how easily you can inspect the values of variables (whole matrices included, which can be indexed according to Fortran syntax) for all (or a selection of) the processes involved in the computation.

I need to try it for a longer period, but overall it looks like a very usable parallel debugger. The Portland Group has a video demo of the debugger here.
