Tuesday, 31 May 2016

Open source Fortran parallel debugging

If you develop parallel code in Fortran, your options for parallel debuggers are not that many. There are a some very good commercial parallel debuggers (mainly TotalView and DDT), and if you are using any decent-size supercomputer to run your code, chances are that these are already installed in the machine.

But from time to time I need to be able to debug code on my Linux workstation while developing new Fortran code. We do have a license for the Intel Fortran Compiler, and in previous versions this shipped with a graphical debugger (IDB) which was very nice for serial applications, but they stopped shipping it around 2013, so I decided to look for an alternative, with GDB.

Before we go for parallel debugging, let's go first for serial code debugging.

Fortran + GDB (serial code)

The issue with GDB is that it doesn't play nicely with Fortran. Let's see an example with the following code:

My current setting is:
* Distribution: Linux  4.2.3-200.fc22.x86_64
* gfortran: GNU Fortran (GCC) 5.1.1 20150618 (Red Hat 5.1.1-4)                                        
* gdb: GNU gdb (GDB) Fedora 7.9.1-20.fc22

I'm able to look at the array datos, put I cannot look for subarrays, like datos(1,:,:), the pointer pdatos is OK if viewed in full, but I can't again look for subarrays of it, like pdatos(1,:)

So we will need some modified version of gdb that plays nicely with Fortran. One possible solution is to use a gdb obtained from Archer (git) http://sourceware.org/gdb/wiki/ArcherBranchManagement, branch archer-jankratochvil-vla, though I haven't used that one and I don't know how it plays along with Fortran.

Another solution is to use the modified version of gdb that comes with the Intel compiler: gdb-ia (I'm not sure if one can get gdb-ia as a standalone download, without the need to get an Intel compiler license).

With our current Intel Compiler version (2016.1.150), the versions of ifort and gdb-ia are:

* ifort: ifort (IFORT) 16.0.1 20151021
* gdb-ia: GNU gdb (GDB) 7.8-16.0.558

With these settings, if we try to use the Intel compiler provided and then debug with gdb-ia, things don't work poperly. Access to the array "datos" seems OK, but if we try to access it via the pointer "pdatos" we don't get it to work:

In principle you can access to any data if you know your way around pointers and you could use syntax like

(gdb) p *((real *)my_heap + 2)

(see http://numericalnoob.blogspot.com.es/2012/08/fortran-allocatable-arrays-and-pointers.html for examples and explanations), but this quickly becomes very cumbersome.

But if we compile with gfortran and then use gdb-ia to debug the code, then allocatable arrays, pointer to them, and subarrays of them seem to work no problem:

Fortran + GDB (parallel code)

So now that we have a working environment for serial code, we need the jump to be able to debug parallel code. GDB is not designed to work in parallel, so we need some workaround to make it a viable platform to debug in parallel.

The usual advice is to run a variant of the following:

mpirun -np xterm -e gdb ./program

so, for example, if we are running our program with 4 processors, then 4 xterms will open, and in each of them we will have a gdb session debugging one of the MPI rank processes. The problem with this is, obviously, that we will have to go through each xterm to advance through the code, and soon this will become very cumbersome, due to having to change from window to window all the time and also because all the xterms will take up too much screen space.

So I wanted to find a solution that is more convenient (in terms of not having to replicate all the gdb commands in all windows) and also that can make better use of the available screen space.

First attempt

My first attempt involved the many xterms method above, but then with two improvements:

  1. I would use x-tile (http://www.giuspen.com/x-tile/) to automatically tile all the xterms and maximize their use of screen space.
  2. I would use keyboardcast (https://launchpad.net/keyboardcast) in order to control all the xterms from one single application.

This was more ore less OK as I was testing this on a PC with Ubuntu on it, but for other distributions keyboardcast seems to have a lot of dependencies (the source code can be downloaded from archive.ubuntu.com/ubuntu/pool/universe/k/keyboardcast/keyboardcast_0.1.1.orig.tar.gz), and also I could not use it for remote machines, since keyboardcast only knows about X applications running locally (or at least I couldn't find a way to control terminals launched in a remote server to which I had connected with ssh -X)

Second attempt

So I looked for another solution, one which I could use remotely and which didn't depend on installing packages with many external dependencies. A semi-decent solution that I found was to submit the mpirun job in a remote server where every process is sent to its own screen running gdb-ia (screen as in http://linux.die.net/man/1/screen) and then remotely use terminator (https://launchpad.net/terminator/http://gnometerminator.blogspot.com.es/p/introduction.html) to connect to those running screens, with the added benefit that I can control all the gdb sessions simultaneously and, thanks to screen, I can even stop debugging in one machine, go to another one and continue debugging in the place where I left it off.

So, let's see the details. Let's assume the following simple Fortran+MPI code, a variation on the serial code above:

Which I compile with gfortran_5.1.1 and its derived mpif90 (with library version OpenMPI openmpi-1.10.2 in this case, though that version of MPI should not matter in principle) in remote server "duna", which is the same FC22 machine where I was doing the serial tests above.

mpif90 -g -o test_mpi_gfortran test_mpi.F90

And (as suggested in https://bfroehle.com/2011/09/14/debugging-mpi-python/), I launch it as:

mpirun -np 4 screen -L -m -D -S mpi env LD_LIBRARY_PATH=$LD_LIBRARY_PATH gdb-ia -tui ./test_mpi_gfortran

(you can include the & in the end if you want to get back the terminal at the remote server, but I prefer it like this, so when I finish the debugging session I can just Ctrl-C this terminal and I will not leave any leftover processes hanging around).

That line has created 4 screens sessions, and in each one a gdb-ia process will be running. So now it is time to connect to them, and I can easily do it from my client workstation (in this particular case running Ubuntu 14.04).

  • I start terminator, and create 4 tabs. Then, from the dropdown menu I select "Broadcast all", and then ssh to the remote server (just doing it in one of the tabs will replicate all the keystrokes to the other tabs, so the four terminals will connect to the remote server). 
  • Then we need to connect each of the terminals to one of the screen sessions. 
    • If I use (as suggested in https://bfroehle.com/2011/09/14/debugging-mpi-python/gnome-terminal, then I have the same issue as before, that I will not be able to control all of them at the same time. 
    • If from terminator (while "broadcasting all" is still active) I type "screen -RR -p mpi" in one of the terminals, then it looks like all of them connect to the same screen session, which we obviously don't want.
    • For the moment, an ugly hack (let me know if you have a better idea) is to make each of the terminals wait some random seconds, which we can do in bash with:
         sleep $[ ( $RANDOM % 20 ) + 1 ] ; screen -RR -p mpi

This is obviously not very robust, so I should look for a better way, but for the moment it will make sure that each of the terminal tabs will connect to the screen session with some interval of time between them, which works most of the time (if when you start typing anything you see that a keystroke shows more than once, then it means that some terminals tried to connect simultaneously to the same screen session giving trouble, so you should redo, perhaps with a longer sleep time.

Now, terminator is very powerful, and if you prefer to have dettached tabs to see simultaneously what is going on in each processor, you can definitely do it. For example see  http://unix.stackexchange.com/questions/89339/how-do-i-run-the-same-linux-command-in-more-than-one-tab-shell-simultaneously for an example of running a grid of 8x4 terminals using terminator.

So now, if you know your way around the TUI interface, you can just control all the processors at once, or just one by one (by selecting "Broadcast none"), and you will be able to inspect properly allocatable arrays, pointers, etc.

With Emacs + GDB integration

I don't like the TUI interface that much, and I would like to use Emacs GDB mode instead, but this version of gdb-ia doesn't play very nicely with Emacs, and on calling gdb from within Emacs, I get the following error:

~$ Error: you did not specify -i=mi on GDB's command line!

To solve the issue (I've been told that this won't be necessary in future releases of gdb-ia) we need to create a wrapper script (let's call it gdb_wrap.sh):

And now, for the final touch, in the remote server we just define another script (let's call it edbg):

So now in the remote server we can do:
mpirun -np 4 screen -L -m -D -S mpi env LD_LIBRARY_PATH=$LD_LIBRARY_PATH edbg ./test_mpi_gfortran

This will do the same as before, but instead of launching 4 gdb's in the remote server with the TUI interface, we will have four Emacs (one for each MPI process) and each one with its GDB interface (which is quite a usable interface if we run gdb-many-windows).

As an example, you can see a very simple debugging session in the following video, where I start a 4-processor job with mpirun in the remote server "duna", and then at my "carro" workstation I launch terminator with 4 terminals, which I control all at the same time thanks to "Broadcast all" option, and as we can see towards the end, each terminal is running a different process, and you can see that when I print "my_id" or the contents of the "pdatos(1,:)" pointer array each process shows its own contents.

Any comments/suggestions to make the debugging session more comfortable/useful are very welcome.

No comments: