Debugging Parallel Regions

The compiler implements a parallel region by taking the code in the region and putting it into a separate, compiler-created entry point. Although this differs from outlining (the technique used by other compilers, in which the region is moved into a separate subroutine), the same debugging techniques can be applied.

Constructing an Entry-point Name

The compiler-generated entry point name for a parallel region is the concatenation of the following strings, in this order:

"__" (two underscore characters)
entry point name for the original routine (for example, _parallel)
"_" character
line number of the parallel region
one of the following strings, depending on the construct:
__par_region for OpenMP parallel regions (!$OMP PARALLEL)
__par_loop for OpenMP parallel loops (!$OMP PARALLEL DO)
__par_section for OpenMP parallel sections (!$OMP PARALLEL SECTIONS)
sequence number of the parallel region (for each source file, the sequence number starts from zero)
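The concatenation can be sketched in a few lines of Python. This is only an illustration of the rule listed above, not compiler code; the function name parallel_entry_name and the short keys region, loop, and section are assumptions made for the example.

```python
# Illustrative sketch of the naming rule; not part of the compiler.
# The short keys below are hypothetical labels for the three constructs.
CONSTRUCT_STRINGS = {
    "region": "__par_region",    # !$OMP PARALLEL
    "loop": "__par_loop",        # !$OMP PARALLEL DO
    "section": "__par_section",  # !$OMP PARALLEL SECTIONS
}


def parallel_entry_name(routine_entry, line, construct, sequence):
    """Concatenate the components in the order given by the list."""
    return (
        "__"                            # two leading underscores
        + routine_entry                 # entry point name of the original routine
        + "_"                           # single underscore
        + str(line)                     # line number of the parallel region
        + CONSTRUCT_STRINGS[construct]  # construct-specific string
        + str(sequence)                 # per-file sequence number, from zero
    )


print(parallel_entry_name("_parallel", 3, "region", 0))
print(parallel_entry_name("_PADD", 6, "loop", 0))
```

With the components used in this section, the two calls print ___parallel_3__par_region0 and ___PADD_6__par_loop0.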

Routine names (for example, padd) and entry names (for example, _PADD and ___PADD_6__par_loop0) are related as follows. By default, the Fortran Compiler first converts lower-case or mixed-case routine names to upper case; for example, pAdD becomes PADD. Adding one leading underscore then gives the entry name, _PADD. The parallel-region entry name is derived only after that, which is why the "__par_loop" part of the entry name stays in lower case.
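The order of the two steps is what keeps the suffix in lower case, as a small Python sketch can show. The helper name default_entry_name is hypothetical, and the sketch only mimics the default behaviour described above; the line number 6 and sequence number 0 are taken from the example name ___PADD_6__par_loop0.

```python
def default_entry_name(routine_name):
    """Upper-case the routine name, then add one leading underscore."""
    return "_" + routine_name.upper()


entry = default_entry_name("pAdD")

# The parallel-loop name is built after the case conversion,
# so the "__par_loop" part is never upper-cased.
loop_entry = "__" + entry + "_" + str(6) + "__par_loop" + str(0)

print(entry)       # _PADD
print(loop_entry)  # ___PADD_6__par_loop0
```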

Note

The debugger does not accept the upper-case routine name "PADD" when you set a breakpoint. Instead, it accepts the lower-case routine name "padd".

Example 1 shows how to debug code that contains a parallel region. The listing in Example 1 was produced with this command:

ifort -openmp -g -O0 -S file.f90

Let us consider the code of subroutine parallel in Example 1.

Subroutine PARALLEL() source listing

1    subroutine parallel
2    integer id,OMP_GET_THREAD_NUM
3 !$OMP PARALLEL PRIVATE(id)
4    id = OMP_GET_THREAD_NUM()
5 !$OMP END PARALLEL
6    end

The parallel region is at line 3. The compiler created two entry points: parallel_ and ___parallel_3__par_region0. The first entry point corresponds to the subroutine parallel(), while the second entry point corresponds to the OpenMP parallel region at line 3.
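When an entry point name of this form shows up in a symbol list or a stack trace, it can be split back into its components to find the source line of the region. The following Python sketch assumes the name follows the construction rule given in "Constructing an Entry-point Name"; the names ENTRY_PATTERN and split_entry_name are hypothetical, and a name that does not match the rule yields None.

```python
import re

# Pattern for names built by the construction rule:
# "__" + routine entry + "_" + line + construct string + sequence number.
ENTRY_PATTERN = re.compile(
    r"^__(?P<routine_entry>.+)_(?P<line>\d+)"
    r"__par_(?P<construct>region|loop|section)(?P<sequence>\d+)$"
)


def split_entry_name(name):
    """Return the components of a parallel-region entry name, or None."""
    match = ENTRY_PATTERN.match(name)
    if match is None:
        return None
    parts = match.groupdict()
    parts["line"] = int(parts["line"])
    parts["sequence"] = int(parts["sequence"])
    return parts


print(split_entry_name("___parallel_3__par_region0"))
print(split_entry_name("parallel_"))  # an ordinary routine entry point
```

For ___parallel_3__par_region0 the sketch reports the routine entry _parallel, line 3, construct region, and sequence number 0; for parallel_ it returns None.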

Example 1 Debugging Code with Parallel Region

Machine Code Listing of the Subroutine parallel()

        .globl parallel_
parallel_:
..B1.1:                    # Preds ..B1.0
..LN1:
pushl     %ebp                                    #1.0
movl      %esp, %ebp                              #1.0
subl      $44, %esp                               #1.0
pushl     %edi                                    #1.0
... ... ... ... ... ... ... ... ... ... ... ... ...
..B1.13:                    # Preds ..B1.9
addl      $-12, %esp                             #6.0
movl      $.2.1_2_kmpc_loc_struct_pack.2, (%esp) #6.0
movl      $0, 4(%esp)                            #6.0
movl      $_parallel__6__par_region1, 8(%esp)    #6.0
call      __kmpc_fork_call                       #6.0
# LOE
..B1.31:                    # Preds ..B1.13
addl      $12, %esp                              #6.0
# LOE
..B1.14:                    # Preds ..B1.31 ..B1.30
..LN4:
leave                                            #9.0
ret                                              #9.0
# LOE

.type parallel_,@function
.size parallel_,.-parallel_
.globl _parallel__3__par_region0
_parallel__3__par_region0:
# parameter 1: 8 + %ebp
# parameter 2: 12 + %ebp
..B1.15:                    # Preds ..B1.0
pushl     %ebp                                   #9.0
movl      %esp, %ebp                             #9.0
subl      $44, %esp                              #9.0
..LN5:
call      omp_get_thread_num_                    #4.0
# LOE eax
..B1.32:                    # Preds ..B1.15
movl      %eax, -32(%ebp)                        #4.0
# LOE
..B1.16:                    # Preds ..B1.32
movl      -32(%ebp), %eax                        #4.0
movl      %eax, -20(%ebp)                        #4.0
..LN6:
leave                                            #9.0
ret                                              #9.0
# LOE
.type _parallel__3__par_region0,@function
.size _parallel__3__par_region0,.-_parallel__3__par_region0
.globl _parallel__6__par_region1
_parallel__6__par_region1:
# parameter 1: 8 + %ebp
# parameter 2: 12 + %ebp
..B1.17:                     # Preds ..B1.0

pushl     %ebp                                   #9.0
movl      %esp, %ebp                             #9.0
subl      $44, %esp                              #9.0
..LN7:
call      omp_get_thread_num_                    #7.0
# LOE eax
..B1.33:                    # Preds ..B1.17
movl      %eax, -28(%ebp)                        #7.0
# LOE
..B1.18:                    # Preds ..B1.33
movl      -28(%ebp), %eax                        #7.0
movl      %eax, -16(%ebp)                        #7.0
..LN8:
leave                                            #9.0
ret                                              #9.0
.align    4,0x90
# mark_end;

Debugging the program at this level is just like debugging a program that uses POSIX threads directly. Breakpoints can be set in the threaded code just as in any other routine. With the Intel® Debugger (idb) or the GNU debugger, breakpoints can be set on source-level routine names (such as parallel) and on entry point names (such as parallel_ and _parallel__3__par_region0). Note that the Intel® Fortran Compiler for Linux* converts the upper-case Fortran subroutine name to lower case.
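As a rough illustration for the GNU debugger, the breakpoints named above can be written out as break commands, one per name. The Python sketch below only generates the command text; the helper name breakpoint_commands is hypothetical, and the names are the ones quoted in the paragraph above.

```python
def breakpoint_commands(names):
    """Return one gdb `break` command per routine or entry point name."""
    return ["break " + name for name in names]


# Source-level routine name followed by the two entry point names.
names = ["parallel", "parallel_", "_parallel__3__par_region0"]

for command in breakpoint_commands(names):
    print(command)
```

The printed lines can be typed at the gdb prompt or saved to a file and passed to gdb with its -x option.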