Understanding Profile-Guided Optimization

PGO works best for code with many frequently executed branches that are difficult to predict at compile time. One example is code with intensive error checking, in which the error conditions are false most of the time. The compiler can place the "cold" error-handling code so that the branch is rarely mispredicted. Minimizing the "cold" code interleaved with the "hot" code also improves instruction cache behavior.

PGO also often enables the compiler to make better decisions about function inlining, which increases the effectiveness of interprocedural optimization (IPO).

PGO Phases

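The PGO methodology requires the following three phases and options:

  1. Instrumented compilation and linking, using -prof-genx (Linux) or /Qprof-genx (Windows).

  2. Instrumented execution, which runs the instrumented executable and produces the dynamic profile information (.dyn) files.

  3. Feedback compilation, using -prof-use (Linux) or /Qprof-use (Windows).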

The flowchart (below) illustrates this process for IA-32 compilation and Itanium®-based compilation.

A key factor in deciding whether to use PGO is knowing which sections of your code are the most heavily used. If the data set provided to your program is very consistent and elicits similar behavior on every execution, then PGO can probably help optimize your program. However, different data sets can cause different algorithms to be called, so the behavior of your program can vary from one execution to the next. In that case, a profile collected with one data set may not represent the others.

Phases of Basic Profile-Guided Optimization

See Example of Profile-Guided Optimization for specific details on working with each phase.

PGO Usage Model

The following figure illustrates the PGO usage model.

Here are the steps for a simple example (myApp.f90) for IA-32 systems.

  1. Set the following:

    prof-dir=c:\myApp\prof-dir

  2. Issue the following command (depending on the platform):

    ifort -prof-genx myApp.f90
    (Linux)
    ifort /Qprof-genx myApp.f90
    (Windows)

    This command compiles the program and generates the instrumented binary myApp.exe, as well as the corresponding static profile information file pgopti.spi.

  3. Execute myApp.

    Each invocation of myApp runs the instrumented application and generates one or more new dynamic profile information files, with the extension .dyn, in the directory specified by prof-dir.

  4. Issue the following command (depending on the platform):

    ifort -prof-use myApp.f90
    (Linux)
    ifort /Qprof-use myApp.f90
    (Windows)

    At this step, the compiler merges all the .dyn files into one .dpi file representing the total profile information of the application and generates the optimized binary. The default name of the .dpi file is pgopti.dpi.
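
The steps above can be collected into a small script. The following is only a sketch for Linux, not part of the compiler: the function name pgo_build, the RUN variable, and the use of -o to name the output binary myApp are assumptions made for illustration. By default the script only prints the commands (a dry run), so it can be inspected on a machine without ifort installed.

```shell
#!/bin/sh
# Dry-run sketch of the PGO steps above (Linux command forms only).
# RUN defaults to "echo", so the commands are printed rather than executed.
# Invoke with RUN= (empty) to execute them for real; that requires ifort.
RUN=${RUN-echo}

# pgo_build is a hypothetical helper name, not an ifort feature.
pgo_build() {
    src=$1                  # source file, e.g. myApp.f90
    exe=${src%.f90}         # assumed output name: myApp

    # Phase 1: instrumented compilation (also produces pgopti.spi)
    $RUN ifort -prof-genx -o "$exe" "$src" || return 1

    # Phase 2: instrumented execution; each run writes new .dyn files
    $RUN "./$exe" || return 1

    # Phase 3: feedback compilation; merges the .dyn files into pgopti.dpi
    $RUN ifort -prof-use -o "$exe" "$src"
}

pgo_build myApp.f90
```

To collect a more representative profile, the instrumented binary in phase 2 can be run several times with different inputs before the feedback compilation, since each invocation adds its own .dyn files.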