HLO Overview

High-level optimizations (HLO) exploit the properties of source code constructs, such as loops and arrays, in applications written in high-level programming languages such as Fortran. The high-level optimizations include loop interchange, loop fusion, loop unrolling, loop distribution, unroll-and-jam, blocking, data prefetch, scalar replacement, and data layout optimizations.
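Loop interchange, the first transformation listed, is easiest to picture on a two-dimensional array. Fortran stores arrays in column-major order. A nest whose inner loop varies the second subscript therefore jumps a whole column between iterations, while the interchanged nest walks consecutive elements. The sketch below writes both orderings by hand. The module and routine names are hypothetical, and the code illustrates the kind of rewrite HLO may perform, not the compiler's actual output.

```fortran
module interchange_demo
  implicit none
contains
  ! Original nest: the inner loop varies the column index j, so successive
  ! iterations touch elements one column (size(a, 1) elements) apart.
  subroutine scale_row_order(a, s)
    real, intent(inout) :: a(:, :)
    real, intent(in)    :: s
    integer :: i, j
    do i = 1, size(a, 1)
      do j = 1, size(a, 2)
        a(i, j) = s * a(i, j)
      end do
    end do
  end subroutine scale_row_order

  ! Interchanged nest: the inner loop varies the row index i, so successive
  ! iterations touch adjacent elements (unit stride in column-major order).
  subroutine scale_column_order(a, s)
    real, intent(inout) :: a(:, :)
    real, intent(in)    :: s
    integer :: i, j
    do j = 1, size(a, 2)
      do i = 1, size(a, 1)
        a(i, j) = s * a(i, j)
      end do
    end do
  end subroutine scale_column_order
end module interchange_demo
```

Both routines compute the same result; only the memory access order differs. That is why the compiler may interchange the loops once it has shown that the reordering does not change the result.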

To turn on the high-level optimizations, use -O3 (Linux*) or /O3 (Windows*). The scope of the optimizations this option enables differs between IA-32 and Itanium®-based applications; see Setting Optimization Levels.

IA-32 and Itanium®-based Applications

The -O3 (Linux) or /O3 (Windows) option enables the -O2 (Linux) or /O2 (Windows) option and adds more aggressive optimizations, such as loop transformations and prefetching. -O3 (Linux) or /O3 (Windows) optimizes for maximum speed, but it may not improve performance for some programs.

IA-32 Applications

In conjunction with the vectorization options, -ax and -x (Linux) or /Qax and /Qx (Windows), the -O3 (Linux) or /O3 (Windows) option causes the compiler to perform more aggressive data dependency analysis than the default -O2 (Linux) or /O2 (Windows). This may result in longer compilation times.
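The extra dependency analysis matters because a loop can be vectorized only if executing its iterations together gives the same result as executing them one at a time. The sketch below uses hypothetical module and routine names. It compares a loop whose dependence distance k is known only at run time with the corresponding Fortran array assignment, which evaluates the whole right-hand side before storing anything and so behaves like running every iteration at once.

```fortran
module dependence_demo
  implicit none
contains
  ! Sequential semantics: when k is small, an iteration reads a value that
  ! an earlier iteration wrote (a loop-carried dependence of distance k).
  ! Because k is only known at run time, the compiler must analyze, or
  ! conservatively assume, this dependence before it can vectorize the loop.
  subroutine shift_add_loop(a, b, k)
    real, intent(inout) :: a(:)
    real, intent(in)    :: b(:)
    integer, intent(in) :: k
    integer :: i
    do i = k + 1, size(a)
      a(i) = a(i - k) + b(i)
    end do
  end subroutine shift_add_loop

  ! Array-assignment version: the full right-hand side is evaluated before
  ! any element of a is stored, like executing all iterations together.
  subroutine shift_add_vector(a, b, k)
    real, intent(inout) :: a(:)
    real, intent(in)    :: b(:)
    integer, intent(in) :: k
    integer :: n
    n = size(a)
    a(k + 1:n) = a(1:n - k) + b(k + 1:n)
  end subroutine shift_add_vector
end module dependence_demo
```

With k = 1 and four elements, the two versions disagree, so vectorizing the loop naively would be wrong. With k = 2 and four elements, the reads a(1:2) never overlap the writes a(3:4), so the versions agree. The compiler has to establish which case applies before it vectorizes.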

Tuning Itanium-based Applications

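The -ivdep-parallel (Linux) or /Qivdep-parallel (Windows) option asserts that there is no loop-carried dependency in any loop that follows an IVDEP directive. This is useful for sparse matrix applications, where indirect (index-array) addressing keeps the compiler from proving that loop iterations are independent.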
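A sparse vector update is the typical case. The sketch below, with hypothetical module and routine names, marks a scatter loop with IVDEP. When it is compiled with -ivdep-parallel (Linux) or /Qivdep-parallel (Windows), the directive asserts that the loop has no loop-carried dependency. `!DEC$` is the free-form spelling of the `CDEC$` directive prefix. Other compilers treat the line as an ordinary comment.

```fortran
module sparse_demo
  implicit none
contains
  ! Scatter-add of a sparse vector (values v at positions idx) into a dense
  ! vector y. The compiler cannot see that idx holds distinct indices, so it
  ! must assume y(idx(i)) in one iteration may alias y(idx(j)) in another.
  ! IVDEP asserts there is no such loop-carried dependency; the assertion is
  ! only correct if idx contains no repeated entries.
  subroutine sparse_axpy(y, alpha, v, idx)
    real, intent(inout) :: y(:)
    real, intent(in)    :: alpha
    real, intent(in)    :: v(:)
    integer, intent(in) :: idx(:)
    integer :: i
!DEC$ IVDEP
    do i = 1, size(idx)
      y(idx(i)) = y(idx(i)) + alpha * v(i)
    end do
  end subroutine sparse_axpy
end module sparse_demo
```

The directive is only a promise. If idx contained a repeated index, the loop would carry a dependency and the assertion would be wrong.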

Follow these steps to tune applications on Itanium®-based systems:

  1. Compile your program with -O3 (Linux) or /O3 (Windows) and -ipo (Linux) or /Qipo (Windows). Use profile-guided optimization whenever possible.

  2. Identify hot spots in your code.  

  3. Turn on optimization reporting.

  4. Check why loops are not software pipelined:

      1. Check that the prefetch distance is correct. Use CDEC$ prefetch to override the distance where needed.
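The tuning steps above might look like the following in practice. This is a hypothetical sketch. The file and routine names are made up. The ifort option spellings in the header comments are assumptions to verify against your compiler's option reference. The PREFETCH directive is written in an assumed var:hint:distance form to override the prefetch distance for an indirectly indexed array.

```fortran
! Assumed build sequence on an Itanium-based Linux system (option spellings
! are assumptions; check the option reference for your compiler version):
!
!   ifort -prof-gen gather.f90 -o gather     ! instrumented build for PGO
!   ./gather                                 ! run a representative workload
!   ifort -O3 -ipo -prof-use -opt-report gather.f90 -o gather
!                                            ! optimized build plus report
module gather_demo
  implicit none
contains
  ! Indirect gather: b(i) reads a(c(i)), so the access pattern of a is hard
  ! for the compiler to predict. The PREFETCH directive (assumed syntax
  ! var:hint:distance) asks for a to be prefetched 16 iterations ahead.
  subroutine gather_add(b, a, c)
    real, intent(out)   :: b(:)
    real, intent(in)    :: a(:)
    integer, intent(in) :: c(:)
    integer :: i
!DEC$ PREFETCH a:1:16
    do i = 1, size(c)
      b(i) = a(c(i)) + 1.0
    end do
  end subroutine gather_add
end module gather_demo
```

The right distance depends on memory latency and the amount of work per iteration. Treat 16 as a placeholder to adjust after reading the optimization report and timing the hot loop.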