Efficient compilation contributes to performance improvement. Before you analyze and tune your program, consider how you compile it. Based on an analysis of your application, you can decide which compiler optimizations and command-line options can improve its run-time performance.
Efficient compilation techniques apply during both the early and the later stages of program development.
During the earlier stages of program development, you can use incremental compilation without optimization. For example:
| Platform | Examples |
|---|---|
| Linux* | ifort -c -g -O0 sub2.f90 (generates object file of sub2) |
| | ifort -c -g -O0 sub3.f90 (generates object file of sub3) |
| | ifort -o main -g -O0 main.f90 sub2.o sub3.o |
| Windows* | ifort /c /Zi /Od sub2.f90 (generates object file of sub2) |
| | ifort /c /Zi /Od sub3.f90 (generates object file of sub3) |
| | ifort /Femain.exe /Zi /Od main.f90 sub2.obj sub3.obj |
The above commands use -O0 (Linux) or /Od (Windows) to turn off all optimizations, including the default -O2 (Linux) or /O2 (Windows) level. The -g (Linux) or /Zi (Windows) option generates symbolic debugging information and line numbers in the object code for all routines in the program, for use by a source-level debugger. The executable created by the third command (main on Linux, main.exe on Windows) also contains symbolic debugging information.
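The incremental workflow described above can be sketched as a small shell script that recompiles only the sources whose object files are missing or out of date. This is a minimal sketch and not part of the compiler documentation: the helper names `needs_rebuild`, `compile_cmd`, and `incremental_build` are hypothetical, the Linux options are assumed, and the script only prints the commands it would run, so it can be tried without the compiler installed.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the object file ($2) is missing
# or older than its source file ($1).
needs_rebuild() {
    [ ! -e "$2" ] || [ "$1" -nt "$2" ]
}

# Hypothetical helper: print the unoptimized debug compile command
# for one source file (Linux options from the table above).
compile_cmd() {
    printf 'ifort -c -g -O0 %s\n' "$1"
}

# Print the compile commands an incremental debug build would need.
incremental_build() {
    for src in "$@"; do
        obj="${src%.f90}.o"
        if needs_rebuild "$src" "$obj"; then
            compile_cmd "$src"
        fi
    done
}

incremental_build sub2.f90 sub3.f90
```

To run the commands instead of printing them, the `printf` in `compile_cmd` could be replaced by the command itself. In practice a build tool such as make usually plays this role.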
During the later stages of program development, specify multiple source files together and use an optimization level of at least -O2 (Linux) or /O2 (Windows) so that more optimizations can occur. For instance, the following commands compile all three source files together using the default level of optimization, -O2 (Linux) or /O2 (Windows):
| Platform | Examples |
|---|---|
| Linux | ifort -o main main.f90 sub2.f90 sub3.f90 |
| Windows | ifort /Femain.exe main.f90 sub2.f90 sub3.f90 |
Compiling multiple source files lets the compiler examine more code for possible optimizations, which results in:
- Inlining more procedures
- More complete data flow analysis
- Reducing the number of external references to be resolved during linking
For very large programs, compiling all source files together may not be practical. In such instances, consider compiling source files containing related routines together using multiple ifort commands, rather than compiling source files individually.
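The grouping approach for large programs can be sketched as follows. This is an illustration under stated assumptions: the file names and the `group_compile_cmds` helper are hypothetical, the Linux options are assumed, and the script only prints the commands. The link line assumes each group compile yields one object file per source file, which may depend on the compiler version and options in use.

```shell
#!/bin/sh
# Hypothetical helper: read one group of related source files per line
# on standard input and print one compile command per group.
group_compile_cmds() {
    while read -r group; do
        if [ -n "$group" ]; then
            printf 'ifort -c -O2 %s\n' "$group"
        fi
    done
}

# Hypothetical file names; each line holds routines that call one another.
group_compile_cmds <<'EOF'
solver.f90 matrix.f90 precond.f90
io_read.f90 io_write.f90
EOF

# Link the resulting object files with the main program
# (assumes one object file per source file).
printf 'ifort -o main main.f90 %s\n' \
    "solver.o matrix.o precond.o io_read.o io_write.o"
```

Grouping by call relationships rather than by directory or file size keeps the routines most likely to benefit from inlining in the same compile command.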
The table below lists the options, in alphabetical order, that can directly improve run-time performance. Most of these options do not affect the accuracy of the results, but some improve run-time performance at the cost of changing some numeric results. The Intel Compiler performs some optimizations by default unless you turn them off with the corresponding command-line options.
Additional optimizations can be enabled or disabled using command-line options.
| Windows | Linux | Description |
|---|---|---|
| /align: | -align keyword | Analyzes and reorders memory layout for variables and arrays. Controls whether padding bytes are added between data items within common blocks, derived-type data, and record structures to make the data items naturally aligned. |
| /G{n} | -tpp{n} | Optimizes your application's performance for specific Intel processors. See Targeting a Processor. |
| /O1, /Ot | -O1 | Optimize to favor code size and code locality. See Setting Optimizations. |
| /O2, /Ox | -O2 | Optimize for code speed. Sets performance-related options. See Setting Optimizations. |
| /O3 | -O3 | Activates loop transformation optimizations. See Setting Optimizations. |
| No equivalent | -p | Requests profiling information, which you can use to identify those parts of your program where improving source code efficiency would most likely improve run-time performance. After you modify the appropriate source code, recompile the program and test the run-time performance. |
| /Ob2 + /Ot | No equivalent | Inlines procedures that will improve run-time performance with a likely significant increase in program size. |
| /Ob2 + /Os | No equivalent | Inlines procedures that will improve run-time performance without a significant increase in program size. |
| /Qax | -ax | Optimizes your application's performance for specific processors. Regardless of which suboption you choose, your application is optimized to use all the benefits of that processor with the resulting binary file capable of being run on any Intel IA-32 processor. |
| /Qopenmp | -openmp | Enables the parallelizer to generate multithreaded code based on the OpenMP* directives. |
| /Qparallel | -parallel | Enables the auto-parallelizer to generate multithreaded code for loops that can be safely executed in parallel. |
| /Qunroll[n] /unroll:n | -unrolln | Specifies the number of times a loop is unrolled (n) when specified with optimization level -O3 (Linux) or /O3 (Windows) or higher. |
The table below lists options that can slow down run-time performance. These options are primarily for troubleshooting or debugging, but some applications need them for other reasons. Applications that require floating-point exception handling or rounding might need the -fpen (Linux) or /fpen (Windows) option. Other applications might need the -assume dummy_aliases or -vms (Linux) or /assume:dummy_aliases or /Qvms (Windows) options for compatibility reasons.
| Windows | Linux | Description |
|---|---|---|
| /assume:dummy_aliases | -assume dummy_aliases | Forces the compiler to assume that dummy (formal) arguments to procedures share memory locations with other dummy arguments or with variables shared through use association, host association, or common block use. These program semantics slow performance, so you should specify dummy_aliases only for the called subprograms that depend on such aliases. |
| /c | -c | If you use this option when compiling multiple source files, also specify /Fooutputfile to compile many source files together into one object file. |
| /check: | -check bounds | Generates extra code for array bounds checking at run time. |
| /check: | -check overflow | Generates extra code to check integer calculations for arithmetic overflow at run time. Once the program is debugged, omit this option to reduce executable program size and slightly improve run-time performance. |
| /fpe:3 | -fpe 3 | Enables certain types of floating-point exception handling, which can be expensive. |
| /Od | -O0 | Turns off optimizations. Can be used during the early stages of program development or when you use the debugger. |
| /Qsave | -save | Forces local variables to retain their values from the last invocation terminated. This may change the output of your program for floating-point values because it forces operations to be carried out in memory rather than in registers, which in turn causes more frequent rounding of your results. |
| /Qvms | -vms | Controls certain VMS-related run-time defaults, including alignment. If you specify this option, you may also need to specify the -align records (Linux) or /align:records (Windows) option (for the ALIGN option) to obtain optimal run-time performance. |
| /Zi, /Z7 | -g | Generates extra symbol table information in the object file. Specifying these options also reduces the default level of optimization to no optimization. Note: These options slow your program only when no optimization level is specified. For example, specifying -g -O2 (Linux) or /Zi /O2 (Windows) lets the code run close to the speed that would result if -g (Linux) or /Zi, /Z7 (Windows) were not specified. |
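One way to keep the slower debugging options out of production builds is to select options by build mode. The sketch below is illustrative only: the `flags_for_mode` helper and the mode names are hypothetical, and the option sets are one possible choice drawn from the Linux options listed in the tables above.

```shell
#!/bin/sh
# Hypothetical helper: print Linux ifort options for a named build mode.
flags_for_mode() {
    case "$1" in
        # Early development: no optimization, debug symbols, bounds checks.
        debug)         echo "-g -O0 -check bounds" ;;
        # Production: default optimization level, no debugging overhead.
        release)       echo "-O2" ;;
        # Optimized code that still carries symbols for the debugger;
        # giving -O2 explicitly keeps -g from turning optimization off.
        release-debug) echo "-g -O2" ;;
        *)             echo "unknown mode: $1" >&2; return 1 ;;
    esac
}

# Print the full compile command for each mode.
for mode in debug release release-debug; do
    printf 'ifort %s -o main main.f90 sub2.f90 sub3.f90\n' "$(flags_for_mode "$mode")"
done
```

The `release-debug` mode reflects the note in the last table row: -g slows the program only when no optimization level is specified alongside it.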