Using Tuning Tools and Strategies

Identifying and Analyzing Hotspots

To identify opportunities for optimization, you should start with the time-based and event-based sampling functionality of the VTune™ Analyzer. Time-based sampling identifies the sections of code that use the most processor time. Event-based sampling identifies microarchitecture bottlenecks such as cache misses and mispredicted branches. Clock ticks and instruction count are good counters to use initially to identify specific functions of interest for tuning.

Once sampling has identified specific functions, use call-graph analysis to produce thread-specific reports on each of them. Call-graph analysis shows how those functions are reached at run time, such as which functions call them and how many times each one is called.

You can also use the Counter monitor (which is equivalent to Microsoft* Perfmon*) to provide real-time performance data based on more than 200 possible operating system counters, or on custom counters created for specific environments and tasks.

You can also use the Intel® Tuning Assistant and Intel® Thread Checker, which ship as part of the VTune™ Performance Tools. The Intel® Tuning Assistant interprets data generated by the VTune™ Performance Tools and generates application-specific tuning advice based on that data. Intel® Thread Checker examines the correctness of the threading methodology applied to an application and identifies specific threading issues that should be addressed to improve performance.

See Using Intel Performance Analysis Tools for more information about these tools.

Using the Intel® Compilers for Tuning

The compilers provide advanced optimization features for Intel processors, which make them an efficient and cost-effective way to improve performance on Intel® architecture. All of the compilers support Processor Dispatch, which allows a single executable to run optimally both on the most current Intel microarchitectures and on legacy processors.

For information about the compiler options that are useful when tuning, see Optimizations Option Summary and Optimizing for Specific Processors Overview.

All Intel compilers include full support for auto-parallelization and substantial support for the OpenMP Fortran version 2.0 specification.