LASTPRIVATE
libraries
Improving I/O Performance
IPO Compilation Model
Parallel Region Directives
Timing your Application
libintrins.lib
OpenMP* run-time routines
Intel Extension Routines/Functions
OpenMP* Run-time Library Routines
library routines
Intel extension
OpenMP* run-time routines
to control number of threads
little-endian-to-big-endian conversion
lock routines
LOOP COUNT
loop unrolling
Compiler Directives Overview
Vectorization Key Programming Guidelines
limitations of
support for
using the HLO optimizer
HLO Overview
Optimizer Report Generation
loops
Loop Constructs
Loop Count and Loop Distribution
Statements in the Loop Body
arrays within
blocking
glossary
Strip-mining and Cleanup
body
collapsing
constructs
count
Loop Count and Loop Distribution
Loop Unrolling Support
data dependency
dependencies
Programming with Auto-parallelization
Vectorization Support
distribution
HLO Overview
Loop Count and Loop Distribution
Loop Transformations
exit conditions
independence
interchange
Applying Optimization Strategies
Coding Guidelines for Intel® Architectures
HLO Overview
Improving or Restricting FP Arithmetic Precision
Loop Interchange and Subscripts: Matrix Multiply
Loop Transformations
manual transformation
parallelization
Loop Parallelization and Vectorization
Parallelism Overview
Programming with Auto-parallelization
sectioning
transformations
Efficient Compilation
HLO Overview
Improving or Restricting FP Arithmetic Precision
Loop Transformations
Optimization Options Summary
Strip-mining and Cleanup
types vectorized
unrolling
Loop Unrolling
Loop Unrolling Support
using for arrays
vectorization
vectorized
maintainability
manual transformations
master thread
copying data in
matrix multiplication
example of
memory
access
allocation
dependency
Loop Transformations
Memory Dependency with IVDEP Directive
layout
memory aliasing
memory file system
misaligned data
mixing vectorizable types in a loop
MMX(TM)
multidimensional arrays
using effectively
multifile IPO
analyzing the effects of
creating and using an executable for
Command Line for Creating an IPO Executable
Creating a Multifile IPO Executable
optimization
overview
multithreaded programs
Auto-parallelization Overview
Coding Guidelines for Intel Architectures
Parallelism Overview
multithreading
OpenMP* Support Libraries
Programming with Auto-parallelization
natural alignment
naturally aligned
data
Alignment Options
Efficient Compilation
records
storage
NOPREFETCH
NOSWP
obj files
Command Line for Creating an IPO Executable
Efficient Compilation
OMP directives
Examples of OpenMP* Usage
Parallelism Overview
Programming with OpenMP*
OpenMP*
directives
Combined Parallel and Worksharing Constructs
OpenMP* and Hyper-Threading Technology
Parallel Region Directives
Synchronization Constructs
THREADPRIVATE Directive
Worksharing Construct Directives
environment variables
Hyper-Threading Technology
parallel processing thread model
pragmas
run-time library routines
support libraries
OpenMP* Fortran directives
clauses for
examples of
features of
for synchronization
for worksharing
Intel extensions for
programming using
syntax of
optimal records to improve performance
optimization
analyzing applications
application-specific
hardware-related
library-related
methodology
options
restricting
setting
OS-related
reports
Optimization Support Features Overview
Optimizer Report Generation
Pipelining for Itanium®-based Applications
strategies
system-related
targeting processors
tuning tools
optimization support
optimizations
Optimization Options Summary
Optimizing Applications Overview
compilation process
default level of
floating-point
for specific processors
helper thread
high-level language
interprocedural
IPO
multiple IPO
options for IA-32
options for Itanium® architecture
overview of
Compiler Optimizations Overview
Profile-guided Optimizations Overview
parallelization
PGO methodology
profile-guided
SSP
support features for
optimizer report generation
optimizing
applications
helping the compiler
overview
technical applications
optimizing performance
options for efficient compilation
ORDERED
OpenMP* Directives and Clauses Summary
Synchronization Constructs
Worksharing Construct Directives
overflow
Floating-point Options for Multiple Architectures
Stacks: Automatic Allocation and Checking
overriding
call to a runtime library routine
loop unrolling
software pipelining
the number of threads
vectorization
overview
of data scope attribute clauses
of optimizing compilation
of optimizing different application types
of optimizing for specific processors
of parallelism
of programming for high performance
packed structures
PARALLEL
parallel construct
PARALLEL DO
Combined Parallel and Worksharing Constructs
COPYIN Clause
DEFAULT Clause
OpenMP* Directives and Clauses Summary
PRIVATE, FIRSTPRIVATE, and LASTPRIVATE Clauses
REDUCTION Clause
SHARED Clause
Synchronization Constructs
SCHEDULE clause
parallel invocations with makefile
Basic PGO Options
Creating Multifile IPO Executable with Makefile
PARALLEL OpenMP* Fortran directive
COPYIN Clause
DEFAULT Clause
OpenMP* Directives and Clauses Summary
PRIVATE, FIRSTPRIVATE, and LASTPRIVATE Clauses
REDUCTION Clause
SHARED Clause
Synchronization Constructs
parallel processing
thread model
parallel programming
Optimizing Applications Overview
Parallelism Overview
parallel regions
directive defining
directives affecting
library routine affecting
PARALLEL SECTIONS
Combined Parallel and Worksharing Constructs
COPYIN Clause
DEFAULT Clause
OpenMP* Directives and Clauses Summary
PRIVATE, FIRSTPRIVATE, and LASTPRIVATE Clauses
REDUCTION Clause
SHARED Clause
Synchronization Constructs
PARALLEL WORKSHARE
parallelism
Auto-parallelization Overview
OpenMP* Run-time Library Routines
Parallelism: an Overview
Parallelization with OpenMP* Overview
parallelization
Auto-parallelization Overview
Loop Parallelization and Vectorization
Parallelization Overview
Parallelization with OpenMP* Overview
Programming with Auto-parallelization
diagnostic
passing
Implementing IL Files with Version Numbers
Using Arrays Efficiently
array arguments efficiently
options to other tools
performance analyzer
Parallelization with OpenMP* Overview
Timing your Application
performance issues with IPO
PGO
Profile-guided Optimizations Methodology and Usage Model
Profile-guided Optimizations Overview
PGO tools
code-coverage tool
helper threads
profmerge
proforder
profrun
software-based precomputation
test-prioritization tool
pgopti.dpi file
Basic PGO Options
PGO Environment Variables
pgopti.spi file
Code-coverage Tool
Profile-guided Optimizations Methodology and Usage Model
Test-prioritization Tool
pgouser.h header file
pipelining
effect of LOOP COUNT on
for Itanium®-based applications
pointer aliasing
PREFETCH
options used for
prefetches of data
optimizations for
preloading
preparing code
prioritizing application tests
PRIVATE
Data Scope Attribute Clauses Overview
OpenMP* Directives and Clauses Summary
Parallel Region Directives
PRIVATE, FIRSTPRIVATE, and LASTPRIVATE Clauses
REDUCTION Clause
Worksharing Construct Directives
in DEFAULT clause
processor
achieving optimum performance for
optimizing for specific
Optimizing for Specific Processors Overview
Processor-Specific Optimization (IA-32 only)
run-time checks for IA-32 systems
targeting
processor-based optimizations
processor-specific runtime checks
processors
targeting IA-32 using options
targeting Itanium® processors using options
PROF_DIR environment variable
PROF_DUMP_INTERVAL environment variable
PROF_NO_CLOBBER environment variable
profile-guided optimization
Profile-guided Optimizations Methodology and Usage Model
Profile-guided Optimizations Overview
API support
dumping profile information
environment variables
example of
interval profile dumping
methodology
options
Advanced PGO Options
Basic PGO Options
overview
phases
resetting dynamic profile counters
resetting profile information
support
usage model
profile-optimized code
Basic PGO Options
PGO API Support Overview
profile data
dumping
Dumping Profile Information
Interval Profile Dumping
resetting dynamic counters for
profiling
generating information
specifying a summary
profmerge
code-coverage tool
profrun
.hpi file
.tb5 file
requirements
SSP
program loops
programs
high performance
interprocedural optimization of
pseudo code
parallel processing model