
LASTPRIVATE
libraries
   Improving I/O Performance
   IPO Compilation Model
   Parallel Region Directives
   Timing your Application
    libintrins.lib
    OpenMP* run-time routines
       Intel Extension Routines/Functions
       OpenMP* Run-time Library Routines
library routines
    Intel extension
    OpenMP* run-time routines
    to control number of threads
little-endian-to-big-endian conversion
lock routines
LOOP COUNT
loop unrolling
   Compiler Directives Overview
   Vectorization Key Programming Guidelines
    limitations of
    support for
    using the HLO optimizer
       HLO Overview
       Optimizer Report Generation
loops
   Loop Constructs
   Loop Count and Loop Distribution
   Statements in the Loop Body
    arrays within
    blocking
       glossary
       Strip-mining and Cleanup
    body
    collapsing
    constructs
    count
       Loop Count and Loop Distribution
       Loop Unrolling Support
    data dependency
    dependencies
       Programming with Auto-parallelization
       Vectorization Support
    distribution
       HLO Overview
       Loop Count and Loop Distribution
       Loop Transformations
    exit conditions
    independence
    interchange
       Applying Optimization Strategies
       Coding Guidelines for Intel® Architectures
       HLO Overview
       Improving or Restricting FP Arithmetic Precision
       Loop Interchange and Subscripts: Matrix Multiply
       Loop Transformations
    manual transformation
    parallelization
       Loop Parallelization and Vectorization
       Parallelism Overview
       Programming with Auto-parallelization
    sectioning
    transformations
       Efficient Compilation
       HLO Overview
       Improving or Restricting FP Arithmetic Precision
       Loop Transformations
       Optimization Options Summary
       Strip-mining and Cleanup
    types vectorized
    unrolling
       Loop Unrolling
       Loop Unrolling Support
    using for arrays
    vectorization
    vectorized



maintainability
manual transformations
master thread
    copying data in
matrix multiplication
    example of
memory
    access
    allocation
    dependency
       Loop Transformations
       Memory Dependency with IVDEP Directive
    layout
memory aliasing
memory file system
misaligned data
mixing vectorizable types in a loop
MMX(TM)
multidimensional arrays
    using effectively
multifile IPO
    analyzing the effects of
    creating and using an executable for
       Command Line for Creating an IPO Executable
       Creating a Multifile IPO Executable
    optimization
    overview
multithreaded programs
   Auto-parallelization Overview
   Coding Guidelines for Intel® Architectures
   Parallelism Overview
multithreading
   OpenMP* Support Libraries
   Programming with Auto-parallelization



natural alignment
naturally aligned
    data
       Alignment Options
       Efficient Compilation
    records
    storage
NOPREFETCH
NOSWP



obj files
   Command Line for Creating an IPO Executable
   Efficient Compilation
OMP directives
   Examples of OpenMP* Usage
   Parallelism Overview
   Programming with OpenMP*
OpenMP*
    directives
       Combined Parallel and Worksharing Constructs
       OpenMP* and Hyper-Threading Technology
       Parallel Region Directives
       Synchronization Constructs
       THREADPRIVATE Directive
       Worksharing Construct Directives
    environment variables
    Hyper-Threading Technology
    parallel processing thread model
    pragmas
    run-time library routines
    support libraries
OpenMP* Fortran directives
    clauses for
    examples of
    features of
    for synchronization
    for worksharing
    Intel extensions for
    programming using
    syntax of
optimal records to improve performance
optimization
    analyzing applications
    application-specific
    hardware-related
    library-related
    methodology
    options
        restricting
        setting
    OS-related
    reports
       Optimization Support Features Overview
       Optimizer Report Generation
       Pipelining for Itanium®-based Applications
    strategies
    system-related
    targeting processors
    tuning tools
optimization support
optimizations
   Optimization Options Summary
   Optimizing Applications Overview
    compilation process
    default level of
    floating-point
    for specific processors
    helper thread
    high-level language
    interprocedural
    IPO
    multiple IPO
    options for IA-32
    options for Itanium® architecture
    overview of
       Compiler Optimizations Overview
       Profile-guided Optimizations Overview
    parallelization
    PGO methodology
    profile-guided
    SSP
    support features for
optimizer report generation
optimizing
    applications
    helping the compiler
    overview
    technical applications
optimizing performance
options for efficient compilation
ORDERED
   OpenMP* Directives and Clauses Summary
   Synchronization Constructs
   Worksharing Construct Directives
overflow
   Floating-point Options for Multiple Architectures
   Stacks: Automatic Allocation and Checking
overriding
    call to a runtime library routine
    loop unrolling
    software pipelining
    the threads number
    vectorization
overview
    of data scope attribute clauses
    of optimizing compilation
    of optimizing different application types
    of optimizing for specific processors
    of parallelism
    of programming for high performance



packed structures
PARALLEL
parallel construct
PARALLEL DO
   Combined Parallel and Worksharing Constructs
   COPYIN Clause
   DEFAULT Clause
   OpenMP* Directives and Clauses Summary
   PRIVATE, FIRSTPRIVATE, and LASTPRIVATE Clauses
   REDUCTION Clause
   SHARED Clause
   Synchronization Constructs
    SCHEDULE clause
parallel invocations with makefile
   Basic PGO Options
   Creating Multifile IPO Executable with Makefile
PARALLEL OpenMP* Fortran directive
   COPYIN Clause
   DEFAULT Clause
   OpenMP* Directives and Clauses Summary
   PRIVATE, FIRSTPRIVATE, and LASTPRIVATE Clauses
   REDUCTION Clause
   SHARED Clause
   Synchronization Constructs
parallel processing
    thread model
parallel programming
   Optimizing Applications Overview
   Parallelism Overview
parallel regions
    directive defining
    directives affecting
    library routine affecting
PARALLEL SECTIONS
   Combined Parallel and Worksharing Constructs
   COPYIN Clause
   DEFAULT Clause
   OpenMP* Directives and Clauses Summary
   PRIVATE, FIRSTPRIVATE, and LASTPRIVATE Clauses
   REDUCTION Clause
   SHARED Clause
   Synchronization Constructs
PARALLEL WORKSHARE
parallelism
   Auto-parallelization Overview
   OpenMP* Run-time Library Routines
   Parallelism: an Overview
   Parallelization with OpenMP* Overview
parallelization
   Auto-parallelization Overview
   Loop Parallelization and Vectorization
   Parallelization Overview
   Parallelization with OpenMP* Overview
   Programming with Auto-parallelization
    diagnostic
passing
   Implementing IL Files with Version Numbers
   Using Arrays Efficiently
    array arguments efficiently
    options to other tools
performance analyzer
   Parallelization with OpenMP* Overview
   Timing your Application
performance issues with IPO
PGO
   Profile-guided Optimizations Methodology and Usage Model
   Profile-guided Optimizations Overview
PGO tools
    code-coverage tool
    helper threads
    profmerge
    proforder
    profrun
    software-based precomputation
    test-prioritization tool
pgopti.dpi file
   Basic PGO Options
   PGO Environment Variables
pgopti.spi file
   Code-coverage Tool
   Profile-guided Optimizations Methodology and Usage Model
   Test-prioritization Tool
pgouser.h header file
pipelining
    effect of LOOP COUNT on
    for Itanium®-based applications
pointer aliasing
PREFETCH
    options used for
prefetches of data
    optimizations for
preloading
preparing code
prioritizing application tests
PRIVATE
   Data Scope Attribute Clauses Overview
   OpenMP* Directives and Clauses Summary
   Parallel Region Directives
   PRIVATE, FIRSTPRIVATE, and LASTPRIVATE Clauses
   REDUCTION Clause
   Worksharing Construct Directives
    in DEFAULT clause
processor
    achieving optimum performance for
    optimizing for specific
       Optimizing for Specific Processors Overview
       Processor-Specific Optimization (IA-32 only)
    run-time checks for IA-32 systems
    targeting
processor-based optimizations
processor-specific runtime checks
processors
    targeting IA-32 using options
    targeting Itanium® processors using options
PROF_DIR environment variable
PROF_DUMP_INTERVAL environment variable
PROF_NO_CLOBBER environment variable
profile-guided optimization
   Profile-guided Optimizations Methodology and Usage Model
   Profile-guided Optimizations Overview
    API support
    dumping profile information
    environment variables
    example of
    interval profile dumping
    methodology
    options
       Advanced PGO Options
       Basic PGO Options
    overview
    phases
    resetting dynamic profile counters
    resetting profile information
    support
    usage model
profile-optimized code
   Basic PGO Options
   PGO API Support Overview
profile data
    dumping
       Dumping Profile Information
       Interval Profile Dumping
    resetting dynamic counters for
profiling
    generating information
    specifying a summary
profmerge
    code-coverage tool
profrun
    .hpi file
    .tb5 file
    requirements
    SSP
program loops
programs
    high performance
    interprocedural optimization of
pseudo code
    parallel processing model

