The auto-parallelization feature implements some concepts of OpenMP*, such as the worksharing construct (with the PARALLEL DO directive). See Programming with OpenMP for more information about the worksharing construct. This section provides details on auto-parallelization.
A loop can be parallelized if:
The loop is countable at compile time: this means that an expression representing how many times the loop will execute (also called the loop trip count) can be generated just before entering the loop (see the example loops after this list).
There are no FLOW (READ after WRITE), OUTPUT (WRITE after WRITE), or ANTI (WRITE after READ) loop-carried data dependencies. A loop-carried data dependency occurs when the same memory location is referenced in different iterations of the loop. At the compiler's discretion, a loop may still be parallelized if the assumed loop-carried dependencies that would otherwise inhibit parallelization can be resolved by run-time dependency testing.
The compiler may generate a run-time test for the profitability of executing in parallel for loops whose loop parameters are not compile-time constants.
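The following minimal sketch illustrates these criteria; the subroutine, array, and variable names are illustrative and not taken from the original text. The first loop is countable and free of loop-carried dependencies, so it is a candidate for auto-parallelization; the second carries a FLOW dependency and cannot be parallelized as written.

```fortran
subroutine scale_array(a, b, n)
  implicit none
  integer, intent(in)  :: n
  real,    intent(in)  :: b(n)
  real,    intent(out) :: a(n)
  integer :: i

  ! Countable loop with no loop-carried dependencies: the trip count n is
  ! known on entry to the loop, and each iteration touches only a(i) and
  ! b(i), so iterations can safely be distributed across threads.
  do i = 1, n
     a(i) = 2.0 * b(i)
  end do
end subroutine scale_array

subroutine running_sum(a, n)
  implicit none
  integer, intent(in)    :: n
  real,    intent(inout) :: a(n)
  integer :: i

  ! FLOW (READ after WRITE) loop-carried dependency: iteration i reads
  ! a(i-1), which iteration i-1 wrote, so this loop cannot be
  ! parallelized as written.
  do i = 2, n
     a(i) = a(i) + a(i-1)
  end do
end subroutine running_sum
```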
Enhance the power and effectiveness of the auto-parallelizer by following these coding guidelines (see the sketch after these guidelines):
Expose the loop trip count whenever possible; for example, use constants or local variables for loop bounds rather than values the compiler cannot analyze.
Avoid placing constructs inside loop bodies that the compiler must assume carry data dependencies, such as procedure calls, ambiguous indirect references, or references to global variables.
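A brief sketch of a loop written to follow these guidelines; the names below are illustrative assumptions, not part of the original text.

```fortran
subroutine update(x, y, n)
  implicit none
  integer, intent(in)    :: n
  real,    intent(in)    :: x(n)
  real,    intent(inout) :: y(n)
  integer :: i, trip_count

  ! The trip count is held in a local variable, and the loop body is free
  ! of procedure calls, indirect references, and global references, so the
  ! compiler can prove the loop countable and dependence-free.
  trip_count = n
  do i = 1, trip_count
     y(i) = y(i) + 3.0 * x(i)
  end do
end subroutine update
```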
For auto-parallelization processing, the compiler performs the following steps:
Data flow analysis: Computing the flow of data through the program.
Loop classification: Determining loop candidates for parallelization based on correctness and efficiency, as shown by threshold analysis.
Dependency analysis: Computing the dependencies for references in each loop nest.
High-level parallelization: Analyzing the dependency graph to determine which loops can execute in parallel, and computing run-time dependencies.
Data partitioning: Examining data references and partitioning them based on the following types of access: SHARED, PRIVATE, and FIRSTPRIVATE (see the conceptual sketch after these steps).
Multi-threaded code generation: Modifying loop parameters, generating entry/exit per threaded task, and generating calls to parallel run-time routines for thread creation and synchronization.
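As a conceptual illustration only (not actual compiler output), the net effect of the data partitioning and code generation steps on a simple loop resembles an explicit OpenMP worksharing construct. The subroutine and variable names below are illustrative assumptions.

```fortran
subroutine scaled_copy(a, b, factor, n)
  implicit none
  integer, intent(in)  :: n
  real,    intent(in)  :: b(n)
  real,    intent(in)  :: factor
  real,    intent(out) :: a(n)
  integer :: i

  ! Roughly what auto-parallelization arranges behind the scenes:
  ! a and b are SHARED, the loop index i is PRIVATE, and factor is
  ! FIRSTPRIVATE (each thread starts with its incoming value).
!$omp parallel do shared(a, b) private(i) firstprivate(factor)
  do i = 1, n
     a(i) = factor * b(i)
  end do
!$omp end parallel do
end subroutine scaled_copy
```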