SWP Reports

The Software Pipelining (SWP) report provides detailed information about loops that currently take advantage of the software pipelining available on Itanium®-based systems. The report also suggests reasons why other loops are not being pipelined.

The following table lists the typical command syntax needed to generate a report for the Itanium® Compiler Code Generator (ECG) Software Pipeliner (SWP):

Platform    Command

Linux*      ifort -c -opt-report -opt-report-phase ecg_swp myfile.f

Windows*    ifort /c /Qopt-report /Qopt-report-phase ecg_swp myfile.f

where -c (Linux) or /c (Windows) tells the compiler to stop after generating the object code (no linking occurs), -opt-report (Linux) or /Qopt-report (Windows) invokes the report generator, and -opt-report-phase ecg_swp (Linux) or /Qopt-report-phase ecg_swp (Windows) indicates the phase (ecg_swp) for which to generate the report.

Note

Linux* only: The space between the option and the phase is optional.

See Optimizer Report Generation for more information about options you can use to generate reports.

Typically, the report contains a line for each software-pipelined loop indicating that the compiler scheduled the loop for SWP. If the -O3 (Linux) or /O3 (Windows) option is specified, the SWP report also includes the summary of loop transformations performed by the loop optimizer.

The following table lists common report messages and suggests corresponding actions to mitigate the problems. The most efficient use of your time is to analyze the loops that were not software pipelined and determine how to achieve SWP. If the compiler reports that a loop was not software pipelined, look up the reason given in the report in the following table:

Message in Report

Suggested Action

acyclic global scheduler can achieve a better schedule: => loop not pipelined

Indicates that the most likely cause is memory aliasing.

  • Memory alias problems: See the memory aliasing section above (restrict, #pragma ivdep)

Can also indicate that the application is accessing memory in a non-unit-stride fashion. Non-unit-stride issues may be indicated by an artificially high recurrence II. If you know there is no recurrence relationship (a[i] = a[i-1] + b[i], for example) in the loop, then a high recurrence II (greater than 0) is a sign that you are accessing memory with a non-unit stride.

  • Rearranging code, perhaps a loop interchange, might help mitigate this problem.

Loop body has a function call

Indicates that inlining the function will help.

Not enough static registers

Indicates that you should distribute the loop by breaking it into two or more loops.

Not enough rotating registers

Indicates that loop-carried values are using the rotating registers. Distribute the loop.

Loop too large

Indicates that you should distribute the loop.

Loop has a constant trip count < 4

Indicates that unrolling was insufficient. Attempt to fully unroll the loop. However, with such a small loop, it is doubtful that it is going to affect performance significantly.

Too much flow control

Indicates complex loop structure. Attempt to simplify the loop.
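Several of the suggested actions above call for distributing a loop. The following is a minimal sketch in C of what that means; the function and array names are hypothetical and do not come from the compiler or its reports. Distribution is only valid when the statements being separated do not depend on each other.

```c
#include <assert.h>

/* Original form: one loop body computes two independent results,
   so all operands compete for registers in the same loop. */
void fused(int n, const double *a, const double *b,
           double *sum, double *prod)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        sum[i]  = a[i] + b[i];
        prod[i] = a[i] * b[i];
    }
}

/* Distributed form: the same work split into two smaller loops.
   Each loop has a smaller body and needs fewer live values. */
void distributed(int n, const double *a, const double *b,
                 double *sum, double *prod)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        sum[i] = a[i] + b[i];
    for (int i = 0; i < n; i++)
        prod[i] = a[i] * b[i];
}
```

Both functions produce the same results. Whether the distributed form actually pipelines is something to confirm by regenerating the SWP report.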

The choice of index variable type can greatly affect performance. In some cases, using loop index variables of type short or unsigned int can even prevent software pipelining. If you see performance problems in loops where the index variable is not an int, and you cannot find any other obvious cause, try changing the loop index variable to type int.
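As an illustration of the index-type change described above, the sketch below shows the same loop written with an unsigned int index and with an int index. The function names are hypothetical, and whether the change affects pipelining for a given loop is an assumption to verify with the SWP report.

```c
#include <assert.h>

/* Loop indexed with unsigned int: the form the text suggests revisiting
   when a loop unexpectedly fails to software pipeline. */
void scale_unsigned(unsigned int n, float *x, float s)
{
    for (unsigned int i = 0; i < n; i++)
        x[i] *= s;
}

/* Same computation with a plain int index. */
void scale_int(int n, float *x, float s)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        x[i] *= s;
}
```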

When reading the reports, you must know the terminology and related concepts. The following table summarizes the terminology used in the SWP reports.

Term

Definition

II

Initiation Interval (II). The number of cycles between the start of one iteration and the next in the SWP. The presence of the term II in any SWP report indicates that SWP succeeded for the loop in question.

II can be used in a “back of the envelope” calculation to determine how many cycles your loop will take if you know the number of iterations. The total cycle count of the loop is approximately N * Scheduled II + Number of stages, where N is the number of iterations of the loop. This is an approximation because it does not take into account the ramp-up and ramp-down of the prolog and epilog of the SWP, and only considers the kernel of the SWP loop. As you modify your code, it is generally better to see Scheduled II go down, though the ultimate figure of merit is N * Scheduled II + Number of stages in the software pipeline.

Resource II

Resource II indicates what the II should be when considering only the number of functional units available.

Recurrence II

Recurrence II indicates what the II should be when there is a recurrence relationship in the loop. A recurrence relationship is a particular kind of data dependency called a flow dependency, such as a[i] = a[i-1], where a[i] cannot be computed until a[i-1] is known. If Recurrence II is non-zero and you know that there is no flow dependency in the code, then it most likely points to either non-unit-stride access or memory aliasing. See the corresponding sections under Helping the Compiler.

Minimum II

Minimum II is the theoretical minimum II that could be achieved.

Scheduled II

Scheduled II is what the compiler actually scheduled for the SWP.

number of stages

Indicates the number of stages. For example, in the report results below, the line Number of stages in the software pipeline = 3 indicates there were three stages of work, which appear in the assembly as a load, an FMA instruction, and a store.

loop-carried memory dependence edges

Loop-carried memory dependence edges mean that the compiler avoided a WAR (Write After Read) dependency. For example, in the report results below, line 12 is prominent: Store at line 12 --> Load at line 12. Loop-carried memory dependence edges can indicate problems with memory aliasing. See Helping the Compiler.
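To make the recurrence terminology concrete, the sketch below contrasts a loop that has a flow dependency with one that does not. The function names are hypothetical. The restrict qualifier is used here as one way to state the no-aliasing assumption mentioned in the table above.

```c
#include <assert.h>

/* Flow (loop-carried) dependency: iteration i reads a[i-1],
   which was written by iteration i-1. */
void running_sum(int n, double *a, const double *b)
{
    assert(n >= 0);
    for (int i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}

/* No recurrence: each iteration touches only its own elements,
   provided c, a and b do not alias (stated with restrict). */
void add_arrays(int n, double *restrict c,
                const double *restrict a, const double *restrict b)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

A non-zero Recurrence II is expected for the first loop. For a loop shaped like the second one, the table above suggests looking at aliasing or stride instead.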

Example

SWP report for loop at line 13 in multiply_d in file multiply_d.f:

Resource II = 1

Recurrence II = 4

Minimum II = 4

Scheduled II = 4  Min. II = Sched. II => loop optimally scheduled

Percent of Resource II needed by arithmetic ops    = 100%

Percent of Resource II needed by memory ops = 100%

Percent of Resource II needed by floating point ops = 100%

 

Number of stages in the software pipeline = 3

Following are the loop-carried memory dependence edges:

Store at line 12 --> Load at line 12

Store at line 12 --> Load at line 12
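The back-of-the-envelope estimate given in the II definition can be applied to the report above. The helper below is a hypothetical sketch, and the iteration count of 1000 in the usage note is an assumed value, since the report does not state a trip count.

```c
#include <assert.h>

/* Approximate kernel cycle count from the II definition:
   cycles ~= N * Scheduled II + Number of stages.
   Prolog and epilog ramp-up and ramp-down are ignored. */
long estimate_cycles(long iterations, long scheduled_ii, long stages)
{
    assert(iterations >= 0 && scheduled_ii > 0 && stages > 0);
    return iterations * scheduled_ii + stages;
}
```

With Scheduled II = 4 and 3 stages, as reported above, estimate_cycles(1000, 4, 3) gives 1000 * 4 + 3 = 4003 cycles.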

One fast way to determine whether specific loops are software pipelining is to use a text searching tool to search the report output for the string “Number of stages in the software pipeline”. If this phrase is present, the associated loop was successfully software pipelined. To determine which loop the phrase pertains to, be certain to include the few lines above it, which call out the file name, function name, and source code line number that pinpoint the loop in question.
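Any text search tool can do this. As a minimal sketch of the same idea in C, the hypothetical function below counts how many loops in a report were pipelined by counting occurrences of the phrase. It assumes the report text has already been read into a string, and it does not print the preceding lines that identify each loop.

```c
#include <assert.h>
#include <string.h>

/* Phrase that appears in the report for loops that were pipelined. */
static const char SWP_MARKER[] = "Number of stages in the software pipeline";

/* Count occurrences of the marker phrase in the report text. */
int count_pipelined_loops(const char *report)
{
    assert(report != NULL);
    int count = 0;
    const char *p = report;
    while ((p = strstr(p, SWP_MARKER)) != NULL) {
        count++;
        p += sizeof SWP_MARKER - 1;  /* resume the search after this match */
    }
    return count;
}
```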

Another example report shows the results for a loop at line 5 in func1 in file main.f. The compiler generated information about the load-pair versioning and run-time data dependence checking for the loop, which had a Scheduled II of 2.

Example

Loop at line 6: unrolled loadpair-ver-1 runtime-dependence-check-ver-1

Resource II   = 2

Recurrence II = 1

Minimum II    = 2

Scheduled II  = 2

Estimated GCS II   = 11

Percent of Resource II needed by arithmetic ops     = 100%

Percent of Resource II needed by memory ops         = 100%

Percent of Resource II needed by floating point ops =  50%

Number of stages in the software pipeline = 9