This section contains simple examples of some common issues in vector programming.
In the following vector copy example, the loop does not vectorize because the compiler cannot prove that DEST(A(I)) and DEST(B(I)) are distinct.
Example 1: Unvectorizable Copy Due to Unproven Distinction |
---|
SUBROUTINE VEC_COPY(DEST,A,B,LEN) |
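As a rough sketch of the kind of loop described here, the copy below goes through two index arrays, so the location written in one iteration may be the location read in a later one. The declarations, the loop body, and the driver program are assumptions made for illustration; they are not the original listing.

```fortran
! Illustrative sketch only: declarations and loop body are assumed.
SUBROUTINE VEC_COPY(DEST, A, B, LEN)
  IMPLICIT NONE
  INTEGER, INTENT(IN)    :: LEN
  INTEGER, INTENT(IN)    :: A(LEN), B(LEN)   ! index arrays
  REAL,    INTENT(INOUT) :: DEST(*)
  INTEGER :: I
  ! DEST(A(I)) written in one iteration may be the same element as
  ! DEST(B(J)) read in a later one, so the compiler cannot reorder
  ! or batch the iterations without proof that they are distinct.
  DO I = 1, LEN
    DEST(A(I)) = DEST(B(I))
  END DO
END SUBROUTINE VEC_COPY

PROGRAM DEMO
  IMPLICIT NONE
  REAL    :: DEST(4) = (/ 1.0, 2.0, 3.0, 4.0 /)
  INTEGER :: A(3) = (/ 2, 3, 4 /)   ! hypothetical write indices
  INTEGER :: B(3) = (/ 1, 2, 3 /)   ! hypothetical read indices, overlapping the writes
  CALL VEC_COPY(DEST, A, B, 3)
  PRINT *, DEST
END PROGRAM DEMO
```

With these hypothetical index arrays, sequential execution propagates 1.0 through every element of DEST. A version that loaded DEST(B(1:3)) in one step before storing would leave 1.0, 1.0, 2.0, 3.0 instead, which is why the compiler must be conservative.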
A data structure or array of 16 bytes (Linux*) or 64 bytes (Windows*) or more should be aligned so that the base address of each structure or array element is a multiple of 16 (Linux) or 32 (Windows).
The figure below shows the effect of a data cache unit (DCU) split due to misaligned data. The code loads the misaligned data across a 16-byte boundary, which results in an additional memory access and causes a six- to twelve-cycle stall. You can avoid the stalls if you know that the data is aligned and you instruct the compiler to assume alignment.
Misaligned Data Crossing 16-Byte Boundary |
---|
After vectorization, the loop is executed as shown in the figure below.
Vector and Scalar Clean-up Iterations |
---|
Both vector iterations, A(1:4) = B(1:4) and A(5:8) = B(5:8), can be implemented with aligned moves if both A(1) and B(1) are 16-byte aligned.
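The split into vector iterations and scalar clean-up iterations can be written out by hand as a sketch. The trip count of 10 and the vector length of 4 are assumptions chosen to match the A(1:4) and A(5:8) iterations mentioned here; in practice the compiler performs this transformation itself.

```fortran
PROGRAM CLEANUP_DEMO
  IMPLICIT NONE
  INTEGER, PARAMETER :: N  = 10   ! assumed trip count
  INTEGER, PARAMETER :: VL = 4    ! assumed vector length: 4 x 4-byte REAL = 16 bytes
  REAL    :: A(N), B(N)
  INTEGER :: I, NVEC

  B = (/ (REAL(I), I = 1, N) /)
  A = 0.0

  ! Number of elements covered by full vector iterations (8 here).
  NVEC = (N / VL) * VL

  ! Vector iterations: A(1:4) = B(1:4), then A(5:8) = B(5:8)
  DO I = 1, NVEC, VL
    A(I:I+VL-1) = B(I:I+VL-1)
  END DO

  ! Scalar clean-up iterations: A(9) and A(10)
  DO I = NVEC + 1, N
    A(I) = B(I)
  END DO

  ! The result must match a plain element-by-element copy.
  IF (ANY(A /= B)) ERROR STOP 'copy mismatch'
  PRINT *, 'copied', N, 'elements;', NVEC, 'via vector iterations'
END PROGRAM CLEANUP_DEMO
```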
Caution
If you use the vectorizer with incorrect alignment options, the compiler will generate code with unexpected behavior. Specifically, using aligned moves on unaligned data will result in an illegal instruction exception.
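One way alignment might be stated safely is to request it where the arrays are declared and only then assert it on the loop. The sketch below uses the Intel Fortran directives `ATTRIBUTES ALIGN` and `VECTOR ALIGNED`; the array names, sizes, and loop body are hypothetical. Compilers that do not recognize `!DIR$` lines treat them as comments, so the program still runs, but without any alignment guarantee.

```fortran
PROGRAM ALIGN_DEMO
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 8      ! hypothetical array size
  REAL    :: A(N), B(N)
  INTEGER :: I
  ! Request 16-byte alignment for both arrays (Intel-specific directive).
  !DIR$ ATTRIBUTES ALIGN : 16 :: A, B

  B = 1.0

  ! Assert that every array reference in the loop is aligned. This is
  ! only safe because the alignment was requested above; asserting it
  ! for data of unknown alignment is the error the caution describes.
  !DIR$ VECTOR ALIGNED
  DO I = 1, N
    A(I) = B(I) + 1.0
  END DO

  IF (ANY(A /= 2.0)) ERROR STOP 'unexpected result'
  PRINT *, 'ok'
END PROGRAM ALIGN_DEMO
```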
When the alignment of data structures is not known at compile time, the compiler has several alignment strategies at its disposal. A simple example is shown below (several other strategies are supported as well). If the alignment of A in the loop shown below is unknown, the compiler generates a prelude loop that iterates until the most frequently occurring array reference hits an aligned address. This makes the alignment properties of A known, and the vector loop is optimized accordingly. In this case, the vectorizer applies dynamic loop peeling, a specific Intel® Fortran feature.
Example 2: Original loop |
---|
SUBROUTINE DOIT(A) |
Example 3: Aligning Data |
---|
! The vectorizer applies dynamic loop peeling as follows: |
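The peeling idea can be sketched in portable Fortran as follows. This is a hand-written illustration under stated assumptions, not the compiler's generated code: the array length of 100, the loop body `A(I) = A(I) + 1.0`, the 4-byte default REAL, and the driver program are all assumed, and the address is obtained with `C_LOC` from `ISO_C_BINDING` purely for demonstration.

```fortran
MODULE PEEL_MOD
  USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_LOC, C_INTPTR_T
  IMPLICIT NONE
CONTAINS
  SUBROUTINE DOIT(A)
    REAL, TARGET, INTENT(INOUT) :: A(100)   ! assumed length
    INTEGER(C_INTPTR_T) :: ADDR
    INTEGER :: I, P

    ! Byte offset of A(1) from the previous 16-byte boundary.
    ADDR = TRANSFER(C_LOC(A), ADDR)
    P = INT(MODULO(ADDR, 16_C_INTPTR_T))

    ! Number of elements to peel before the next 16-byte boundary.
    ! Assumes 4-byte REAL and that A(1) is at least 4-byte aligned.
    IF (P /= 0) P = (16 - P) / 4

    ! Prelude (peeled) loop: zero to three scalar iterations.
    DO I = 1, P
      A(I) = A(I) + 1.0
    END DO

    ! The main loop now starts at a 16-byte boundary and can be
    ! vectorized with aligned accesses to A.
    DO I = P + 1, 100
      A(I) = A(I) + 1.0
    END DO
  END SUBROUTINE DOIT
END MODULE PEEL_MOD

PROGRAM PEEL_DEMO
  USE PEEL_MOD
  IMPLICIT NONE
  REAL :: BUF(101)
  BUF = 0.0
  ! Pass an offset section so its start is aligned differently from BUF(1).
  CALL DOIT(BUF(2:101))
  ! Peeling must not change the result: every element is incremented once.
  IF (ANY(BUF(2:101) /= 1.0)) ERROR STOP 'peeling changed the result'
  PRINT *, 'ok'
END PROGRAM PEEL_DEMO
```

The two loops together cover exactly the same iterations as the original single loop; only the split point P depends on the run-time address.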