Vectorization Examples

This section contains simple examples of some common issues in vector programming.

Argument Aliasing: A Vector Copy

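The loop in the following vector copy example does not vectorize because the compiler cannot prove that DEST(A(I)) and DEST(B(I)) are distinct. The index arrays A and B are known only at run time, so an element stored in one iteration might be loaded again in a later one.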

Example 1: Unvectorizable Copy Due to Unproven Distinction

      SUBROUTINE VEC_COPY(DEST,A,B,LEN)
        DIMENSION DEST(*)
        INTEGER A(*), B(*)
        INTEGER LEN, I
        DO I=1,LEN
          DEST(A(I)) = DEST(B(I))
        END DO
        RETURN
      END
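To see why the compiler has to be conservative here, the following minimal sketch compares the scalar order of the loop with a vector order in which all loads happen before any store. The index values and the names SEQ and VEC are hypothetical, chosen only so that the store indices overlap the load indices.

```fortran
program alias_demo
  implicit none
  integer, parameter :: n = 3
  integer :: a(n), b(n), i
  real :: seq(4), vec(4)

  a = [2, 3, 4]                      ! store indices (hypothetical values)
  b = [1, 2, 3]                      ! load indices, overlapping the stores
  seq = [10.0, 20.0, 30.0, 40.0]
  vec = seq

  ! Scalar order: each store is visible to the loads of later iterations.
  do i = 1, n
    seq(a(i)) = seq(b(i))
  end do

  ! Vector order: the right-hand side is fully evaluated before any store.
  vec(a) = vec(b)

  print *, 'scalar order:', seq
  print *, 'vector order:', vec
  if (all(seq == vec)) error stop 'expected the two orders to differ'
end program alias_demo
```

With these indices the scalar loop propagates DEST(1) into every later element, while the vector order only shifts the values by one position. Because the two results differ, vectorizing the loop would be unsafe unless the index sets are known not to overlap.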

Data Alignment

A data structure or array of 16 bytes (Linux*), 64 bytes (Windows*), or larger should be aligned so that the base address of each structure or array element is a multiple of 16 (Linux) or 32 (Windows).

The figure below shows the effect of a data cache unit (DCU) split due to misaligned data. The code loads the misaligned data across a 16-byte boundary, which results in an additional memory access and causes a six- to twelve-cycle stall. You can avoid the stalls if you know that the data is aligned and you tell the compiler to assume alignment.

Misaligned Data Crossing 16-Byte Boundary

After vectorization, the loop is executed as shown in the figure below.

Vector and Scalar Clean-up Iterations

Both vector iterations, A(1:4) = B(1:4) and A(5:8) = B(5:8), can be implemented with aligned moves if the elements A(1) and B(1) are both 16-byte aligned.
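The split into vector iterations and scalar clean-up iterations can be sketched by hand. The example below assumes a simple copy loop A(I) = B(I) over 10 elements and a vector length of four REAL elements; the loop bounds and the names N, VL, and NVEC are assumptions made for illustration, not taken from the compiler.

```fortran
program cleanup_demo
  implicit none
  integer, parameter :: n = 10      ! assumed trip count
  integer, parameter :: vl = 4      ! assumed vector length, in elements
  real :: a(n), b(n)
  integer :: i, nvec

  b = [(real(i), i = 1, n)]
  a = 0.0

  ! Last index covered by full vector iterations (8 for n = 10, vl = 4).
  nvec = (n / vl) * vl

  ! Vector iterations: A(1:4) = B(1:4), then A(5:8) = B(5:8).
  do i = 1, nvec, vl
    a(i:i+vl-1) = b(i:i+vl-1)
  end do

  ! Scalar clean-up iterations for the remaining elements (9 and 10).
  do i = nvec + 1, n
    a(i) = b(i)
  end do

  if (any(a /= b)) error stop 'copy mismatch'
  print *, 'vector part ends at', nvec, ', clean-up covers', n - nvec, 'elements'
end program cleanup_demo
```

The array-section assignments only model what one vector instruction does; an actual vectorizer emits the equivalent machine code directly.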

Caution

If you use the vectorizer with incorrect alignment options, the compiler generates code with unexpected behavior. Specifically, using aligned moves on unaligned data results in an illegal instruction exception.
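One way to keep an alignment assertion truthful is to request the alignment in the same place where you assert it. The sketch below uses the Intel Fortran directives ATTRIBUTES ALIGN and VECTOR ALIGNED on two local arrays; the array names and sizes are hypothetical. Compilers that do not recognize the !DIR$ prefix treat these lines as ordinary comments, so the program still runs, only without the alignment guarantee.

```fortran
program aligned_demo
  implicit none
  integer, parameter :: n = 8
  real :: a(n), b(n)
  ! Request 16-byte alignment for both arrays (Intel Fortran directive).
  !DIR$ ATTRIBUTES ALIGN: 16 :: a, b
  integer :: i

  b = 2.0

  ! Assert that all references in the next loop are aligned. This is only
  ! safe because the arrays were declared with the ALIGN attribute above.
  !DIR$ VECTOR ALIGNED
  do i = 1, n
    a(i) = b(i) + 1.0
  end do

  if (any(a /= 3.0)) error stop 'unexpected result'
  print *, a
end program aligned_demo
```

Applying the assertion to a dummy argument whose alignment is decided by the caller would be exactly the situation the caution describes, so treat this pattern as a sketch for data you declare yourself.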

Alignment Strategy

The compiler has several alignment strategies at its disposal for cases where the alignment of data structures is not known at compile time. A simple example is shown below (several other strategies are supported as well). If the alignment of A in the loop shown below is unknown, the compiler generates a prelude loop that iterates until the array reference that occurs most often hits an aligned address. This makes the alignment properties of A known, and the vector loop is optimized accordingly. In this case, the vectorizer applies dynamic loop peeling, a feature specific to Intel® Fortran.

Examples of Data Alignment

Example 2: Original Loop

      SUBROUTINE DOIT(A)
        REAL A(100)    ! alignment of argument A is unknown
        DO I = 1, 100
          A(I) = A(I) + 1.0
        ENDDO
      END SUBROUTINE

Example 3: Aligning Data

      ! The vectorizer applies dynamic loop peeling as follows:
      SUBROUTINE DOIT(A)
        REAL A(100)
        ! Let P be the address of A(1) modulo 16, computed at run time
        IF (P .NE. 0) THEN
          P = (16 - P)/4   ! determine run-time peeling factor
          DO I = 1, P
            A(I) = A(I) + 1.0
          ENDDO
        ENDIF
        ! Now this loop starts at a 16-byte boundary, and will be
        ! vectorized accordingly
        DO I = P + 1, 100
          A(I) = A(I) + 1.0
        ENDDO
      END SUBROUTINE
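The peeling arithmetic can be checked with a small, runnable sketch. It is not what the compiler emits; it only reproduces the same computation by hand, assuming 4-byte REAL elements and using the standard ISO_C_BINDING module to obtain the address of A(1). The names BUF and ADDR, and the offset view used to make misalignment likely, are hypothetical.

```fortran
program peel_demo
  use, intrinsic :: iso_c_binding, only: c_ptr, c_loc, c_intptr_t
  implicit none
  integer, parameter :: n = 100
  real, target :: buf(n + 1)
  real, pointer :: a(:)
  integer(c_intptr_t) :: addr
  integer :: p, i

  buf = 0.0
  a => buf(2:n + 1)          ! offset view, so a(1) need not be 16-byte aligned

  ! P is the byte offset of a(1) past the previous 16-byte boundary.
  addr = transfer(c_loc(a(1)), addr)
  p = int(modulo(addr, 16_c_intptr_t))
  if (p /= 0) p = (16 - p) / 4   ! run-time peeling factor, in elements

  do i = 1, p                ! peeled scalar iterations
    a(i) = a(i) + 1.0
  end do
  do i = p + 1, n            ! main loop, starting on a 16-byte boundary
    a(i) = a(i) + 1.0
  end do

  ! Check that the main loop really started on a 16-byte boundary and that
  ! every element was updated exactly once.
  addr = transfer(c_loc(a(p + 1)), addr)
  if (modulo(addr, 16_c_intptr_t) /= 0) error stop 'main loop start is misaligned'
  if (any(a /= 1.0)) error stop 'an element was skipped or updated twice'
  print *, 'peeled iterations:', p
end program peel_demo
```

The number of peeled iterations printed depends on where the array happens to be placed in memory, so it can differ between runs, compilers, and platforms.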