Prefetching Support

Data prefetching refers to loading data from a relatively slow memory into a relatively fast cache before the data is needed by the application. Data prefetch behavior depends on the architecture.

Issuing prefetches improves performance in most cases, but in some cases prefetch instructions can slow an application down. Experiment with prefetching: to isolate a suspected prefetch performance issue, it can help to turn prefetching on or off with a compiler option while leaving all other optimizations unaffected. See Prefetching with Options for information on using compiler options for prefetching data.

There are two primary methods of issuing prefetch instructions: compiler directives and compiler intrinsics.

Directives

PREFETCH and NOPREFETCH

The PREFETCH and NOPREFETCH directives are supported by Itanium® processors only. These directives assert that data prefetches should be generated, or not generated, for the specified memory references. The assertion affects the heuristics used in the compiler.

If a loop includes the expression A(j), placing PREFETCH A in front of the loop instructs the compiler to insert prefetches for A(j + d) within the loop, where d is the number of iterations ahead to prefetch the data and is determined by the compiler. These directives are supported at optimization levels -O1, -O2, and -O3 (Linux*) or /O1, /O2, and /O3 (Windows*). Remember that -O2 or /O2 is the default optimization level.

Example

!DEC$ NOPREFETCH c
!DEC$ PREFETCH a
  do i = 1, m
    b(i) = a(c(i)) + 1
  enddo
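The fragment above can be placed in a complete program to experiment with. The following is a minimal free-form sketch, not taken from the compiler documentation: the program name, the array size m, and the initialization of a and c are assumptions chosen only for illustration. Compilers that do not recognize the !DEC$ prefix treat the directive lines as ordinary comments, so the program builds either way.

```fortran
! Minimal sketch: an indirect-access loop annotated with prefetch directives.
! The size m and the initial values are illustrative assumptions.
program prefetch_directive_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: m = 1000        ! assumed problem size
  real(dp) :: a(m), b(m)
  integer :: c(m)
  integer :: i

  ! Fill a with data and c with a reversed index vector.
  do i = 1, m
    a(i) = real(i, dp)
    c(i) = m - i + 1
  end do

  ! Ask the compiler not to prefetch the index array c,
  ! and to prefetch the indirectly accessed array a.
  !DEC$ NOPREFETCH c
  !DEC$ PREFETCH a
  do i = 1, m
    b(i) = a(c(i)) + 1
  end do

  print *, b(1), b(m)
end program prefetch_directive_demo
```

Building the program once with prefetching enabled and once with it disabled, then comparing run times, is one way to check whether the directives help for a given access pattern.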

The following example is for Itanium®-based systems only:

Example

      do j=1,lastrow-firstrow+1
        i = rowstr(j)
        iresidue = mod( rowstr(j+1)-i, 8 )
        sum = 0.d0
CDEC$ NOPREFETCH a,p,colidx
        do k=i,i+iresidue-1
          sum = sum +  a(k)*p(colidx(k))
        enddo
CDEC$ NOPREFETCH colidx
CDEC$ PREFETCH a:1:40
CDEC$ PREFETCH p:1:20
        do k=i+iresidue, rowstr(j+1)-8, 8
          sum = sum + a(k  )*p(colidx(k  ))
     &      + a(k+1)*p(colidx(k+1)) + a(k+2)*p(colidx(k+2))
     &      + a(k+3)*p(colidx(k+3)) + a(k+4)*p(colidx(k+4))
     &      + a(k+5)*p(colidx(k+5)) + a(k+6)*p(colidx(k+6))
     &      + a(k+7)*p(colidx(k+7))
        enddo
        q(j) = sum
      enddo

For more details on these directives, see "Directive Enhanced Compilation", section "General Directives", in the Intel® Fortran Language Reference.

Intrinsics

Before inserting compiler intrinsics, experiment with all other supported compiler options and directives. Compiler intrinsics are less portable and less flexible than either a compiler option or compiler directives.

Directives ask the compiler to apply an optimization, while intrinsics perform the optimization directly. As a result, programs with directives are more portable, because the compiler can adapt to different processors, while programs with intrinsics may have to be rewritten or ported for different processors. This is because intrinsics are closer to assembly programming.

The compiler supports the intrinsic subroutine mm_prefetch. In contrast to the way the PREFETCH directive enables a data prefetch from memory, the mm_prefetch subroutine prefetches the data at a specified address on one memory cache line. The mm_prefetch subroutine is described in the Intel® Fortran Language Reference.
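As a rough illustration of the difference, the sketch below issues prefetches by hand instead of leaving the distance to the compiler. It is a minimal example under stated assumptions: the program name, the array size n, and the look-ahead distance d are hypothetical and untuned, and the optional arguments of mm_prefetch are omitted. It requires a compiler that provides mm_prefetch.

```fortran
! Sketch only: manual prefetching with the mm_prefetch intrinsic subroutine.
! The array size n and the look-ahead distance d are assumed, untuned values.
program mm_prefetch_demo
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 100000      ! assumed array size
  integer, parameter :: d = 16          ! assumed look-ahead, in iterations
  real(dp) :: a(n), total
  integer :: i

  do i = 1, n
    a(i) = 1.0_dp
  end do

  total = 0.0_dp
  do i = 1, n
    ! Request the cache line holding the element d iterations ahead.
    ! The index is clamped so that it never leaves the bounds of a.
    call mm_prefetch(a(min(i + d, n)))
    total = total + a(i)
  end do

  print *, total
end program mm_prefetch_demo
```

Here the distance d is fixed in the source, whereas with the PREFETCH directive the compiler chooses it. A hand-picked value may need retuning for each target processor, which is one reason to try options and directives first.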