Improving I/O Performance

Improving overall I/O performance can minimize both device I/O and actual CPU time. The techniques listed in this topic can significantly improve performance in many applications.

An I/O flow problems limit the maximum speed of execution by being the slowest process in an executing program. In some programs, I/O is the bottleneck that prevents an improvement in run-time performance. The key to relieving I/O problems is to reduce the actual amount of CPU and I/O device time involved in I/O.

The problems can be caused by one or more of the following:

A dramatic reduction in CPU time without a corresponding improvement in I/O time.
Such coding practices as:
- Unnecessary formatting of data and other CPU-intensive processing
- Unnecessary transfers of intermediate results
- Inefficient transfers of small amounts of data
- Application requirements

Improved coding practices can minimize actual device I/O, as well as the actual CPU time.

Intel offers software solutions to system-wide problems like minimizing device I/O delays.

Use Unformatted Files Instead of Formatted Files

Use unformatted files whenever possible. Unformatted I/O of numeric data is more efficient and more precise than formatted I/O. Native unformatted data does not need to be modified when transferred and will take up less space on an external file.

Conversely, when writing data to formatted files, formatted data must be converted to character strings for output, less data can transfer in a single operation, and formatted data may lose precision if read back into binary form.

To write the array A(25,25) in the following statements, S1 is more efficient than S2:

Example
S1 WRITE (7) A S2 WRITE (7,100) A 100 FORMAT (25(' ',25F5.21))

Although formatted data files are more easily ported to other systems, Intel Fortran can convert unformatted data in several formats; see Little-endian-to-Big-endian Conversion (IA-32).

Write Whole Arrays or Strings

To eliminate unnecessary overhead, write whole arrays or strings at one time rather than individual elements at multiple times. Each item in an I/O list generates its own calling sequence. This processing overhead becomes most significant in implied-DO loops. When accessing whole arrays, use the array name (Fortran array syntax) instead of using implied-DO loops.

Write Array Data in the Natural Storage Order

Use the natural ascending storage order whenever possible. This is column-major order, with the leftmost subscript varying fastest and striding by 1. (See Accessing Arrays Efficiently.) If a program must read or write data in any other order, efficient block moves are inhibited.

If the whole array is not being written, natural storage order is the best order possible.

If you must use an unnatural storage order, in certain cases it might be more efficient to transfer the data to memory and reorder the data before performing the I/O operation.

Use Memory for Intermediate Results

Performance can improve by storing intermediate results in memory rather than storing them in a file on a peripheral device. One situation that may not benefit from using intermediate storage is when there is a disproportionately large amount of data in relation to physical memory on your system. Excessive page faults can dramatically impede virtual memory performance.

Linux* Only: If you are primarily concerned with the CPU performance of the system, consider using a memory file system (mfs) virtual disk to hold any files your code reads or writes.

Enable Implied-DO Loop Collapsing

DO loop collapsing reduces a major overhead in I/O processing. Normally, each element in an I/O list generates a separate call to the Intel Fortran run-time library (RTL). The processing overhead of these calls can be most significant in implied-DO loops.

Intel Fortran reduces the number of calls in implied-DO loops by replacing up to seven nested implied-DO loops with a single call to an optimized run-time library I/O routine. The routine can transmit many I/O elements at once.

Loop collapsing can occur in formatted and unformatted I/O, but only if certain conditions are met:

The control variable must be an integer. The control variable cannot be a dummy argument or contained in an EQUIVALENCE or VOLATILE statement. Intel Fortran must be able to determine that the control variable does not change unexpectedly at run time.
The format must not contain a variable format expression.

For information on the VOLATILE attribute and statement, see the Intel® Fortran Language Reference.

For loop optimizations, see Loop Transformations, Loop Unrolling, and Optimization Levels.

Use of Variable Format Expressions

Variable format expressions (an Intel Fortran extension) are almost as flexible as run-time formatting, but they are more efficient because the compiler can eliminate run-time parsing of the I/O format. Only a small amount of processing and the actual data transfer are required during run time.

On the other hand, run-time formatting can impair performance significantly. For example, in the following statements, S1 is more efficient than S2 because the formatting is done once at compile time, not at run time:

Example
S1 WRITE (6,400) (A(I), I=1,N) 400 FORMAT (1X, <N> F5.2) ... S2 WRITE (CHFMT,500) '(1X,',N,'F5.2)' 500 FORMAT (A,I3,A) WRITE (6,FMT=CHFMT) (A(I), I=1,N)

Efficient Use of Record Buffers and Disk I/O

Records being read or written are transferred between the user's program buffers and one or more disk block I/O buffers, which are established when the file is opened by the Intel Fortran RTL. Unless very large records are being read or written, multiple logical records can reside in the disk block I/O buffer when it is written to disk or read from disk, minimizing physical disk I/O.

You can specify the size of the disk block physical I/O buffer by using the OPEN statement BLOCKSIZE specifier; the default size can be obtained from fstat(2). If you omit the BLOCKSIZE specifier in the OPEN statement, it is set for optimal I/O use with the type of device the file resides on (with the exception of network access).

The OPEN statement BUFFERCOUNT specifier specifies the number of I/O buffers. The default for BUFFERCOUNT is 1. Any experiments to improve I/O performance should increase the BUFFERCOUNT value and not the BLOCKSIZE value, to increase the amount of data read by each disk I/O.

If the OPEN statement has BLOCKSIZE and BUFFERCOUNT specifiers, then the internal buffer size in bytes is the product of these specifiers. If the open statement does not have these specifiers, then the default internal buffer size is 8192 bytes. This internal buffer will grow to hold the largest single record, but will never shrink.

The default for the Fortran run-time system is to use unbuffered disk writes. That is, by default, records are written to disk immediately as each record is written instead of accumulating in the buffer to be written to disk later.

To enable buffered writes (that is, to allow the disk device to fill the internal buffer before the buffer is written to disk), use one of the following:

The OPEN statement BUFFERED specifier
The -assume buffered_io (Linux*) or /assume:buffered_io (Windows*) option of the ASSUME command
The FORT_BUFFERED run-time environment variable

The OPEN statement BUFFERED specifier takes precedence over the -assume buffered_io (Linux) or /assume:buffered_io (Windows) option. If neither one is set (which is the default), the FORT_BUFFERED environment variable is tested at run time.

The OPEN statement BUFFERED specifier applies to a specific logical unit. In contrast, the -assume nobuffered_io (Linux) or /assume:nobuffered_io (Windows) option and the FORT_BUFFERED environment variable apply to all Fortran units.

Using buffered writes usually makes disk I/O more efficient by writing larger blocks of data to the disk less often. However, a system failure when using buffered writes can cause records to be lost, since they might not yet have been written to disk. (Such records would have been written to disk with the default unbuffered writes.)

When performing I/O across a network, be aware that the size of the block of network data sent across the network can impact application efficiency. When reading network data, follow the same advice for efficient disk reads, by increasing the BUFFERCOUNT. When writing data through the network, several items should be considered:

Unless the application requires that records be written using unbuffered writes, enable buffered writes by a method described above.
Especially with large files, increasing the BLOCKSIZE value increases the size of the block sent on the network and how often network data blocks get sent.
Time the application when using different BLOCKSIZE values under similar conditions to find the optimal network block size.

Completion of a WRITE statement, even when not buffered, does not guarantee that the data has been written to disk; the operating system may delay the actual disk write. To ensure that the data has been written to disk, you can call the FLUSH routine. Because this can cause a significant slowdown of the application, FLUSH should be used only when absolutely needed.

Linux* Only: When writing records, be aware that I/O records are written to unified buffer cache (UBC) system buffers. To request that I/O records be written from program buffers to the UBC system buffers, use the FLUSH library routine. Be aware that calling FLUSH also discards read-ahead data in user buffer. For more information, see FLUSH in the Intel® Fortran Libraries Reference.

Specify RECL

The sum of the record length (RECL specifier in an OPEN statement) and its overhead is a multiple or divisor of the blocksize, which is device-specific. For example, if the BLOCKSIZE is 8192 then RECL might be 24576 (a multiple of 3) or 1024 (a divisor of 8).

The RECL value should fill blocks as close to capacity as possible (but not over capacity). Such values allow efficient moves, with each operation moving as much data as possible; the least amount of space in the block is wasted. Avoid using values larger than the block capacity, because they create very inefficient moves for the excess data only slightly filling a block (allocating extra memory for the buffer and writing partial blocks are inefficient).

The RECL value unit for formatted files is always 1-byte units. For unformatted files, the RECL unit is 4-byte units, unless you specify the -assume byterecl (Linux) or /assume:byterecl (Windows) option for the ASSUME specifier to request 1-byte units.

Use the Optimal Record Type

Unless a certain record type is needed for portability reasons, choose the most efficient type, as follows:

For sequential files of a consistent record size, the fixed-length record type gives the best performance.
For sequential unformatted files when records are not fixed in size, the variable-length record type gives the best performance, particularly for BACKSPACE operations.
For sequential formatted files when records are not fixed in size, the Stream_LF record type gives the best performance.

Reading from a Redirected Standard Input File

Due to certain precautions that the Fortran run-time system takes to ensure the integrity of standard input, reads can be very slow when standard input is redirected from a file. For example, when you use a command such as myprogram.exe < myinput.data> , the data is read using the READ(*) or READ(5) statement, and performance is degraded. To avoid this problem, do one of the following:

Explicitly open the file using the OPEN statement. For example:
OPEN(5, STATUS='OLD', FILE='myinput.dat')

Use an environment variable to specify the input file.

To take advantage of these methods, be sure your program does not rely on sharing the standard input file.

For more information on Intel Fortran data files and I/O, see "Files, Devices, and I/O" in the User's Guide; on OPEN statement specifiers and defaults, see "Open Statement" in the Intel® Fortran Language Reference.