The profrun utility is a tool for collecting data from the Performance Monitoring Unit (PMU) hardware for a CPU.
This utility uses the Intel® VTune™ Performance Analyzer Driver to sample events. Therefore, you must have the VTune™ analyzer installed to use the profrun utility. The requirements differ depending on platform:
Platform |
Requirements |
---|---|
Linux* |
|
Windows* |
|
The utility, in coordination with the analyzer driver, collects samples of events monitored by the PMU and creates a hardware profiling information file (.hpi). The hardware profiling data contained in the file can be used by the Intel® compiler to further enhance optimizations for some programs.
During the initial target program analysis phase, the VTune™ analyzer driver creates a file that contains event data for all system-wide processes. By default, the file is named pgopti.tb5. The profrun utility then coalesces the data for the target executable into a significantly smaller pgopti.hpi file and deletes the .tb5 file.
Note
Your system must have sufficient hard disk space to hold the .tb5 file temporarily. The file size varies widely; plan for as much as 400 MB.
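As a rough pre-flight check on Linux, the free space in the directory that will hold the .tb5 file can be tested before profiling. The following is a minimal sketch, not part of the profrun utility: the has_free_mb helper is a hypothetical name, and the 400 MB threshold is taken from the note above.

```shell
# Hypothetical helper: succeeds when directory $1 has at least $2 MB free.
# Relies on POSIX `df -Pk`, whose fourth column is the available space in KB.
has_free_mb() {
    dir=$1
    need_mb=$2
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((need_mb * 1024)) ]
}

# Check the current directory, where pgopti.tb5 is created by default.
if has_free_mb . 400; then
    echo "enough space for the temporary .tb5 file"
else
    echo "less than 400 MB free; consider -tb5:file on another disk" >&2
fi
```

If the check fails, the -tb5:file command described later in this topic can place the file on a disk with more room.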
The VTune™ analyzer driver can be used by only one process at a time. The profrun utility returns an error if the driver is already in use by another process. By default, profrun waits up to 10 minutes for the driver to become available before attempting to access it again.
Compile your target application using the -prof-gen-sampling (Linux*) or /Qprof-gen-sampling (Windows*) option. The following examples illustrate possible combinations:
Platform |
Command Examples |
---|---|
Linux |
ifort -oMyApp -O2 -prof-gen-sampling source1.f source2.f |
Windows |
ifort /FeMyApp.exe /O2 /Qprof-gen-sampling source1.f source2.f |
Run the resulting executable by entering a command similar to the following:
Platform |
Command Examples |
---|---|
Linux |
profrun -dcache MyApp |
Windows |
profrun -dcache MyApp.exe |
This step uses the VTune™ analyzer driver to produce the necessary .hpi file.
Compile your target application again; however, during this compilation use the -prof-use (Linux) or /Qprof-use (Windows) option. The following examples illustrate possible, valid combinations:
Platform |
Command Examples |
---|---|
Linux |
ifort -oMyApp -O2 -prof-use source1.f source2.f |
Windows |
ifort /FeMyApp.exe /O2 /Qprof-use source1.f source2.f |
The -prof-use (Linux) or /Qprof-use (Windows) option instructs the compiler to read the .hpi file and optimize the application using the collected branch sample data.
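The three steps above can be sketched as a single shell script for Linux. This is a minimal sketch rather than a prescribed procedure: the run helper, the DRY_RUN variable, and the function name are illustrative assumptions, while the ifort and profrun command lines follow the Linux examples in this topic. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch of the sampling-based workflow on Linux.
APP=MyApp
SOURCES="source1.f source2.f"

# Hypothetical helper: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

pgo_sampling_workflow() {
    # Step 1: compile for hardware sampling.
    run ifort -o"$APP" -O2 -prof-gen-sampling $SOURCES || return 1
    # Step 2: run under profrun to produce the .hpi file (pgopti.hpi by default).
    run profrun -dcache "$APP" || return 1
    # Step 3: recompile so the compiler reads the .hpi file.
    run ifort -o"$APP" -O2 -prof-use $SOURCES || return 1
}

pgo_sampling_workflow
```

Stopping at the first failing step avoids recompiling with -prof-use when no .hpi file was produced.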
The profrun utility uses the following syntax:
Syntax |
---|
profrun -command [argument...] application |
where command is one or more of the commands listed in the following table, argument is one or more of the extended switches, and application is the command line for running the application, which is usually the application name.
Note
Windows* systems: Unlike the compiler options, which are preceded by forward slash ("/"), the utility options are preceded by a hyphen ("-").
The following table summarizes the available profrun utility commands, lists defaults where applicable, and provides a brief description of each.
Command |
Default |
Description |
---|---|---|
-help |
|
Lists the supported tool options. |
[sav] |
10007 |
An optional argument supported by the -branch, -dcache, -icache, or -event commands. This integer value specifies the sample-after value used for collecting samples. The default value is 10007, which is a moderately sized prime number. If no other value is specified for sav when using any of the supported commands, the default value is used. Use a prime number for best results. When changing the value, keep the following guideline in mind:
|
-branch[:sav] |
10007 |
Collect branch samples. A sample is taken after every interval, as defined by the sav argument. Use this command to gather information that guides the compiler in hot/cold block layout, in predicting the direction of conditional branches, and in deciding where it is most profitable to inline functions. Gathering information with this command is similar to using the -prof-gen (Linux) or /Qprof-gen (Windows) option to instrument your program, but this command is much less intrusive. |
-dcache[:sav] |
10007 |
Collect samples of data cache misses. A sample is taken after every interval, as defined by the sav argument. Use this command to gather information to guide the compiler in placing prefetch operations and performing data layout. |
-icache[:sav] |
10007 |
Collect samples of instruction cache misses. A sample is taken after every interval, as defined by the sav argument. Use this command to gather information to guide the compiler in placing prefetch operations and performing data layout. |
-event:eventname[:sav] |
10007 |
Collect information about valid VTune™ analyzer events; eventname specifies a specific event name. Use this command when -branch, -dcache, or -icache do not apply. Some event names contain embedded spaces; in such names, you can use a period in place of each space. The utility changes periods to spaces before passing the event to the VTune™ analyzer. Refer to the Intel® VTune™ Performance Analyzer documentation for more information on valid events. |
-tb5:file |
pgopti.tb5 |
Specifies the name of the .tb5 file generated by the VTune™ analyzer driver while it profiles the application. By default, the file resides in the current directory. The specified file is deleted when profrun completes unless -tb5only is also specified. Consider overriding the default location to place the .tb5 file on a disk with more available space. |
-tb5only |
|
Produces only the .tb5 file. If this command is not specified, the utility reduces the data into a single .hpi file and deletes the .tb5 file. |
-wait[:time] |
600 seconds (10 minutes) |
Forces the utility to wait for the specified time before attempting to access the VTune™ analyzer driver. This option is most useful in cases where you anticipate the driver will be busy. Disable the command by specifying a value of 0 (zero). |
-hpi:file |
pgopti.hpi |
Specifies the name of the .hpi file containing the profile information for the application. By default, the file resides in the current directory. |
-executable:file |
|
Specifies the name of the executable being profiled. By default, this is the first token of the command. |
-bufsize:size |
65536 KB |
Specifies the buffer size used in kilobytes (KB). |
-sampint:interval |
|
Specifies the sampling interval in milliseconds (ms). This command should rarely be needed because all sampling is event-based. |
-- |
|
Stop parsing options for the tool. Use this if the command name starts with a hyphen. |
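The commands in the table can be combined on one profrun command line. The following lines sketch a few combinations for Linux. Everything beyond the option names is an assumption made for illustration: the run helper and DRY_RUN variable, the sample-after value 20011, the /scratch path, the file names, the application name -odd-name-app, and the placeholder event name, which is not a real VTune™ analyzer event. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
# Hypothetical helper: print the command; execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Branch samples with a larger sample-after value (20011 is an arbitrary prime).
run profrun -branch:20011 MyApp

# Data cache misses; keep the temporary .tb5 file on a (hypothetical) scratch
# disk and write the profile to a custom .hpi file.
run profrun -dcache -tb5:/scratch/MyApp.tb5 -hpi:MyApp.hpi MyApp

# Instruction cache misses without waiting for a busy driver.
run profrun -icache -wait:0 MyApp

# Placeholder event whose name contains spaces, written with periods instead.
run profrun -event:Hypothetical.Event.Name MyApp

# Application whose command name starts with a hyphen: end option parsing with --.
run profrun -dcache -- -odd-name-app
```

On Windows the same options apply with the executable name, for example MyApp.exe, and they keep the leading hyphen.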