Optimizing Applications Glossary

Term

Definition

A

 

alignment constraint

The required address boundary (for example, a multiple of 4, 8, or 16 bytes) at which data on the stack must be stored.

alternate loop transformation

An optimization in which the compiler generates a copy of a loop and executes the new loop depending on the boundary size.

B

 

branch count profiler

A tool that counts the number of times a program executes each branch statement. The tool also generates a database that shows how the program executed.

branch probability database

The database generated by the branch count profiler. The database contains the number of times each branch is executed.

C

 

cache hit

The situation when the information the processor wants is in the cache.

call site

A call site consists of the instructions immediately preceding a call instruction and the call instruction itself.

common subexpression elimination

An optimization in which the compiler detects and combines redundant computations.
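
As a minimal sketch of the idea, the two functions below show source code before and after the redundant computation is combined. The function names are hypothetical and chosen only for illustration; a compiler performs this rewrite internally rather than in source form.

```c
/* As written: the subexpression (a + b) is computed twice. */
int cse_before(int a, int b, int c) {
    int x = (a + b) * c;
    int y = (a + b) - c;
    return x + y;
}

/* After the optimization: (a + b) is computed once and reused. */
int cse_after(int a, int b, int c) {
    int t = a + b;   /* shared subexpression */
    int x = t * c;
    int y = t - c;
    return x + y;
}
```

Both versions return the same value for any inputs that do not overflow.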

conditionals

Any operation that takes place depending on whether or not a certain condition is true.

constant argument propagation

An optimization in which the compiler replaces the formal arguments of a routine with actual constant values. The compiler then propagates constant variables used as actual arguments.
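
The following sketch illustrates one way to picture the effect. The names scale, scale_by_3, cap_before, and cap_after are hypothetical; the second pair only approximates what a compiler might generate once it knows the argument is always 3.

```c
/* General routine with a formal argument named factor. */
static int scale(int x, int factor) {
    return x * factor;
}

/* Every call passes the constant 3 as the actual argument. */
int cap_before(int x) {
    return scale(x, 3);
}

/* Sketch of the result: the constant replaces the formal argument. */
static int scale_by_3(int x) {
    return x * 3;
}

int cap_after(int x) {
    return scale_by_3(x);
}
```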

constant branches

Conditionals that always take the same branch.

constant folding

An optimization in which the compiler evaluates a constant expression at compile time and uses the result, instead of storing the numbers and operators and computing the value when the program executes.
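
A small illustration, with hypothetical function names: the first function spells out a constant expression, and the second shows the value a compiler can substitute for it.

```c
/* As written: three constants and two multiplications. */
int seconds_per_day_unfolded(void) {
    return 60 * 60 * 24;
}

/* After folding: the expression is replaced by its value. */
int seconds_per_day_folded(void) {
    return 86400;
}
```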

copy propagation

An optimization in which the compiler eliminates unnecessary assignments by using the value assigned to a variable instead of using the variable itself.
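
The sketch below shows the idea at source level. The function names are hypothetical, and the "after" version is only an approximation of what a compiler does internally.

```c
/* As written: b is a plain copy of a. */
int copy_before(int a) {
    int b = a;       /* unnecessary assignment */
    int c = b + 1;   /* uses the copy */
    return c * b;
}

/* After propagation: uses of b are replaced by a, so b disappears. */
int copy_after(int a) {
    int c = a + 1;
    return c * a;
}
```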

D

 

dataflow

The movement of data through a system, from entry to destination.

dead-code elimination

An optimization in which the compiler eliminates any code that generates unused values or any code that will never be executed in the program.
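
A minimal sketch covering both cases, using hypothetical function names: the first assignment produces a value that is never used, and the body of the if statement can never execute.

```c
/* As written: contains an unused value and an unreachable statement. */
int dce_before(int n) {
    int t = n * 42;   /* unused: overwritten before it is ever read */
    if (0) {
        t = -1;       /* unreachable: the condition is always false */
    }
    t = n + 1;
    return t;
}

/* After elimination: only the code that affects the result remains. */
int dce_after(int n) {
    return n + 1;
}
```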

denormal values

Nonzero computed floating-point values whose absolute values are smaller than the smallest normalized floating-point number.

dynamic linking

The process in which a shared object is mapped into the virtual address space of your program at run time.

E

 

empty declaration

A semicolon and nothing before it.

F

 

frame pointer

A pointer that holds the base address of the current stack frame and is used to access data within that frame.

G

 

gradual underflow

Gradual underflow occurs when computed floating-point values have absolute values smaller than the smallest normalized floating-point number. Such floating-point values are called denormal values.

Gradual underflow can degrade the performance of an application.
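
The following sketch shows how a denormal value can arise and be detected in C. It uses the standard fpclassify macro and FLT_MIN constant; the function names are hypothetical, and the sketch assumes the floating-point environment has not been switched to a flush-to-zero mode.

```c
#include <float.h>  /* FLT_MIN: smallest normalized float */
#include <math.h>   /* fpclassify, FP_SUBNORMAL */

/* Returns 1 if x is a denormal (subnormal) value, 0 otherwise. */
int is_denormal(float x) {
    return fpclassify(x) == FP_SUBNORMAL;
}

/* Halving the smallest normalized float underflows gradually:
   the result is a denormal value rather than zero. */
float half_of_smallest_normal(void) {
    volatile float smallest = FLT_MIN;  /* volatile keeps the division at run time */
    return smallest / 2.0f;
}
```

If flush-to-zero is in effect, the division may yield zero instead, which trades accuracy for speed.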

H

 

HLO

High-level optimization

Hyper-Threading Technology

Hyper-Threading Technology lets multiple logical processors share the execution resources of each physical processor package. It increases system throughput when multithreaded applications execute or when multitasked workloads run concurrently.

Hyper-Threading Technology enables simultaneous multithreading on IA-32 systems. It makes a single physical processor appear as two logical processors. Each logical processor can execute a software thread, so at most two software threads execute simultaneously on one physical processor, sharing its execution engine.

I

 

ILP32

int, long, and pointer types are 32-bit

IPO

Interprocedural optimization

in-line function expansion

An optimization in which the compiler replaces each function call with the function body expanded in place.
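
As a rough illustration with hypothetical function names, the second function below is what the first one looks like once the calls to square are expanded in place.

```c
static int square(int x) {
    return x * x;
}

/* As written: two calls to square. */
int sum_squares_call(int a, int b) {
    return square(a) + square(b);
}

/* After expansion: each call is replaced by the function body. */
int sum_squares_inlined(int a, int b) {
    return a * a + b * b;
}
```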

induction variable simplification

An optimization in which the compiler reduces the complexity of an array index calculation by using only additions.
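
A minimal sketch of the idea, assuming a strided array access: the first version multiplies to form each index, and the second keeps a running index that is updated by addition only. The function names and the stride parameter are hypothetical.

```c
/* As written: the index is recomputed with a multiplication each time. */
long ivs_before(const int *a, int n, int stride) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += a[i * stride];
    }
    return sum;
}

/* After simplification: the index advances by addition only. */
long ivs_after(const int *a, int n, int stride) {
    long sum = 0;
    int idx = 0;              /* always equals i * stride */
    for (int i = 0; i < n; i++) {
        sum += a[idx];
        idx += stride;
    }
    return sum;
}
```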

instruction scheduling

An optimization in which the compiler reorders the generated machine instructions so that more than one can execute in parallel.

instruction sequencing

An optimization in which the compiler eliminates less efficient instructions and replaces them with instruction sequences that take advantage of a particular processor's features.

interprocedural optimization

An optimization that applies to the entire program except for library routines.

L

 

LP64

long and pointer types are 64-bit

load balancing

The equal division of work among threads. If a load is balanced, it ensures processors are busy most, if not all, of the time. If a load is not balanced, some threads may finish significantly before others, leaving processor resources idle and wasting performance opportunities.

loop blocking

An optimization in which the compiler reorders the execution sequence of instructions so that the program executes iterations from outer loops before completing all the iterations of the inner loop.
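
The sketch below applies the idea to a matrix transpose. The dimension DIM, the tile size TILE (assumed to divide DIM evenly), and all function names are hypothetical values picked for illustration.

```c
enum { DIM = 8, TILE = 4 };  /* hypothetical sizes; TILE divides DIM */

/* As written: the inner loop runs to completion for every i. */
void transpose_plain(int src[DIM][DIM], int dst[DIM][DIM]) {
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            dst[j][i] = src[i][j];
}

/* Blocked: work proceeds tile by tile, so outer-loop iterations
   advance before the full range of j has been covered. */
void transpose_blocked(int src[DIM][DIM], int dst[DIM][DIM]) {
    for (int ii = 0; ii < DIM; ii += TILE)
        for (int jj = 0; jj < DIM; jj += TILE)
            for (int i = ii; i < ii + TILE; i++)
                for (int j = jj; j < jj + TILE; j++)
                    dst[j][i] = src[i][j];
}

/* Returns 1 when both versions produce the same, correct transpose. */
int blocking_preserves_result(void) {
    int src[DIM][DIM], a[DIM][DIM], b[DIM][DIM];
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            src[i][j] = i * DIM + j;
    transpose_plain(src, a);
    transpose_blocked(src, b);
    for (int i = 0; i < DIM; i++)
        for (int j = 0; j < DIM; j++)
            if (a[i][j] != b[i][j] || a[i][j] != src[j][i])
                return 0;
    return 1;
}
```

The benefit of blocking comes from each tile fitting in the cache, which depends on the tile size and the target processor.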

loop unrolling

An optimization in which the compiler duplicates the executed statements inside a loop to reduce the number of loop iterations.
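
As an illustration, the second function below unrolls the loop of the first by a factor of four. The unroll factor and function names are hypothetical; a compiler chooses its own factor.

```c
/* As written: one element per iteration. */
long sum_rolled(const int *a, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* Unrolled by four: the loop body is duplicated, so the loop
   runs about a quarter as many iterations. */
long sum_unrolled4(const int *a, int n) {
    long sum = 0;
    int i = 0;
    for (; i + 3 < n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }
    for (; i < n; i++) {   /* remainder when n is not a multiple of 4 */
        sum += a[i];
    }
    return sum;
}
```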

loop-invariant code movement

An optimization in which the compiler detects a computation that does not change within a loop and moves it outside the loop so that it is computed only once.
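
A minimal sketch with hypothetical function names: the product x * y has the same value on every iteration, so it can be computed once before the loop.

```c
/* As written: x * y is recomputed on every iteration. */
long licm_before(const int *a, int n, int x, int y) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += a[i] * (x * y);
    }
    return sum;
}

/* After the optimization: the invariant product is computed once. */
long licm_after(const int *a, int n, int x, int y) {
    long sum = 0;
    int k = x * y;            /* loop-invariant value */
    for (int i = 0; i < n; i++) {
        sum += a[i] * k;
    }
    return sum;
}
```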

P

 

P64

pointer types are 64-bit
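
The ILP32, LP64, and P64 data models can be told apart by checking type sizes. The sketch below is one hypothetical way to do so; the function name is invented, and the checks on int in ILP32 and on long in P64 (assumed to remain 32-bit) are assumptions beyond what the definitions here state.

```c
/* Returns a label for the data model of the current build. */
const char *data_model(void) {
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4) {
        return "ILP32";   /* int, long, and pointer are 32-bit */
    }
    if (sizeof(long) == 8 && sizeof(void *) == 8) {
        return "LP64";    /* long and pointer are 64-bit */
    }
    if (sizeof(long) == 4 && sizeof(void *) == 8) {
        return "P64";     /* only the pointer is 64-bit (assumption: long is 32-bit) */
    }
    return "other";
}
```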

padding

The addition of bytes or words at the end of each data type in order to meet size and alignment constraints.
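
The sketch below makes padding visible by comparing the size of a structure with the sum of its member sizes. The structure and function names are hypothetical, and the exact byte counts depend on the compiler and target.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical record mixing 1-byte and wider members. */
struct sample {
    char tag;       /* 1 byte */
    double value;   /* usually must start on an 8-byte boundary */
    char flag;      /* 1 byte, usually followed by tail padding */
};

/* Bytes inserted between tag and value to satisfy alignment. */
size_t padding_before_value(void) {
    return offsetof(struct sample, value) - sizeof(char);
}

/* All padding in the structure, interior plus tail. */
size_t total_padding(void) {
    return sizeof(struct sample)
         - (sizeof(char) + sizeof(double) + sizeof(char));
}
```

On many 64-bit targets these functions return 7 and 14, but that result is not guaranteed.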

PGO

Profile-guided optimization

preloading

An optimization in which the compiler loads the vectors, one cache line at a time, so that during the loop computation the number of external bus turnarounds is reduced.

privatization of scalars

Privatization of scalars is an operation of re-assigning the storage of scalars from the static or parent stack area to the local stack of a thread to enable parallelization. This operation requires a WRITE permission and is usually performed to remove a data dependency between concurrently executing threads.
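
One way to picture privatization is the OpenMP private clause, used here purely as an illustration; the definition above does not name OpenMP. The function name is hypothetical. If the code is compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
/* tmp is written on every iteration. If all threads shared one copy,
   the writes would conflict; private(tmp) gives each thread its own. */
long sum_of_squares(const int *in, int n) {
    long sum = 0;
    int tmp;   /* scalar declared outside the loop */
    #pragma omp parallel for private(tmp) reduction(+:sum)
    for (int i = 0; i < n; i++) {
        tmp = in[i] * in[i];
        sum += tmp;
    }
    return sum;
}
```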

profiling

A process in which detailed information is produced about the program's execution.

R

 

register variable detection

An optimization in which the compiler detects the variables that never need to be stored in memory and places them in register variables.

S

 

side effects

Results of the optimization process that might increase the code size and/or processing time.

static linking

The process in which a copy of the object file that contains a function used in your program is incorporated in your executable file at link time.

strength reduction

An optimization in which the compiler replaces an expensive operation with an equivalent cheaper one, for example by reducing an array index calculation to additions only.

strip mining

An optimization in which the compiler creates an additional level of nesting to enable inner loop computations on vectors that can be held in the cache. This optimization reduces the size of inner loops so that the amount of data required for the inner loop can fit the cache size.
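
A minimal sketch of the extra nesting level, using a dot product. The strip length STRIP and the function names are hypothetical; in practice the strip length would be chosen to match the cache or vector size of the target.

```c
enum { STRIP = 4 };  /* hypothetical strip length */

/* As written: a single loop over all n elements. */
long dot_plain(const int *a, const int *b, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        sum += (long)a[i] * b[i];
    }
    return sum;
}

/* Strip-mined: the outer loop steps over strips, and the inner
   loop handles at most STRIP elements at a time. */
long dot_strip_mined(const int *a, const int *b, int n) {
    long sum = 0;
    for (int s = 0; s < n; s += STRIP) {
        int end = (s + STRIP < n) ? s + STRIP : n;  /* last strip may be short */
        for (int i = s; i < end; i++) {
            sum += (long)a[i] * b[i];
        }
    }
    return sum;
}
```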

SWP

Software Pipelining

T

 

token pasting

The process in which the compiler treats two tokens separated by a comment as one (for example, a/**/b becomes ab).
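
Standard C preprocessors replace each comment with a single space, so the comment form works only with compilers that keep the older behavior. The sketch below shows the portable alternative, the ## operator; the macro and function names are hypothetical.

```c
/* Joins two tokens into one using the standard ## operator. */
#define PASTE(a, b) a##b

int token_paste_demo(void) {
    int ab = 7;
    return PASTE(a, b);   /* expands to the single identifier ab */
}
```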

transformation

A rearrangement of code. In contrast, an optimization is a rearrangement of code where improved run-time performance is guaranteed.

U

 

unreachable code

Instructions that are never executed by the program.

unused code

Instructions that produce results that are not used in the program.

V

 

variable renaming

An optimization in which the compiler renames instances of a variable that refer to distinct entities.
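
As a rough source-level illustration with hypothetical names: the variable t below holds two unrelated values, and giving each its own name removes the artificial link between them.

```c
/* As written: t is reused for two distinct values. */
int rename_before(int a, int b) {
    int t = a + b;    /* first entity */
    int r = t * 2;
    t = a - b;        /* second, unrelated entity */
    return r + t;
}

/* After renaming: each entity has its own variable. */
int rename_after(int a, int b) {
    int t1 = a + b;
    int r = t1 * 2;
    int t2 = a - b;
    return r + t2;
}
```

With separate names, the two computations no longer appear dependent, which gives the compiler more freedom to reorder them.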