HP Fortran for OpenVMS
User Manual

5.5.9 Use of Variable Format Expressions

Variable format expressions (a Compaq Fortran 77 extension) are almost as flexible as run-time formatting, but they are more efficient because the compiler can eliminate run-time parsing of the I/O format. Only a small amount of processing and the actual data transfer are required during run time.

On the other hand, run-time formatting can impair performance significantly. For example, in the following statements, S₁ is more efficient than S₂ because the formatting is done once at compile time, not at run time:

S₁   WRITE (6,400) (A(I), I=1,N)
     400 FORMAT (1X, <N> F5.2)
     .
     .
     .
S₂   WRITE (CHFMT,500) '(1X,',N,'F5.2)'
     500 FORMAT (A,I3,A)
     WRITE (6,FMT=CHFMT) (A(I), I=1,N)

5.6 Additional Source Code Guidelines for Run-Time Efficiency

Other source coding guidelines can be implemented to improve run-time performance.

The amount of improvement in run-time performance is related to the number of times a statement is executed. For example, improving an arithmetic expression executed within a loop many times has the potential to improve performance more than improving a similar expression executed once outside a loop.

5.6.1 Avoid Small or Large Integer and Logical Data Items (Alpha only)

If the target system is an Alpha processor predating EV56, avoid using integer or logical data items whose size is less than 32 bits. On those processors, the smallest unit of efficient single-instruction access is 32 bits, and accessing a 16-bit (or 8-bit) data type can result in a sequence of machine instructions to access the data.

5.6.2 Avoid Mixed Data Type Arithmetic Expressions

Avoid mixing integer and floating-point (REAL) data in the same computation. Expressing all numbers in a floating-point arithmetic expression (assignment statement) as floating-point values eliminates the need to convert data between fixed and floating-point formats. Expressing all numbers in an integer arithmetic expression as integer values also achieves this. This improves run-time performance.

For example, assuming that I and J are both INTEGER variables, expressing a constant number (2.) as an integer value (2) eliminates the need to convert the data:

Original Code:    INTEGER I, J
                  I = J / 2.

Efficient Code:   INTEGER I, J
                  I = J / 2

For applications with numerous floating-point operations, consider using the /ASSUME=NOACCURACY_SENSITIVE qualifier (see Section 5.8.8) if a small difference in the result is acceptable.

You can use different sizes of the same general data type in an expression with minimal or no effect on run-time performance. For example, using REAL, DOUBLE PRECISION, and COMPLEX floating-point numbers in the same floating-point arithmetic expression has minimal or no effect on run-time performance.

5.6.3 Use Efficient Data Types

In cases where more than one data type can be used for a variable, consider selecting the data types based on the following hierarchy, listed from most to least efficient:

Integer (See also Section 5.6.1)
Single-precision real, expressed explicitly as REAL, REAL (KIND=4), or REAL*4
Double-precision real, expressed explicitly as DOUBLE PRECISION, REAL (KIND=8), or REAL*8
Extended-precision real, expressed explicitly as REAL (KIND=16) or REAL*16

However, keep in mind that in an arithmetic expression, you should avoid mixing integer and floating-point (REAL) data (see Section 5.6.2).

5.6.4 Avoid Using Slow Arithmetic Operators

Before you modify source code to avoid slow arithmetic operators, be aware that optimizations convert many slow arithmetic operators to faster arithmetic operators. For example, the compiler optimizes the expression H=J**2 to be H=J*J.

Consider also whether replacing a slow arithmetic operator with a faster arithmetic operator will change the accuracy of the results or impact the maintainability (readability) of the source code.

Replacing slow arithmetic operators with faster ones should be reserved for critical code areas. The following hierarchy lists the HP Fortran arithmetic operators, from fastest to slowest:

Addition (+), subtraction (-), and floating-point multiplication (*)
Integer multiplication (*)
Division (/)
Exponentiation (**)
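
As one hedged illustration of this hierarchy, the following sketch replaces a division that executes on every loop iteration with a multiplication by a reciprocal computed once. The program and variable names (RECIP_DEMO, DIVISOR, RDIV) are hypothetical, and free source form is assumed. The two loops can produce results that differ in the last bits, so a rewrite of this kind suits only code where such a difference is acceptable:

```fortran
PROGRAM RECIP_DEMO
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 1000
  REAL :: A(N), B(N), C(N), DIVISOR, RDIV
  INTEGER :: I

  DIVISOR = 3.0
  DO I = 1, N
     A(I) = REAL(I)
  END DO

  ! Division executed on every iteration
  DO I = 1, N
     B(I) = A(I) / DIVISOR
  END DO

  ! One division outside the loop, multiplication inside the loop
  RDIV = 1.0 / DIVISOR
  DO I = 1, N
     C(I) = A(I) * RDIV
  END DO

  ! The two results may differ slightly because of rounding
  PRINT *, 'Largest difference: ', MAXVAL(ABS(B - C))
END PROGRAM RECIP_DEMO
```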

5.6.5 Avoid EQUIVALENCE Statement Use

Avoid using EQUIVALENCE statements. EQUIVALENCE statements can:

Force unaligned data or cause data to span natural boundaries.
Prevent certain optimizations, including:
- Global data analysis under certain conditions (see Section 5.7.3)
- Implied-DO loop collapsing when the control variable is contained in an EQUIVALENCE statement

5.6.6 Use Statement Functions and Internal Subprograms

Whenever the HP Fortran compiler has access to the use and definition of a subprogram during compilation, it might choose to inline the subprogram. Using statement functions and internal subprograms maximizes the number of subprogram references that will be inlined, especially when multiple source files are compiled together at optimization level /OPTIMIZE=LEVEL=4 or higher.

For more information, see Section 5.1.2.
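
The following minimal sketch shows both forms in one program unit. The names (INLINE_DEMO, SUMSQ, LENGTH) are hypothetical and free source form is assumed. In each case the definition is visible to the compiler at the point of reference; whether a particular reference is actually inlined remains the compiler's decision:

```fortran
PROGRAM INLINE_DEMO
  IMPLICIT NONE
  REAL :: X, Y, P, Q
  REAL :: SUMSQ

  ! Statement function: must precede the first executable statement
  SUMSQ(P, Q) = P*P + Q*Q

  X = 3.0
  Y = 4.0
  PRINT *, 'Statement function:  ', SUMSQ(X, Y)
  PRINT *, 'Internal subprogram: ', LENGTH(X, Y)

CONTAINS

  ! Internal function: defined inside the host program unit,
  ! so its definition is available wherever the host calls it
  REAL FUNCTION LENGTH(U, V)
    REAL, INTENT(IN) :: U, V
    LENGTH = SQRT(U*U + V*V)
  END FUNCTION LENGTH

END PROGRAM INLINE_DEMO
```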

5.6.7 Code DO Loops for Efficiency

Minimize the arithmetic operations and other operations in a DO loop whenever possible. Moving unnecessary operations outside the loop will improve performance (for example, when the intermediate nonvarying values within the loop are not needed).
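
As a sketch of this guideline, the following hypothetical program (HOIST_DEMO, with the illustrative variable FACTOR; free source form assumed) computes a nonvarying product once before the loop instead of on every iteration. The optimizer may perform similar code motion on its own, and regrouping the multiplication can change the rounding of the result slightly:

```fortran
PROGRAM HOIST_DEMO
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 1000
  REAL :: A(N), B(N), X, Y, FACTOR
  INTEGER :: I

  X = 2.0
  Y = 0.5
  B = 1.0

  ! Original: SQRT(X)*Y does not vary, yet it is written inside the loop
  DO I = 1, N
     A(I) = B(I) * SQRT(X) * Y
  END DO

  ! Recoded: the nonvarying value is computed once, before the loop
  FACTOR = SQRT(X) * Y
  DO I = 1, N
     A(I) = B(I) * FACTOR
  END DO

  PRINT *, A(1), A(N)
END PROGRAM HOIST_DEMO
```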

For More Information:

On loop optimizations, see Section 5.8.2 and Section 5.8.4.
On HP Fortran statements, see the HP Fortran for OpenVMS Language Reference Manual.

5.7 Optimization Levels: /OPTIMIZE=LEVEL=n Qualifier

HP Fortran performs many optimizations by default. You do not have to recode your program to use them. However, understanding how optimizations work helps you remove any inhibitors to their successful function.

Generally, HP Fortran increases compile time in favor of decreasing run time. If an operation can be performed, eliminated, or simplified at compile time, HP Fortran does so, rather than have it done at run time. The time required to compile the program usually increases as more optimizations occur.

The program will likely execute faster when compiled at /OPTIMIZE=LEVEL=4, but will require more compilation time than if you compile the program at a lower level of optimization.

The size of the object file varies with the optimizations requested. Factors that can increase object file size include an increase of loop unrolling or procedure inlining.

Table 5-4 lists the levels of HP Fortran optimization with different /OPTIMIZE=LEVEL=n levels. For example, /OPTIMIZE=LEVEL=0 specifies no selectable optimizations (certain optimizations always occur); /OPTIMIZE=LEVEL=5 specifies all levels of optimizations including loop transformation and software pipelining.

Table 5-4 Types of Optimization Performed at Different Levels

                                  /OPTIMIZE=LEVEL=n
Optimization Type                 n=0  n=1  n=2  n=3  n=4  n=5
Loop transformation                                         •
Software pipelining                                    •    •
Automatic inlining                                     •    •
Loop unrolling                                    •    •    •
Additional global optimizations                   •    •    •
Global optimizations                         •    •    •    •
Local (minimal) optimizations           •    •    •    •    •

The default is /OPTIMIZE=LEVEL=4.

In Table 5-4, the following terms are used to describe the levels of optimization (described in detail in Section 5.7.1 to Section 5.7.6):

Local (minimal) optimizations (/OPTIMIZE=LEVEL=1 or higher) occur within the source program unit and include recognition of common subexpressions and the expansion of multiplication and division.
Global optimizations (/OPTIMIZE=LEVEL=2 or higher) include such optimizations as data-flow analysis, code motion, strength reduction, split-lifetime analysis, and instruction scheduling.
Additional global optimizations (/OPTIMIZE=LEVEL=3 or higher) improve speed at the cost of extra code size. These optimizations include loop unrolling and code replication to eliminate branches.
Automatic inlining and software pipelining (/OPTIMIZE=LEVEL=4 or higher) apply interprocedure analysis and inline expansion of small procedures, usually by using heuristics that limit extra code, as well as software pipelining.
Software pipelining applies instruction scheduling to certain innermost loops, allowing instructions within a loop to "wrap around" and execute in a different iteration of the loop. This can reduce the impact of long-latency operations, resulting in faster loop execution.
Software pipelining also enables the prefetching of data to reduce the impact of cache misses.
Loop transformation (/OPTIMIZE=LEVEL=5) includes a group of loop transformation optimizations.
The loop transformation optimizations apply to array references within loops and can apply to multiple nested loops. These optimizations can improve the performance of the memory system.

5.7.1 Optimizations Performed at All Optimization Levels

The following optimizations occur at any optimization level (0 through 5):

Space optimizations
Space optimizations decrease the size of the object or executing program by eliminating unnecessary use of memory, thereby improving speed of execution and system throughput. HP Fortran space optimizations are as follows:
- Constant Pooling
  Only one copy of a given constant value is ever allocated memory space. If that constant value is used in several places in the program, all references point to that value.
- Dead Code Elimination
  If operations will never execute or if data items will never be used, HP Fortran eliminates them. Dead code includes unreachable code and code that becomes unused as a result of other optimizations, such as value propagation.
Inlining arithmetic statement functions and intrinsic procedures
Regardless of the optimization level, HP Fortran inserts arithmetic statement functions directly into a program instead of calling them as functions. This permits other optimizations of the inlined code and eliminates several operations, such as calls and returns or stores and fetches of the actual arguments. For example:
  SUM(A,B) = A+B
  .
  .
  .
  Y = 3.14
  X = SUM(Y,3.0)     ! With value propagation, becomes: X = 6.14
Most intrinsic procedures are automatically inlined.
Inlining of other subprograms, such as contained subprograms, occurs at optimization level 4.
Implied-DO loop collapsing
DO loop collapsing reduces a major overhead in I/O processing. Normally, each element in an I/O list generates a separate call to the HP Fortran RTL. The processing overhead of these calls can be most significant in implied-DO loops.
If HP Fortran can determine that the format will not change during program execution, it replaces the series of calls in up to seven nested implied-DO loops with a single call to an optimized RTL routine (see Section 5.5.8). The optimized RTL routine can transfer many elements in one operation.
HP Fortran collapses implied-DO loops in formatted and unformatted I/O operations, but it is more important with unformatted I/O, where the cost of transmitting the elements is a higher fraction of the total cost.
Array temporary elimination and FORALL statements
Certain array store operations are optimized. For example, to minimize the creation of array temporaries, HP Fortran can detect when no overlap occurs between the two sides of an array expression. This type of optimization occurs for some assignment statements in FORALL constructs.
Certain array operations are also candidates for loop unrolling optimizations (see Section 5.7.4.1).
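
To illustrate the implied-DO loop collapsing item in the list above, the following sketch writes the same array with an implied-DO loop and with a whole-array reference. The program name, unit number 10, and the scratch file are arbitrary choices made for this example, and free source form is assumed. Whether the implied-DO form is collapsed into a single RTL call depends on the compiler:

```fortran
PROGRAM IMPLIED_DO_DEMO
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 100
  REAL :: A(N)
  INTEGER :: I

  DO I = 1, N
     A(I) = REAL(I)
  END DO

  ! Unit 10 and the scratch file are arbitrary choices for this sketch
  OPEN (UNIT=10, STATUS='SCRATCH', FORM='UNFORMATTED')

  ! Implied-DO loop in the I/O list: a candidate for collapsing
  WRITE (10) (A(I), I = 1, N)

  ! Whole-array reference: transfers the same elements in the same order
  WRITE (10) A

  CLOSE (10)
END PROGRAM IMPLIED_DO_DEMO
```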

5.7.2 Local (Minimal) Optimizations

To enable local optimizations, use /OPTIMIZE=LEVEL=1 or a higher optimization level (LEVEL=2, LEVEL=3, LEVEL=4, LEVEL=5).

To prevent local optimizations, specify /NOOPTIMIZE (/OPTIMIZE=LEVEL=0).

5.7.2.1 Common Subexpression Elimination

If the same subexpressions appear in more than one computation and the values do not change between computations, HP Fortran computes the result once and replaces the subexpressions with the result itself:

  DIMENSION A(25,25), B(25,25)
  A(I,J) = B(I,J)

Without optimization, these statements can be compiled as follows:

  t1 = ((J-1)*25+(I-1))*4
  t2 = ((J-1)*25+(I-1))*4
  A(t1) = B(t2)

Variables t1 and t2 represent equivalent expressions. HP Fortran eliminates this redundancy by producing the following:

  t = ((J-1)*25+(I-1))*4
  A(t) = B(t)

5.7.2.2 Integer Multiplication and Division Expansion

Expansion of multiplication and division refers to bit shifts that allow faster multiplication and division while producing the same result. For example, the integer expression (I*17) can be calculated as I with a 4-bit shift plus the original value of I. This can be expressed using the HP Fortran ISHFT intrinsic function:

  J1 = I*17
  J2 = ISHFT(I,4) + I     ! equivalent expression for I*17

The optimizer uses machine code that, like the ISHFT intrinsic function, shifts bits to expand multiplication and division by literals.
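
The equivalence can be checked with a short program. This is only a sketch: the program name SHIFT_DEMO and the tested range of 0 through 1000 are arbitrary choices, and free source form is assumed. A 4-bit left shift multiplies I by 16, so adding I gives I*17:

```fortran
PROGRAM SHIFT_DEMO
  IMPLICIT NONE
  INTEGER :: I, J1, J2

  DO I = 0, 1000
     J1 = I * 17
     J2 = ISHFT(I, 4) + I     ! I*16 + I
     IF (J1 /= J2) THEN
        PRINT *, 'Mismatch at I = ', I
        STOP
     END IF
  END DO
  PRINT *, 'ISHFT(I,4) + I matched I*17 for I = 0 through 1000'
END PROGRAM SHIFT_DEMO
```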

5.7.2.3 Compile-Time Operations

HP Fortran does as many operations as possible at compile time rather than having them done at run time.

Constant Operations

HP Fortran can perform many operations on constants (including PARAMETER constants):

Constants preceded by a unary minus sign are negated.
Expressions involving +, -, *, or / operators are evaluated; for example:
  PARAMETER (NN=27)
  I = 2*NN+J          ! Becomes: I = 54 + J
Evaluation of some constant functions and operators is performed at compile time. This includes certain functions of constants, concatenation of string constants, and logical and relational operations involving constants.
Lower-ranked constants are converted to the data type of the higher-ranked operand:
  REAL X, Y
  X = 10 * Y          ! Becomes: X = 10.0 * Y
Array address calculations involving constant subscripts are simplified at compile time whenever possible:
  INTEGER I(10,10)
  I(1,2) = I(4,5)     ! Compiled as a direct load and store

Algebraic Reassociation Optimizations

HP Fortran delays operations to see whether they have no effect or can be transformed to have no effect. If they have no effect, these operations are removed. A typical example involves unary minus and .NOT. operations:

X = -Y * -Z ! Becomes: Y * Z

5.7.2.4 Value Propagation

HP Fortran tracks the values assigned to variables and constants, including those from DATA statements, and traces them to every place they are used. HP Fortran uses the value itself when it is more efficient to do so.

When compiling subprograms, HP Fortran analyzes the program to ensure that propagation is safe if the subroutine is called more than once.

Value propagation frequently leads to more value propagation. HP Fortran can eliminate run-time operations, comparisons and branches, and whole statements.

In the following example, constants are propagated, eliminating multiple operations from run time:

Original Code                   Optimized Code

PI = 3.14
.                               .
.                               .
.                               .
PIOVER2 = PI/2                  PIOVER2 = 1.57
.                               .
.                               .
.                               .
I = 100                         I = 100
.                               .
.                               .
.                               .
IF (I.GT.1) GOTO 10
10 A(I) = 3.0*Q                 10 A(100) = 3.0*Q

5.7.2.5 Dead Store Elimination

If a variable is assigned but never used, HP Fortran eliminates the entire assignment statement:

  X = Y*Z
  .
  .
  .                   ! X=Y*Z is eliminated.
  X = A(I,J)* PI

Programs used for performance analysis often contain such unnecessary operations. When you measure the performance of such programs compiled with HP Fortran, they may show unrealistically good results. Realistic results are possible only when program units use their results in output statements.
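
A minimal sketch of a timing program that follows this advice appears below. The program name TIMING_DEMO, the loop body, and the iteration count are illustrative assumptions, and free source form is assumed. The point is only that the computed value TOTAL appears in an output statement, so the timed loop cannot be discarded as unused work:

```fortran
PROGRAM TIMING_DEMO
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 1000000
  REAL :: TOTAL
  INTEGER :: I, T0, T1, RATE

  CALL SYSTEM_CLOCK(T0, RATE)
  TOTAL = 0.0
  DO I = 1, N
     TOTAL = TOTAL + SQRT(REAL(I))
  END DO
  CALL SYSTEM_CLOCK(T1)

  ! Printing TOTAL uses the loop's result, so the computation
  ! is not a dead store
  PRINT *, 'Result:  ', TOTAL

  ! RATE is zero when no system clock is available
  IF (RATE > 0) PRINT *, 'Seconds: ', REAL(T1 - T0) / REAL(RATE)
END PROGRAM TIMING_DEMO
```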
