HP Pascal for OpenVMS
The objective of optimization is to produce source and object programs that achieve the greatest amount of processing with the least amount of time and memory. Realizing this objective requires programs that are carefully designed and written, and compilation techniques, such as those used by HP Pascal, that take advantage of the operating system and machine architecture environment. (The balance between portable code and program efficiency depends on the requirements of your application.)
3.1 Compiler Optimizations
By default, programs compiled with the HP Pascal compiler undergo optimization. An optimizing compiler automatically attempts to remove repetitious instructions and redundant computations by making assumptions about the values of certain variables. This, in turn, reduces the size of the object code, allowing a program written in a high-level language to execute at a speed comparable to that of a well-written assembly language program. Optimization can increase the amount of time required to compile a program, but the result is a program that may execute faster and more efficiently than a nonoptimized program.
The language elements you use in the source program directly affect the compiler's ability to optimize the object program. Therefore, you should be aware of the ways in which you can assist compiler optimization. In addition, this awareness often makes it easier for you to track down the source of a problem when your program exhibits unexpected behavior.
The compiler performs the following optimizations:

- Compile-time evaluation of constant expressions
- Elimination of common subexpressions
- Elimination of unreachable code
- Code hoisting from structured statements
- Inline code expansion for predeclared functions and user-declared routines
- Operation rearrangement
- Partial evaluation of logical expressions
- Value propagation
- Strength reduction
- Split lifetime analysis
- Code scheduling
- Loop unrolling
- Alignment of compiler-generated labels
- Error reduction through optimization

These optimizations are described in the following sections. In addition, the compiler performs further optimizations that can be detected only by a careful examination of the machine code produced by the compiler.
3.1.1 Compile-Time Evaluation of Constant Expressions

The compiler performs the following computations on constant expressions at compile time:

- Negation of constants. For example:

     x := -10.0;

- Conversion of constants to the required type. For example, if y is a REAL variable, the statement:

     x := 10 * y;

  is compiled as:

     x := 10.0 * y;

- Arithmetic on constant operands. For example, given the declaration:

     CONST
        nn = 27;

  the following statement in the executable section:

     i := 2 * nn + j;

  is compiled as:

     i := 54 + j;

- Computation of the addresses of array components with constant subscripts. For example, given the declaration:

     VAR
        i : ARRAY[1..10, 1..10] OF INTEGER;

  the addresses of the components in the following statement in the executable section are computed at compile time:

     i[1,2] := i[4,5];
3.1.2 Elimination of Common Subexpressions

The same subexpression often appears in more than one computation within a program. For example:
a := b * c + e * f;
h := a + g - b * c;
IF ((b * c) - h) <> 0 THEN ...
In this code sequence, the subexpression b * c appears three times. If the values of the operands b and c do not change between computations, the value b * c can be computed once and the result can be used in place of the subexpression. The previous sequence is compiled as follows:
t := b * c;
a := t + e * f;
h := a + g - t;
IF ((t) - h) <> 0 THEN ...
Two computations of b * c have been eliminated. In this case, you could have modified the source program itself for greater program optimization.
The following example shows a more significant application of this kind of compiler optimization, in which you could not reasonably modify the source code to achieve the same effect:
VAR
   a, b : ARRAY[1..25, 1..25] OF REAL;

{In the executable section:}
a[i,j] := b[i,j];
Without optimization, this source program would be compiled as follows:
t1 := (j - 1) * 25 + i;
t2 := (j - 1) * 25 + i;
a[t1] := b[t2];
Variables t1 and t2 represent equivalent expressions. The compiler eliminates this redundancy by producing the following optimization:
t := (j - 1) * 25 + i;
a[t] := b[t];
3.1.3 Elimination of Unreachable Code

The compiler can determine which lines of code, if any, are never executed, and it eliminates that code from the object module being produced. For example, consider the following lines from a program:
CONST
   Debug_Switch = FALSE;

{In the executable section:}
IF Debug_Switch THEN
   WRITELN( 'Error found here' );
The IF statement is designed to write an error message if the value of the symbolic constant Debug_Switch is TRUE. Suppose that the error has been removed, and you change the definition of Debug_Switch to give it the value FALSE. When the program is recompiled, the compiler can determine that the THEN clause will never be executed because the IF condition is always FALSE; no machine code is generated for this clause. You need not remove the IF statement from the source program.
Code that is otherwise unreachable, but contains one or more labels, is
not eliminated unless the GOTO statement and the label itself are
located in the same block.
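As a hypothetical sketch of this rule (the names are illustrative), the code below label 100 is unreachable from the main block, but it contains a label whose GOTO appears in a different block (the nested procedure), so the compiler does not eliminate it:

PROGRAM Unreachable;
LABEL 100, 200;

   PROCEDURE Handler;
   BEGIN
   GOTO 100;     { nonlocal GOTO; its target label is in the outer block }
   END;

BEGIN
GOTO 200;        { makes the lines below unreachable within this block }
100:
   WRITELN( 'Not eliminated' );   { kept: label 100 can be reached from Handler }
200:
END.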
3.1.4 Code Hoisting from Structured Statements
The compiler can improve the execution speed and size of programs by removing invariant computations from structured statements. For example:
FOR j := 1 TO i + 23 DO
   BEGIN
   IF Selector THEN
      a[i + 23, j - 14] := 0
   ELSE
      b[i + 23, j - 14] := 1;
   END;
If the compiler detected this IF statement, it would recognize that, regardless of the Boolean value of Selector, a value is stored in the array component denoted by [i + 23, j - 14]. The compiler would change the sequence to the following:
t := i + 23;
FOR j := 1 TO t DO
   BEGIN
   u := j - 14;
   IF Selector THEN
      a[t,u] := 0
   ELSE
      b[t,u] := 1;
   END;
This removes the calculation of j - 14 from the IF statement, and the
calculation of i + 23 from both the IF statement and the loop.
3.1.5 Inline Code Expansion for Predeclared Functions
The compiler can often replace calls to predeclared routines with the actual algorithm for performing the calculation. For example:
Square := SQR( a );
The compiler replaces this function call with the following, and generates machine code based on the expanded call:
Square := a * a;
The program executes faster because the algorithm for the SQR function
has already been included in the machine code.
3.1.6 Inline Code Expansion for User-Declared Routines
Inline code expansion for user-declared routines works in the same way as inline code expansion for predeclared functions: the compiler can often replace calls to user-declared routines with an inline expansion of the routine's executable code. Inline code expansion is useful for routines that are called only a few times. The overhead of an actual procedure call is avoided, which speeds program execution. The size of the program, however, may increase due to the routine's expansion.
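For example, in the following sketch (the routine and variable names are illustrative), the compiler may replace the call with the routine's body:

FUNCTION Double( x : INTEGER ) : INTEGER;
BEGIN
Double := x + x;
END;

{In the executable section:}
y := Double( a );    { may be expanded inline as: y := a + a }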
To determine whether it is desirable to expand a routine inline, the compiler uses a complex algorithm. Section 3.1.7 describes the algorithm for HP Pascal on OpenVMS VAX systems; HP Pascal on OpenVMS I64 and OpenVMS Alpha systems uses a similar algorithm to make the determination.
3.1.7 Testing for Inline Expansion on OpenVMS VAX Systems
The first part of the algorithm performs tests for cases that always prohibit the routine from being inlined. A failure of one of these tests can be thought of as a hard failure; if any hard-failure condition holds, the routine is not inlined.
The second part of the algorithm performs tests to determine how desirable it is to inline the routine at a particular call point. A failure of one of these tests can be thought of as a soft failure. These tests check the number of formal parameters, the number of local variables, whether the called routine is directly recursive, the number of direct calls to the routine, and the size of both the calling and the called routine.
If an explicit [OPTIMIZE(INLINE)] attribute is specified on the routine declaration, the hard failure tests are still performed; however, the soft failure tests are not. So if the routine passes the hard failure tests, that routine is inlined at all call points. Specifying this attribute gives you more control over which routines are inlined.
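For example, the following declaration (the routine is illustrative) requests that Cube be expanded inline at every call point, subject only to the hard-failure tests:

[OPTIMIZE(INLINE)]
FUNCTION Cube( x : INTEGER ) : INTEGER;
BEGIN
Cube := x * x * x;
END;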
Note: There is no stack frame for an inline user-declared routine and no debugger symbol table information for the expanded routine. Debugging the execution of an inline routine is difficult and is not recommended.
3.1.8 Operation Rearrangement

The compiler can produce more efficient machine code by rearranging operations to avoid having to negate and then calculate the complement of the values involved. For example:
(-c) * (b - a)
If a program includes this operation, the compiler rearranges the operation to read as follows:
c * (a - b)
These two operations produce the same result, but because the compiler
has eliminated negation or complement operations, the machine code
produced is more efficient.
3.1.9 Partial Evaluation of Logical Expressions
The Pascal language does not specify the order in which the components of an expression must be evaluated. If the value of an expression can be determined by partial evaluation, then some subexpressions may not be evaluated at all. This situation occurs most frequently in the evaluation of logical expressions. For example:
WHILE ( i < 10 ) AND ( a[i] <> 0 ) DO
   BEGIN
   a[i] := a[i] + 1;
   i := i + 1;
   END;
In this WHILE statement, the order in which the two subexpressions ( i < 10 ) and ( a[i] <> 0 ) are evaluated is not specified; in fact, the compiler may evaluate them simultaneously. Regardless of which subexpression is evaluated first, if its value is FALSE the condition being tested in the WHILE statement is also FALSE. The other subexpression need not be evaluated at all. In this case, the body of the loop is never executed.
To force the compiler to evaluate expressions in left-to-right order with short circuiting, you can use the AND_THEN operator, as shown in the following example:
WHILE ( i < 10 ) AND_THEN ( a[i] <> 0 ) DO
   BEGIN
   a[i] := a[i] + 1;
   i := i + 1;
   END;
3.1.10 Value Propagation

The compiler keeps track of the values assigned to variables and traces the values to most of the places where they are used. If it is more efficient to use the value rather than a reference to the variable, the compiler makes this change. This optimization is called value propagation. Value propagation causes the object code to be smaller, and it may also improve run-time speed.
Value propagation performs the following actions:

- It propagates constant values. For example, the sequence:

     Pi := 3.14;
     Pi_Over_2 := Pi/2;

  is compiled as:

     Pi := 3.14;
     Pi_Over_2 := 1.57;

- It traces values through conditional tests. For example, the sequence:

     x := 3;
     IF x <> 3 THEN
        y := 30
     ELSE
        y := 20;

  is compiled as:

     x := 3;
     y := 20;
3.1.11 Strength Reduction

Strength reduction speeds computations by replacing a multiply operation with a more efficient add instruction when computing array addresses each time around a loop.
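For example, consider the following loop:

FOR i := 1 TO 100 DO
   a[i] := 0;

Without strength reduction, each iteration computes the component address with a multiply, conceptually:

addr := base + i * 4;

With strength reduction, the compiler keeps a running address and replaces the multiply with an add each time around the loop, conceptually:

addr := addr + 4;

(This is a simplified source-level sketch; addr and base stand for compiler temporaries, and an element size of 4 bytes is assumed.)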
3.1.12 Split Lifetime Analysis (OpenVMS I64 and OpenVMS Alpha systems)
Split lifetime analysis improves register usage by determining if the lifetime of a variable can be broken into multiple, independent sections. If so, the variable may be stored in different registers for each section. The registers can then be reused for other purposes between sections. Therefore, there may be times when the value of the variable does not exist anywhere in the registers. For example:
v := 3.0 * q;
   .
   .
   .
x := SIN(y) * v;
   .
   .
   .
v := PI * x;
   .
   .
   .
y := COS(y) * v;
This example shows that the variable v has two disjoint usage sections.
The value of v in the first section does not affect the value of v in
the second section. The compiler may use different registers for each
section.
3.1.13 Code Scheduling (OpenVMS I64 and OpenVMS Alpha systems)
Code scheduling is a technique for reordering machine instructions to
maximize the amount of overlap of the multiple execution units inside
the CPU. The exact scheduling algorithms depend on the implementation
of the target architecture.
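For example, in the following fragment (the names are illustrative), the two assignments are independent of each other, so the scheduler may interleave their instructions and overlap the memory fetch of s[i] with the multiplication:

a := p * q;       { multiply }
b := r + s[i];    { fetch and add; independent of the line above }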
3.1.14 Loop Unrolling (OpenVMS I64 and OpenVMS Alpha systems)
Loop unrolling is a technique for increasing the amount of code between branch instructions and labels by replicating the body of a loop. The longer sequence of straight-line code gives the instruction scheduler more opportunity to overlap operations. The following code shows such a transformation:
FOR i := 1 TO 12 DO
   a[i] := b[i] + c[i]
The unrolled loop is:

i := 1;
WHILE i < 12 DO
   BEGIN
   a[i] := b[i] + c[i];
   a[i+1] := b[i+1] + c[i+1];
   a[i+2] := b[i+2] + c[i+2];
   a[i+3] := b[i+3] + c[i+3];
   i := i + 4;
   END;
In this example, the loop body was replicated four times, allowing the instruction scheduler to overlap the fetching of array elements with the addition of other array elements.
By default, loop unrolling makes four copies of the body of an unrolled loop. You can change the number of copies to any value from 1 to 16 with the following qualifier:
/OPTIMIZE=UNROLL=number
Numbers larger than 4 may improve performance at a cost of additional
code size. However, larger numbers may decrease performance due to
cache requirements, register conflicts, and other factors.
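For example, the following command (assuming a source file named PROG.PAS) compiles the program with eight copies of each unrolled loop body:

$ PASCAL /OPTIMIZE=UNROLL=8 PROG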
3.1.15 Alignment of Compiler-Generated Labels
The compiler aligns the labels it generates (for the tops of loops, the beginnings of ELSE branches, and so on) on machine-specific boundaries by filling in the unused bytes with NO-OP instructions.
A branch to a longword-aligned address is faster than a branch to an
unaligned address. This optimization may increase the size of the
generated code; however, it increases run-time speed.
3.1.16 Error Reduction Through Optimization
An optimized program produces results identical to those produced by an equivalent unoptimized program. An optimized program may produce fewer run-time diagnostics, however, and the diagnostics may occur at different statements in the source program. For example:
Unoptimized Code              Optimized Code
----------------------        ----------------------
a := x/y;                     t := x/y;
b := x/y;                     a := t;
FOR i := 1 TO 10 DO           b := t;
   c[i] := c[i] * x/y;        FOR i := 1 TO 10 DO
                                 c[i] := c[i] * t;
If the value of y is 0.0, the unoptimized program would produce 12 divide-by-zero errors at run time; the optimized program produces only one. (The variable t is a temporary variable created by the compiler.) Eliminating redundant calculations and removing invariant calculations from loops can affect the detection of such arithmetic errors. You should keep this in mind when you include error-detection routines in your program.