HP Pascal for OpenVMS
The objective of optimization is to produce source and object programs that achieve the greatest amount of processing with the least amount of time and memory. Realizing this objective requires programs that are carefully designed and written, and compilation techniques, such as those used by HP Pascal, that take advantage of the operating system and machine architecture environment. (The balance between portable code and program efficiency depends on the requirements of your application.)
3.1 Compiler Optimizations
By default, programs compiled with the HP Pascal compiler undergo optimization. An optimizing compiler automatically attempts to remove repetitious instructions and redundant computations by making assumptions about the values of certain variables. This, in turn, reduces the size of the object code, allowing a program written in a high-level language to execute at a speed comparable to that of a well-written assembly language program. Optimization can increase the amount of time required to compile a program, but the result is a program that may execute faster and more efficiently than a nonoptimized program.
The language elements you use in the source program directly affect the compiler's ability to optimize the object program. Therefore, you should be aware of the ways in which you can assist compiler optimization. In addition, this awareness often makes it easier for you to track down the source of a problem when your program exhibits unexpected behavior.
The compiler performs the following optimizations:

- Compile-time evaluation of constant expressions
- Elimination of common subexpressions
- Elimination of unreachable code
- Code hoisting from structured statements
- Inline code expansion for predeclared functions and user-declared routines
- Operation rearrangement
- Partial evaluation of logical expressions
- Value propagation
- Strength reduction
- Split lifetime analysis
- Code scheduling
- Loop unrolling
- Alignment of compiler-generated labels
- Error reduction through optimization

These optimizations are described in the following sections. In addition, the compiler performs further optimizations that can be detected only by a careful examination of the machine code produced by the compiler.
3.1.1 Compile-Time Evaluation of Constant Expressions

The compiler performs the following computations on constant expressions at compile time:

- Negation of constants. For example:

     x := -10.0;

- Conversion of constants to the required type. For example, if y is a REAL variable, the statement:

     x := 10 * y;

  is compiled as:

     x := 10.0 * y;

- Arithmetic on constant operands. For example, given the declaration:

     CONST
        nn = 27;

  the following statement in the executable section:

     i := 2 * nn + j;

  is compiled as:

     i := 54 + j;

- Computation of the addresses of array components with constant subscripts. For example, given the declaration:

     VAR
        i : ARRAY[1..10, 1..10] OF INTEGER;

  the addresses of the components in the following statement in the executable section are computed at compile time:

     i[1,2] := i[4,5];
3.1.2 Elimination of Common Subexpressions

The same subexpression often appears in more than one computation within a program. For example:
a := b * c + e * f;
h := a + g - b * c;
IF ((b * c) - h) <> 0 THEN ...
In this code sequence, the subexpression b * c appears three times. If the values of the operands b and c do not change between computations, the value b * c can be computed once and the result can be used in place of the subexpression. The previous sequence is compiled as follows:
t := b * c;
a := t + e * f;
h := a + g - t;
IF ((t) - h) <> 0 THEN ...
Two computations of b * c have been eliminated. In this case, you could have modified the source program itself for greater program optimization.
The following example shows a more significant application of this kind of compiler optimization, in which you could not reasonably modify the source code to achieve the same effect:
VAR
   a, b : ARRAY[1..25, 1..25] OF REAL;

{In the executable section:}
a[i,j] := b[i,j];
Without optimization, this source program would be compiled as follows:
t1 := (j - 1) * 25 + i;
t2 := (j - 1) * 25 + i;
a[t1] := b[t2];
Variables t1 and t2 represent equivalent expressions. The compiler eliminates this redundancy by producing the following optimization:
t := (j - 1) * 25 + i;
a[t] := b[t];
3.1.3 Elimination of Unreachable Code

The compiler can determine which lines of code, if any, are never executed, and it eliminates that code from the object module being produced. For example, consider the following lines from a program:
CONST
   Debug_Switch = FALSE;

{In the executable section:}
IF Debug_Switch THEN
   WRITELN( 'Error found here' );
The IF statement is designed to write an error message if the value of the symbolic constant Debug_Switch is TRUE. Suppose that the error has been removed, and you change the definition of Debug_Switch to give it the value FALSE. When the program is recompiled, the compiler can determine that the THEN clause will never be executed because the IF condition is always FALSE; no machine code is generated for this clause. You need not remove the IF statement from the source program.
Code that is otherwise unreachable, but contains one or more labels, is
not eliminated unless the GOTO statement and the label itself are
located in the same block.
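As a hypothetical sketch of this rule (the names are illustrative), the code below label 100 is unreachable from the main block, but it contains a label whose GOTO appears in a different block (the nested procedure), so the compiler does not eliminate it:

PROGRAM Unreachable;
LABEL 100, 200;

   PROCEDURE Handler;
   BEGIN
   GOTO 100;     { nonlocal GOTO; its target label is in the outer block }
   END;

BEGIN
GOTO 200;        { makes the lines below unreachable within this block }
100:
   WRITELN( 'Not eliminated' );   { kept: label 100 can be reached from Handler }
200:
END.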
3.1.4 Code Hoisting from Structured Statements
The compiler can improve the execution speed and size of programs by removing invariant computations from structured statements. For example:
FOR j := 1 TO i + 23 DO
   BEGIN
   IF Selector THEN
      a[i + 23, j - 14] := 0
   ELSE
      b[i + 23, j - 14] := 1;
   END;
If the compiler detected this IF statement, it would recognize that, regardless of the Boolean value of Selector, a value is stored in the array component denoted by [i + 23, j - 14]. The compiler would change the sequence to the following:
t := i + 23;
FOR j := 1 TO t DO
   BEGIN
   u := j - 14;
   IF Selector THEN
      a[t,u] := 0
   ELSE
      b[t,u] := 1;
   END;
This removes the calculation of j - 14 from the IF statement, and the
calculation of i + 23 from both the IF statement and the loop.
3.1.5 Inline Code Expansion for Predeclared Functions
The compiler can often replace calls to predeclared routines with the actual algorithm for performing the calculation. For example:
Square := SQR( a );
The compiler replaces this function call with the following, and generates machine code based on the expanded call:
Square := a * a;
The program executes faster because the algorithm for the SQR function
has already been included in the machine code.
3.1.6 Inline Code Expansion for User-Declared Routines
Inline code expansion for user-declared routines works in the same way as inline code expansion for predeclared functions: the compiler can often replace calls to user-declared routines with an inline expansion of the routine's executable code. Inline code expansion is useful for routines that are called only a few times. The overhead of an actual procedure call is avoided, which speeds program execution. The size of the program, however, may increase due to the routine's expansion.
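For example, in the following sketch (the routine and variable names are illustrative), the compiler may replace the call with the routine's body:

FUNCTION Double( x : INTEGER ) : INTEGER;
BEGIN
Double := x + x;
END;

{In the executable section:}
y := Double( a );    { may be expanded inline as: y := a + a }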
To determine whether it is desirable to expand a routine inline, the compiler uses a complex algorithm. Section 3.1.7 describes the algorithm for HP Pascal on OpenVMS VAX systems; HP Pascal on OpenVMS I64 and OpenVMS Alpha systems uses a similar algorithm to make the determination.
3.1.7 Testing for Inline Expansion on OpenVMS VAX Systems
The first part of the algorithm performs tests for cases that always prohibit the routine from being inlined. A failure of one of these tests can be thought of as a hard failure; if any hard-failure condition holds, the routine is not inlined.
The second part of the algorithm performs tests to determine how desirable it is to inline the routine at a particular call point. A failure of one of these tests can be thought of as a soft failure. These tests check the number of formal parameters, the number of local variables, whether the called routine is directly recursive, the number of direct calls to the routine, and the size of both the calling and the called routine.
If an explicit [OPTIMIZE(INLINE)] attribute is specified on the routine declaration, the hard failure tests are still performed; however, the soft failure tests are not. So if the routine passes the hard failure tests, that routine is inlined at all call points. Specifying this attribute gives you more control over which routines are inlined.
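For example, the following declaration (the routine is illustrative) requests that Cube be expanded inline at every call point, subject only to the hard-failure tests:

[OPTIMIZE(INLINE)]
FUNCTION Cube( x : INTEGER ) : INTEGER;
BEGIN
Cube := x * x * x;
END;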
Note: There is no stack frame for an inline user-declared routine and no debugger symbol table information for the expanded routine. Debugging the execution of an inline routine is difficult and is not recommended.
3.1.8 Operation Rearrangement

The compiler can produce more efficient machine code by rearranging operations to avoid having to negate and then calculate the complement of the values involved. For example:
(-c) * (b - a)
If a program includes this operation, the compiler rearranges the operation to read as follows:
c * (a - b)
These two operations produce the same result, but because the compiler
has eliminated negation or complement operations, the machine code
produced is more efficient.
3.1.9 Partial Evaluation of Logical Expressions
The Pascal language does not specify the order in which the components of an expression must be evaluated. If the value of an expression can be determined by partial evaluation, then some subexpressions may not be evaluated at all. This situation occurs most frequently in the evaluation of logical expressions. For example:
WHILE ( i < 10 ) AND ( a[i] <> 0 ) DO
   BEGIN
   a[i] := a[i] + 1;
   i := i + 1;
   END;
In this WHILE statement, the order in which the two subexpressions ( i < 10 ) and ( a[i] <> 0 ) are evaluated is not specified; in fact, the compiler may evaluate them simultaneously. Regardless of which subexpression is evaluated first, if its value is FALSE the condition being tested in the WHILE statement is also FALSE. The other subexpression need not be evaluated at all. In this case, the body of the loop is never executed.
To force the compiler to evaluate expressions in left-to-right order with short circuiting, you can use the AND_THEN operator, as shown in the following example:
WHILE ( i < 10 ) AND_THEN ( a[i] <> 0 ) DO
   BEGIN
   a[i] := a[i] + 1;
   i := i + 1;
   END;
3.1.10 Value Propagation

The compiler keeps track of the values assigned to variables and traces the values to most of the places where they are used. If it is more efficient to use the value rather than a reference to the variable, the compiler makes this change. This optimization is called value propagation. Value propagation causes the object code to be smaller, and it may also improve run-time speed.
Value propagation performs the following actions:

- It propagates constant values. For example, the sequence:

     Pi := 3.14;
     Pi_Over_2 := Pi/2;

  is compiled as:

     Pi := 3.14;
     Pi_Over_2 := 1.57;

- It traces values through conditional tests. For example, the sequence:

     x := 3;
     IF x <> 3 THEN
        y := 30
     ELSE
        y := 20;

  is compiled as:

     x := 3;
     y := 20;
3.1.11 Strength Reduction

Strength reduction speeds computations by replacing a multiply operation with a more efficient add instruction when computing array addresses each time around a loop.
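For example, consider the following loop:

FOR i := 1 TO 100 DO
   a[i] := 0;

Without strength reduction, each iteration computes the component address with a multiply, conceptually:

addr := base + i * 4;

With strength reduction, the compiler keeps a running address and replaces the multiply with an add each time around the loop, conceptually:

addr := addr + 4;

(This is a simplified source-level sketch; addr and base stand for compiler temporaries, and an element size of 4 bytes is assumed.)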
3.1.12 Split Lifetime Analysis (OpenVMS I64 and OpenVMS Alpha systems)
Split lifetime analysis improves register usage by determining if the lifetime of a variable can be broken into multiple, independent sections. If so, the variable may be stored in different registers for each section. The registers can then be reused for other purposes between sections. Therefore, there may be times when the value of the variable does not exist anywhere in the registers. For example:
v := 3.0 * q;
   .
   .
   .
x := SIN(y) * v;
   .
   .
   .
v := PI * x;
   .
   .
   .
y := COS(y) * v;
This example shows that the variable v has two disjoint usage sections.
The value of v in the first section does not affect the value of v in
the second section. The compiler may use different registers for each
section.
3.1.13 Code Scheduling (OpenVMS I64 and OpenVMS Alpha systems)
Code scheduling is a technique for reordering machine instructions to
maximize the amount of overlap of the multiple execution units inside
the CPU. The exact scheduling algorithms depend on the implementation
of the target architecture.
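For example, in the following fragment (the names are illustrative), the two assignments are independent of each other, so the scheduler may interleave their instructions and overlap the memory fetch of s[i] with the multiplication:

a := p * q;       { multiply }
b := r + s[i];    { fetch and add; independent of the line above }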
3.1.14 Loop Unrolling (OpenVMS I64 and OpenVMS Alpha systems)
Loop unrolling is a technique for increasing the amount of code between branch instructions and labels by replicating the body of a loop. The longer sequence of straight-line code gives the instruction scheduler more opportunity to overlap operations. The following code shows such a transformation:
FOR i := 1 TO 12 DO
   a[i] := b[i] + c[i]
The unrolled loop is:

i := 1;
WHILE i < 12 DO
   BEGIN
   a[i] := b[i] + c[i];
   a[i+1] := b[i+1] + c[i+1];
   a[i+2] := b[i+2] + c[i+2];
   a[i+3] := b[i+3] + c[i+3];
   i := i + 4;
   END;
In this example, the loop body was replicated four times, allowing the instruction scheduler to overlap the fetching of array elements with the addition of other array elements.
By default, loop unrolling makes four copies of the body of an unrolled loop. You can change the number of copies to any value from 1 to 16 with the following qualifier:
/OPTIMIZE=UNROLL=number
Numbers larger than 4 may improve performance at a cost of additional
code size. However, larger numbers may decrease performance due to
cache requirements, register conflicts, and other factors.
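For example, the following command (assuming a source file named PROG.PAS) compiles the program with eight copies of each unrolled loop body:

$ PASCAL /OPTIMIZE=UNROLL=8 PROG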
3.1.15 Alignment of Compiler-Generated Labels
The compiler aligns the labels it generates (for the tops of loops, the beginnings of ELSE branches, and so on) on machine-specific boundaries by filling in the unused bytes with NO-OP instructions.
A branch to a longword-aligned address is faster than a branch to an
unaligned address. This optimization may increase the size of the
generated code; however, it increases run-time speed.
3.1.16 Error Reduction Through Optimization
An optimized program produces results identical to those produced by an equivalent unoptimized program. An optimized program may produce fewer run-time diagnostics, however, and the diagnostics may occur at different statements in the source program. For example:
Unoptimized Code              Optimized Code
----------------------        ----------------------
a := x/y;                     t := x/y;
b := x/y;                     a := t;
FOR i := 1 TO 10 DO           b := t;
   c[i] := c[i] * x/y;        FOR i := 1 TO 10 DO
                                 c[i] := c[i] * t;
If the value of y is 0.0, the unoptimized program would produce 12 divide-by-zero errors at run time; the optimized program produces only one. (The variable t is a temporary variable created by the compiler.) Eliminating redundant calculations and removing invariant calculations from loops can affect the detection of such arithmetic errors. You should keep this in mind when you include error-detection routines in your program.