HP Fortran for OpenVMS
User Manual




Chapter 5
Performance: Making Programs Run Faster

This chapter describes:

5.1 Software Environment and Efficient Compilation

Before you attempt to analyze and improve program performance, you should:

5.1.1 Install the Latest Version of HP Fortran and Performance Products

To ensure that your software development environment can significantly improve the run-time performance of your applications, obtain and install the following optional software products:

For More Information:

About system-wide tuning and suggestions for other performance enhancements on OpenVMS systems, see the HP OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems.

5.1.2 Compile Using Multiple Source Files and Appropriate FORTRAN Qualifiers

During the earlier stages of program development, you can use incremental compilation with minimal optimization. For example:


$ FORTRAN /OPTIMIZE=LEVEL=1 SUB2
$ FORTRAN /OPTIMIZE=LEVEL=1 SUB3
$ FORTRAN /OPTIMIZE=LEVEL=1 MAIN
$ LINK MAIN,SUB2,SUB3

During the later stages of program development, compile multiple source files together at an optimization level of at least /OPTIMIZE=LEVEL=4 so that more interprocedural optimizations can occur. For instance, the following command compiles all three source files together using the default level of optimization (/OPTIMIZE=LEVEL=4):


$ FORTRAN MAIN.F90+SUB2.F90+SUB3.F90
$ LINK MAIN.OBJ 

Compiling multiple source files using the plus sign (+) separator lets the compiler examine more code for possible optimizations, which results in:

When compiling all source files together is not feasible (such as for very large programs), consider compiling source files containing related routines together with multiple FORTRAN commands, rather than compiling source files individually.

Table 5-1 lists the FORTRAN qualifiers that can directly improve run-time performance. Most of these qualifiers do not affect the accuracy of the results; others improve run-time performance but can change some numeric results.

HP Fortran performs certain optimizations by default unless you specify FORTRAN command qualifiers that disable them. Additional optimizations can be enabled or disabled using FORTRAN command qualifiers.

Table 5-1 FORTRAN Qualifiers Related to Run-Time Performance
Qualifier Names Description and For More Information
/ALIGNMENT= keyword Controls whether padding bytes are added between data items within common blocks, derived-type data, and Compaq Fortran 77 record structures to make the data items naturally aligned.

See Section 5.3.

/ASSUME=NOACCURACY_SENSITIVE Allows the compiler to reorder code based on algebraic identities to improve performance, enabling certain optimizations. The numeric results can be slightly different from the default (/ASSUME=ACCURACY_SENSITIVE) because of the way intermediate results are rounded. This slight difference in numeric results is acceptable to most programs.

See Section 5.8.8.

/ARCHITECTURE= keyword (Alpha only) Specifies the type of Alpha architecture instructions to be generated for the program unit being compiled. It accepts the same keywords as the /OPTIMIZE=TUNE qualifier (Alpha only), which controls instruction scheduling.

See Section 2.3.6.

/FAST Sets the following performance-related qualifiers: /ALIGNMENT=(COMMONS=NATURAL, RECORDS=NATURAL, SEQUENCE), /ARCHITECTURE=HOST, /ASSUME=NOACCURACY_SENSITIVE, /MATH_LIBRARY=FAST (Alpha only), and /OPTIMIZE=TUNE=HOST (Alpha only).

See Section 5.8.3.

/INTEGER_SIZE= nn Controls the sizes of INTEGER and LOGICAL declarations without a kind parameter.

See Section 2.3.26.

/MATH_LIBRARY=FAST (Alpha only) Requests the use of certain math library routines (used by intrinsic functions) that provide faster speed. In exchange for significant performance improvements in those functions, this option causes a slight loss of accuracy and provides less reliable arithmetic exception checking.

See Section 2.3.30.

/OPTIMIZE=INLINE= keyword Specifies the types of procedures to be inlined. If omitted, /OPTIMIZE=LEVEL= n determines the types of procedures inlined. Certain INLINE keywords are relevant only for /OPTIMIZE=LEVEL=1 or higher.

See Section 2.3.35.

/OPTIMIZE=LEVEL= n (n = 0 to 5) Controls the optimization level and thus the types of optimization performed. The default optimization level is /OPTIMIZE=LEVEL=4. Use /OPTIMIZE=LEVEL=5 to activate loop transformation optimizations.

See Section 5.7.

/OPTIMIZE=LOOPS Activates a group of loop transformation optimizations (a subset of /OPTIMIZE=LEVEL=5).

See Section 5.7.

/OPTIMIZE=PIPELINE Activates the software pipelining optimization (a subset of /OPTIMIZE=LEVEL=4).

See Section 5.7.

/OPTIMIZE=TUNE= keyword (Alpha only) Specifies the target processor generation (chip) architecture on which the program will be run, allowing the optimizer to make decisions about instruction tuning optimizations needed to create the most efficient code. Keywords allow specifying one particular Alpha processor generation type, multiple processor generation types, or the processor generation type currently in use during compilation. Regardless of the setting of /OPTIMIZE=TUNE= xxxx, the generated code will run correctly on all implementations of the Alpha architecture.

See Section 5.8.6.

/OPTIMIZE=UNROLL= n Specifies the number of times (n) a loop is unrolled when specified with optimization level /OPTIMIZE=LEVEL=3 or higher. If you omit /OPTIMIZE=UNROLL= n, the optimizer determines how many times loops are unrolled.

See Section 5.7.4.1.

/REENTRANCY Specifies whether code generated for the main program and any Fortran procedures it calls relies on threaded or asynchronous reentrancy.

See Section 2.3.39.
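
The /ASSUME=NOACCURACY_SENSITIVE entry in Table 5-1 notes that reordering code by algebraic identities can change how intermediate results are rounded. The following minimal sketch illustrates only that underlying effect; it does not show a transformation the compiler is known to perform. The module and procedure names (REASSOCIATION, SUM_LEFT, SUM_RIGHT) are hypothetical, and the sketch assumes default REAL has a 24-bit significand.

```fortran
! Sketch: two algebraically equal sums that round differently.
! All names here are illustrative and not part of HP Fortran.
MODULE REASSOCIATION
  IMPLICIT NONE
CONTAINS

  ! Evaluates (A + B) + C
  FUNCTION SUM_LEFT(A, B, C) RESULT(S)
    REAL, INTENT(IN) :: A, B, C
    REAL :: S
    REAL, VOLATILE :: T   ! VOLATILE forces the intermediate to be stored as REAL
    T = A + B
    S = T + C
  END FUNCTION SUM_LEFT

  ! Evaluates A + (B + C)
  FUNCTION SUM_RIGHT(A, B, C) RESULT(S)
    REAL, INTENT(IN) :: A, B, C
    REAL :: S
    REAL, VOLATILE :: T
    T = B + C
    S = A + T
  END FUNCTION SUM_RIGHT

END MODULE REASSOCIATION

PROGRAM SHOW_REORDERING
  USE REASSOCIATION
  IMPLICIT NONE
  REAL :: A, B, C

  A = 1.0E8
  B = -1.0E8
  C = 1.0

  ! B + C rounds back to -1.0E8 in single precision, so C is lost
  PRINT *, '(A + B) + C = ', SUM_LEFT(A, B, C)
  PRINT *, 'A + (B + C) = ', SUM_RIGHT(A, B, C)
END PROGRAM SHOW_REORDERING
```

Under the stated assumption, the first sum is 1.0 and the second is 0.0. Programs whose results depend on a particular evaluation order like this are the ones that should keep the default /ASSUME=ACCURACY_SENSITIVE.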

Table 5-2 lists qualifiers that can slow program performance. Some applications that require floating-point exception handling or rounding need to use the /IEEE_MODE and /ROUNDING_MODE qualifiers. Other applications might need to use the /ASSUME=DUMMY_ALIASES qualifier for compatibility reasons. Other qualifiers listed in Table 5-2 are primarily for troubleshooting or debugging purposes.

Table 5-2 Qualifiers that Slow Run-Time Performance
Qualifier Names Description and For More Information
/ASSUME=DUMMY_ALIASES Forces the compiler to assume that dummy (formal) arguments to procedures share memory locations with other dummy arguments or with variables shared through use association, host association, or common block use. These program semantics slow performance, so you should specify /ASSUME=DUMMY_ALIASES only for the called subprograms that depend on such aliases.

The use of dummy aliases violates the FORTRAN-77, Fortran 90, and Fortran 95 standards but occurs in some older programs.

See Section 5.8.9.

/CHECK[= keyword] Generates extra code for various types of checking at run time. This increases the size of the executable image, but may be needed for certain programs to handle arithmetic exceptions. Avoid using /CHECK=ALL except for debugging purposes.

See Section 2.3.11.

/IEEE_MODE= keyword other than /IEEE_MODE=DENORM_RESULTS (on I64) or /IEEE_MODE=FAST (on Alpha) On Alpha systems, using /IEEE_MODE=UNDERFLOW_TO_ZERO slows program execution (like /SYNCHRONOUS_EXCEPTIONS (Alpha only)). Using /IEEE_MODE=DENORM_RESULTS slows program execution even more than /IEEE_MODE=UNDERFLOW_TO_ZERO.

See Section 2.3.24.

/ROUNDING_MODE=DYNAMIC Certain rounding modes and changing the rounding mode can slow program execution slightly.

See Section 2.3.40.

/SYNCHRONOUS_EXCEPTIONS Generates extra code to associate an arithmetic exception with the instruction that causes it, slowing program execution. Use this qualifier only when troubleshooting, such as when identifying the source of an exception.

See Section 2.3.46.

/OPTIMIZE=LEVEL=0, /OPTIMIZE=LEVEL=1, /OPTIMIZE=LEVEL=2, /OPTIMIZE=LEVEL=3 Reduces the optimization level (and the types of optimizations performed) below the default. Use during the early stages of program development or when you will use the debugger.

See Section 2.3.35 and Section 5.7.

/OPTIMIZE=INLINE=NONE, /OPTIMIZE=INLINE=MANUAL Minimizes the types of inlining done by the optimizer. Use such qualifiers only during the early stages of program development. The types of inlining optimizations are also controlled by the /OPTIMIZE=LEVEL qualifier.

See Section 2.3.35 and Section 5.7.
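
The /ASSUME=DUMMY_ALIASES entry in Table 5-2 refers to procedures whose dummy arguments share memory. As a hedged illustration, the following sketch shows the kind of call that creates such an alias. The names ALIAS_DEMO and ADD_TWICE are hypothetical, and the aliased call is deliberately nonconforming, so its result is not guaranteed.

```fortran
! Sketch: a call that makes two dummy arguments share one memory location.
! All names here are illustrative and not part of HP Fortran.
MODULE ALIAS_DEMO
  IMPLICIT NONE
CONTAINS

  ! Adds X to TOTAL twice. No INTENT attributes, as in much older code.
  SUBROUTINE ADD_TWICE(TOTAL, X)
    REAL :: TOTAL, X
    TOTAL = TOTAL + X
    TOTAL = TOTAL + X   ! if X aliases TOTAL, X was changed by the line above
  END SUBROUTINE ADD_TWICE

END MODULE ALIAS_DEMO

PROGRAM SHOW_ALIASING
  USE ALIAS_DEMO
  IMPLICIT NONE
  REAL :: A, B

  ! Conforming call: the actual arguments are distinct variables
  A = 1.0
  B = 2.0
  CALL ADD_TWICE(A, B)
  PRINT *, 'Distinct arguments: ', A   ! 1 + 2 + 2 = 5

  ! Nonconforming call: both dummies refer to A, and one is modified
  A = 1.0
  CALL ADD_TWICE(A, A)
  PRINT *, 'Aliased arguments:  ', A   ! depends on the aliasing assumption
END PROGRAM SHOW_ALIASING
```

If the compiled code rereads X after the first assignment, the aliased call yields 4.0. A compiler that assumes the dummies do not overlap could keep X in a register and yield 3.0. Code that depends on the first behavior is the kind that would need /ASSUME=DUMMY_ALIASES for the called subprogram.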

For More Information:

5.1.3 Process Environment and Related Influences on Performance

Certain DCL commands and system tuning can improve run-time performance:

For More Information:

About system-wide tuning and suggestions for other performance enhancements on OpenVMS systems, see the HP OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems.

5.2 Analyzing Program Performance

This section describes how you can:

Before you analyze program performance, make sure any errors you might have encountered during the early stages of program development have been corrected.

5.2.1 Measuring Performance Using LIB$xxxx_TIMER Routines or Command Procedures

You can use LIB$xxxx_TIMER routines or an equivalent DCL command procedure to measure program performance.

Using the LIB$xxxx_TIMER routines allows you to display timing and related statistics at various points in the program as well as at program completion, including elapsed time, actual CPU time, buffered I/O, direct I/O, and page faults. If needed, you can use other routines or system services to obtain and report other information.

You can measure performance for the entire program by using a DCL command procedure (see Section 5.2.1.2). Although using a DCL command procedure does not report statistics at various points in the program, it can provide information for the entire program similar to that provided by the LIB$xxxx_TIMER routines.

5.2.1.1 The LIB$xxxx_TIMER Routines

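Use the following routines together to provide information about program performance at various points in your program: LIB$INIT_TIMER, which starts (or resets) a timer, and LIB$SHOW_TIMER (or LIB$STAT_TIMER), which displays (or returns) the statistics accumulated since the timer was started.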

Run program timings when other users are not active. Other CPU-intensive processes running during your timings can affect the results.

Try to run the program under the same conditions each time to provide the most accurate results, especially when comparing execution times with those of a previous version of the same program. Use the same system (CPU model, amount of memory, version of the operating system, and so on) if possible.

If you do need to change systems, you should measure the time using the same version of the program on both systems, so you know each system's effect on your timings.

For programs that run for less than a few seconds, repeat the timings several times to ensure that the results are not misleading. Overhead functions might influence short timings considerably.
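
The advice to repeat short timings can be sketched as follows. This example uses the standard Fortran 95 CPU_TIME intrinsic rather than the LIB$xxxx_TIMER routines so that it stays portable. The module, the procedures (REPEAT_TIMING, WORK, TIME_RUNS), and the workload are hypothetical placeholders for the code you want to measure.

```fortran
! Sketch: repeat a short measurement and report the minimum and mean CPU time.
! All names here are illustrative and not part of HP Fortran.
MODULE REPEAT_TIMING
  IMPLICIT NONE
CONTAINS

  ! Placeholder workload standing in for the code being measured
  FUNCTION WORK(N) RESULT(TOTAL)
    INTEGER, INTENT(IN) :: N
    DOUBLE PRECISION :: TOTAL
    INTEGER :: I
    TOTAL = 0.0D0
    DO I = 1, N
      TOTAL = TOTAL + SQRT(DBLE(I))
    END DO
  END FUNCTION WORK

  ! Runs WORK(N) NRUNS times (NRUNS >= 1); returns minimum and mean CPU seconds
  SUBROUTINE TIME_RUNS(N, NRUNS, TMIN, TMEAN)
    INTEGER, INTENT(IN) :: N, NRUNS
    REAL, INTENT(OUT) :: TMIN, TMEAN
    REAL :: T0, T1, TSUM
    DOUBLE PRECISION :: SINK
    INTEGER :: K
    TMIN = HUGE(TMIN)
    TSUM = 0.0
    SINK = 0.0D0
    DO K = 1, NRUNS
      CALL CPU_TIME(T0)
      SINK = SINK + WORK(N)
      CALL CPU_TIME(T1)
      TMIN = MIN(TMIN, T1 - T0)
      TSUM = TSUM + (T1 - T0)
    END DO
    TMEAN = TSUM / REAL(NRUNS)
    ! Reference the result so the optimizer cannot discard the workload
    IF (SINK < 0.0D0) PRINT *, SINK
  END SUBROUTINE TIME_RUNS

END MODULE REPEAT_TIMING

PROGRAM SHORT_TIMINGS
  USE REPEAT_TIMING
  IMPLICIT NONE
  REAL :: TMIN, TMEAN
  CALL TIME_RUNS(1000000, 5, TMIN, TMEAN)
  PRINT *, 'Minimum CPU seconds: ', TMIN
  PRINT *, 'Mean CPU seconds:    ', TMEAN
END PROGRAM SHORT_TIMINGS
```

A large gap between the minimum and the mean suggests that overhead or other activity on the system influenced some of the runs.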

You can use the LIB$SHOW_TIMER (or LIB$STAT_TIMER) routine to display (or return) elapsed time, CPU time, buffered I/O, direct I/O, and page faults.

The HP Fortran program shown in Example 5-1 reports timings for three different sections of the main program, including cumulative statistics (for a scalar program).

Example 5-1 Measuring Program Performance Using LIB$SHOW_TIMER and LIB$INIT_TIMER

!  Example use of LIB$SHOW_TIMER to time an HP Fortran program 
 
 PROGRAM TIMER 
 
   INTEGER TIMER_CONTEXT 
   DATA    TIMER_CONTEXT /0/ 
 
!  Initialize default timer stats to 0 
 
   CALL LIB$INIT_TIMER 
 
!  Sample first section of code to be timed 
 
   DO I=1,100 
     CALL MOM 
   ENDDO 
 
!  Display stats 
 
   TYPE *,'Stats for first section' 
   CALL LIB$SHOW_TIMER 
 
!  Zero second timer context 
 
   CALL LIB$INIT_TIMER (TIMER_CONTEXT) 
 
!  Sample second section of code to be timed 
 
   DO I=1,1000 
     CALL MOM 
   ENDDO 
 
!  Display stats 
 
   TYPE *,'Stats for second section' 
   CALL LIB$SHOW_TIMER (TIMER_CONTEXT) 
   TYPE *,'Accumulated stats for two sections' 
   CALL LIB$SHOW_TIMER 
 
!  Reinitialize second timer stats to 0 
 
   CALL LIB$INIT_TIMER (TIMER_CONTEXT) 
 
!  Sample third section of code to be timed 
 
   DO I=1,1000 
     CALL MOM 
   ENDDO 
 
!  Display stats 
 
   TYPE *,'Stats for third section' 
   CALL LIB$SHOW_TIMER (TIMER_CONTEXT) 
   TYPE *,'Accumulated stats for all sections' 
   CALL LIB$SHOW_TIMER 
 
 END PROGRAM TIMER 
 
!  Sample subroutine performs enough processing so times aren't all 0.0 
 
 SUBROUTINE MOM 
   COMMON  BOO(10000) 
   DOUBLE PRECISION BOO 
   BOO = 0.5    ! Initialize all array elements to 0.5 
 
   DO I=2,10000 
      BOO(I)   = 4.0+(BOO(I-1)+1)*BOO(I)*COSD(BOO(I-1)+30.0) 
      BOO(I-1) = SIND(BOO(I)**2) 
   ENDDO 
 
   RETURN 
 
 END SUBROUTINE MOM 

The LIB$xxxx_TIMER routines use a single default timer when called without an argument. When you call LIB$xxxx_TIMER routines with an INTEGER argument whose initial value is 0 (zero), you enable use of multiple timers.

The LIB$INIT_TIMER routine must be called at the start of the timing. It can be called again at any time to reset (set to zero) the values.

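In Example 5-1, LIB$INIT_TIMER is called once without an argument to start the default timer, and twice with the TIMER_CONTEXT argument: once to start a second timer before the second section and once to reset that timer before the third section.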

The LIB$SHOW_TIMER routine displays the timer values saved by LIB$INIT_TIMER on SYS$OUTPUT (or passes them to a specified routine). Your program must call LIB$INIT_TIMER at least once (to start the timing) before it calls LIB$SHOW_TIMER.

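Like LIB$INIT_TIMER, LIB$SHOW_TIMER is called in Example 5-1 both without an argument (to display the accumulated values of the default timer) and with the TIMER_CONTEXT argument (to display the values of the second timer).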

The free-format source file, TIMER.F90, might be compiled and linked as follows:


$ FORTRAN/FLOAT=IEEE_FLOAT TIMER
$ LINK TIMER 

When the program is run (on a low-end Alpha system), it displays timing statistics for each section of the program as well as accumulated statistics:


$ RUN TIMER 
Stats for first section 
 ELAPSED:    0 00:00:02.36  CPU: 0:00:02.21  BUFIO: 1  DIRIO: 0  FAULTS: 23 
Stats for second section 
 ELAPSED:    0 00:00:22.31  CPU: 0:00:22.09  BUFIO: 1  DIRIO: 0  FAULTS: 0 
Accumulated stats for two sections 
 ELAPSED:    0 00:00:24.68  CPU: 0:00:24.30  BUFIO: 5  DIRIO: 0  FAULTS: 27 
Stats for third section 
 ELAPSED:    0 00:00:22.24  CPU: 0:00:21.98  BUFIO: 1  DIRIO: 0  FAULTS: 0 
Accumulated stats for all sections 
 ELAPSED:    0 00:00:46.92  CPU: 0:00:46.28  BUFIO: 9  DIRIO: 0  FAULTS: 27 
 
$

You might:

Instead of the LIB$xxxx_TIMER routines (which are specific to the OpenVMS operating system), you might modify the program to call other routines that measure execution time (but do not report other process information). For example, you might use HP Fortran intrinsic procedures, such as SYSTEM_CLOCK, DATE_AND_TIME, and TIME.
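
As one possible sketch of the SYSTEM_CLOCK approach, the following program measures elapsed time for two sections and for both together, loosely mirroring the per-section and accumulated reports of Example 5-1. The module and function names (WALL_CLOCK, ELAPSED_SECONDS) and the loop bodies are hypothetical. The helper assumes the counter wraps at most once between readings.

```fortran
! Sketch: elapsed-time measurement with the standard SYSTEM_CLOCK intrinsic.
! All names here are illustrative and not part of HP Fortran.
MODULE WALL_CLOCK
  IMPLICIT NONE
CONTAINS

  ! Converts two SYSTEM_CLOCK counts to seconds, allowing for one
  ! wraparound of the counter from CMAX back to 0
  FUNCTION ELAPSED_SECONDS(C_START, C_END, RATE, CMAX) RESULT(SECS)
    INTEGER, INTENT(IN) :: C_START, C_END, RATE, CMAX
    DOUBLE PRECISION :: SECS, TICKS
    TICKS = DBLE(C_END) - DBLE(C_START)
    IF (TICKS < 0.0D0) TICKS = TICKS + DBLE(CMAX) + 1.0D0
    SECS = TICKS / DBLE(RATE)
  END FUNCTION ELAPSED_SECONDS

END MODULE WALL_CLOCK

PROGRAM CLOCK_SECTIONS
  USE WALL_CLOCK
  IMPLICIT NONE
  INTEGER :: C0, C1, C2, RATE, CMAX, I
  DOUBLE PRECISION :: X

  CALL SYSTEM_CLOCK(COUNT_RATE=RATE, COUNT_MAX=CMAX)
  IF (RATE == 0) STOP 'No system clock available'

  X = 0.0D0
  CALL SYSTEM_CLOCK(COUNT=C0)

  DO I = 1, 2000000          ! first section (placeholder work)
    X = X + SIN(DBLE(I))
  END DO
  CALL SYSTEM_CLOCK(COUNT=C1)

  DO I = 1, 4000000          ! second section (placeholder work)
    X = X + COS(DBLE(I))
  END DO
  CALL SYSTEM_CLOCK(COUNT=C2)

  PRINT *, 'First section (s):  ', ELAPSED_SECONDS(C0, C1, RATE, CMAX)
  PRINT *, 'Second section (s): ', ELAPSED_SECONDS(C1, C2, RATE, CMAX)
  PRINT *, 'Both sections (s):  ', ELAPSED_SECONDS(C0, C2, RATE, CMAX)
  PRINT *, 'Checksum:           ', X   ! keeps the loops from being optimized away
END PROGRAM CLOCK_SECTIONS
```

SYSTEM_CLOCK reports elapsed (wall-clock) time only. Unlike LIB$SHOW_TIMER, it gives no CPU time, I/O counts, or page-fault counts.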

For More Information:

