HP OpenVMS Systems Documentation |
HP Fortran for OpenVMS
Before you attempt to analyze and improve program performance, you should:
To ensure that your software development environment can significantly improve the run-time performance of your applications, obtain and install the following optional software products:
For information about system-wide tuning and suggestions for other performance
enhancements on OpenVMS systems, see the HP OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems.
5.1.2 Compile Using Multiple Source Files and Appropriate FORTRAN Qualifiers
During the earlier stages of program development, you can use incremental compilation with minimal optimization. For example:
$ FORTRAN /OPTIMIZE=LEVEL=1 SUB2
$ FORTRAN /OPTIMIZE=LEVEL=1 SUB3
$ FORTRAN /OPTIMIZE=LEVEL=1 MAIN
$ LINK MAIN SUB2 SUB3
During the later stages of program development, you should compile multiple source files together and use an optimization level of at least /OPTIMIZE=LEVEL=4 on the FORTRAN command line to allow more interprocedure optimizations to occur. For instance, the following command compiles all three source files together using the default level of optimization (/OPTIMIZE=LEVEL=4):
$ FORTRAN MAIN.F90+SUB2.F90+SUB3.F90
$ LINK MAIN.OBJ
Compiling multiple source files using the plus sign (+) separator lets the compiler examine more code for possible optimizations, which results in:
When compiling all source files together is not feasible (such as for very large programs), consider compiling source files containing related routines together with multiple FORTRAN commands, rather than compiling source files individually.
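As a rough illustration of why compiling source files together can help, consider a small function called inside a loop. The unit names MAIN and SCALE2, and the way the code is split across files, are hypothetical and chosen only for this sketch. If SCALE2 is compiled separately, the compiler sees only a call to an external routine; if both units are compiled by one FORTRAN command, the optimizer can see the function body and may choose to inline it.

```fortran
! Hypothetical contents of MAIN.F90
PROGRAM MAIN
  IMPLICIT NONE
  INTEGER, PARAMETER :: N = 1000
  REAL :: X(N), TOTAL
  REAL, EXTERNAL :: SCALE2
  INTEGER :: I

  X = 1.5
  TOTAL = 0.0
  DO I = 1, N
     ! A call to a very small routine: a candidate for inlining when the
     ! body of SCALE2 is visible in the same compilation.
     TOTAL = TOTAL + SCALE2(X(I))
  END DO
  PRINT *, 'Total = ', TOTAL
END PROGRAM MAIN

! Hypothetical contents of SUB2.F90
REAL FUNCTION SCALE2(V)
  IMPLICIT NONE
  REAL, INTENT(IN) :: V
  SCALE2 = 2.0 * V
END FUNCTION SCALE2
```

Whether the call is actually inlined depends on the optimization level and the /OPTIMIZE=INLINE setting in effect; the sketch shows only the kind of code that gives the optimizer the opportunity.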
Table 5-1 shows FORTRAN qualifiers that can improve performance. Most of these qualifiers do not affect the accuracy of the results, while others improve run-time performance but can change some numeric results.
HP Fortran performs certain optimizations unless you specify the appropriate FORTRAN command qualifiers. Additional optimizations can be enabled or disabled using FORTRAN command qualifiers.
Table 5-1 lists the FORTRAN qualifiers that can directly improve run-time performance.
Qualifier Names | Description and For More Information |
---|---|
/ALIGNMENT=keyword | Controls whether padding bytes are added between data items within common blocks, derived-type data, and Compaq Fortran 77 record structures to make the data items naturally aligned. See Section 5.3. |
/ASSUME=NOACCURACY_SENSITIVE | Allows the compiler to reorder code based on algebraic identities to improve performance, enabling certain optimizations. The numeric results can be slightly different from the default (/ASSUME=ACCURACY_SENSITIVE) because of the way intermediate results are rounded. This slight difference in numeric results is acceptable to most programs. See Section 5.8.8. |
/ARCHITECTURE=keyword (Alpha only) | Specifies the type of Alpha architecture code instructions to be generated for the program unit being compiled; it uses the same options (keywords) as used by the /OPTIMIZE=TUNE qualifier (Alpha only) (which controls instruction scheduling). See Section 2.3.6. |
/FAST | Sets the following performance-related qualifiers: /ALIGNMENT=(COMMONS=NATURAL, RECORDS=NATURAL, SEQUENCE), /ARCHITECTURE=HOST, /ASSUME=NOACCURACY_SENSITIVE, /MATH_LIBRARY=FAST (Alpha only), and /OPTIMIZE=TUNE=HOST (Alpha only). See Section 5.8.3. |
/INTEGER_SIZE=nn | Controls the sizes of INTEGER and LOGICAL declarations without a kind parameter. See Section 2.3.26. |
/MATH_LIBRARY=FAST (Alpha only) | Requests the use of certain math library routines (used by intrinsic functions) that provide faster speed. Using this option causes a slight loss of accuracy and provides less reliable arithmetic exception checking to get significant performance improvements in those functions. See Section 2.3.30. |
/OPTIMIZE=INLINE=keyword | Specifies the types of procedures to be inlined. If omitted, /OPTIMIZE=LEVEL=n determines the types of procedures inlined. Certain INLINE keywords are relevant only for /OPTIMIZE=LEVEL=1 or higher. See Section 2.3.35. |
/OPTIMIZE=LEVEL=n (n = 0 to 5) | Controls the optimization level and thus the types of optimization performed. The default optimization level is /OPTIMIZE=LEVEL=4. Use /OPTIMIZE=LEVEL=5 to activate loop transformation optimizations. See Section 5.7. |
/OPTIMIZE=LOOPS | Activates a group of loop transformation optimizations (a subset of /OPTIMIZE=LEVEL=5). See Section 5.7. |
/OPTIMIZE=PIPELINE | Activates the software pipelining optimization (a subset of /OPTIMIZE=LEVEL=4). See Section 5.7. |
/OPTIMIZE=TUNE=keyword (Alpha only) | Specifies the target processor generation (chip) architecture on which the program will be run, allowing the optimizer to make decisions about instruction tuning optimizations needed to create the most efficient code. Keywords allow specifying one particular Alpha processor generation type, multiple processor generation types, or the processor generation type currently in use during compilation. Regardless of the setting of /OPTIMIZE=TUNE=xxxx, the generated code will run correctly on all implementations of the Alpha architecture. See Section 5.8.6. |
/OPTIMIZE=UNROLL=n | Specifies the number of times a loop is unrolled (n) when specified with optimization level /OPTIMIZE=LEVEL=3 or higher. If you omit /OPTIMIZE=UNROLL=n, the optimizer determines how many times loops are unrolled. See Section 5.7.4.1. |
/REENTRANCY | Specifies whether code generated for the main program and any Fortran procedures it calls will be relying on threaded or asynchronous reentrancy. See Section 2.3.39. |
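The remark in Table 5-1 that /ASSUME=NOACCURACY_SENSITIVE can change numeric results follows from the fact that floating-point arithmetic does not obey every algebraic identity exactly. The following sketch is only an illustration of that effect; the program name and values are hypothetical, and it does not claim to show a transformation the compiler actually performs. It evaluates two groupings of the same sum that are equal in exact arithmetic but can round differently in REAL precision.

```fortran
PROGRAM REORDER
  IMPLICIT NONE
  REAL :: A, B, C, R1, R2

  A = 1.0E8
  B = -1.0E8
  C = 1.0

  ! Two groupings that are equal in exact arithmetic.
  R1 = (A + B) + C     ! the large terms cancel first, then C is added
  R2 = A + (B + C)     ! C can be lost when B + C is rounded to REAL precision

  PRINT *, '(A + B) + C = ', R1
  PRINT *, 'A + (B + C) = ', R2
END PROGRAM REORDER
```

If the two printed values differ on your system, that difference is the kind of rounding sensitivity that a reordering optimization can expose.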
Table 5-2 lists qualifiers that can slow program performance. Some applications that require floating-point exception handling or rounding need to use the /IEEE_MODE and /ROUNDING_MODE qualifiers. Other applications might need to use the /ASSUME=DUMMY_ALIASES qualifier for compatibility reasons. Other qualifiers listed in Table 5-2 are primarily for troubleshooting or debugging purposes.
Qualifier Names | Description and For More Information |
---|---|
/ASSUME=DUMMY_ALIASES | Forces the compiler to assume that dummy (formal) arguments to procedures share memory locations with other dummy arguments or with variables shared through use association, host association, or common block use. These program semantics slow performance, so you should specify /ASSUME=DUMMY_ALIASES only for the called subprograms that depend on such aliases. The use of dummy aliases violates the FORTRAN-77, Fortran 90, and Fortran 95 standards but occurs in some older programs. See Section 5.8.9. |
/CHECK[=keyword] | Generates extra code for various types of checking at run time. This increases the size of the executable image, but may be needed for certain programs to handle arithmetic exceptions. Avoid using /CHECK=ALL except for debugging purposes. See Section 2.3.11. |
/IEEE_MODE=keyword other than /IEEE_MODE=DENORM_RESULTS (on I64) or /IEEE_MODE=FAST (on Alpha) | On Alpha systems, using /IEEE_MODE=UNDERFLOW_TO_ZERO slows program execution (like /SYNCHRONOUS_EXCEPTIONS (Alpha only)). Using /IEEE_MODE=DENORM_RESULTS slows program execution even more than /IEEE_MODE=UNDERFLOW_TO_ZERO. See Section 2.3.24. |
/ROUNDING_MODE=DYNAMIC | Certain rounding modes and changing the rounding mode can slow program execution slightly. See Section 2.3.40. |
/SYNCHRONOUS_EXCEPTIONS | Generates extra code to associate an arithmetic exception with the instruction that causes it, slowing program execution. Use this qualifier only when troubleshooting, such as when identifying the source of an exception. See Section 2.3.46. |
/OPTIMIZE=LEVEL=0, /OPTIMIZE=LEVEL=1, /OPTIMIZE=LEVEL=2, /OPTIMIZE=LEVEL=3 | Minimizes the optimization level (and types of optimizations). Use during the early stages of program development or when you will use the debugger. See Section 2.3.35 and Section 5.7. |
/OPTIMIZE=INLINE=NONE, /OPTIMIZE=INLINE=MANUAL | Minimizes the types of inlining done by the optimizer. Use such qualifiers only during the early stages of program development. The types of inlining optimizations are also controlled by the /OPTIMIZE=LEVEL qualifier. See Section 2.3.35 and Section 5.7. |
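To make the /ASSUME=DUMMY_ALIASES entry in Table 5-2 more concrete, the following sketch shows the kind of nonstandard aliasing it refers to. The program and subroutine names are hypothetical. The same variable is passed as both actual arguments and one dummy argument is modified, so the result depends on whether the compiler assumes the two dummies can share storage.

```fortran
PROGRAM ALIAS_DEMO
  IMPLICIT NONE
  REAL :: X

  X = 2.0
  ! The same variable is associated with both dummy arguments, and one
  ! of them is modified: this is the nonstandard aliasing that
  ! /ASSUME=DUMMY_ALIASES is meant to accommodate.
  CALL ADD_TWICE(X, X)
  PRINT *, 'X = ', X
END PROGRAM ALIAS_DEMO

SUBROUTINE ADD_TWICE(A, B)
  IMPLICIT NONE
  REAL :: A, B
  A = A + B     ! if B shares storage with A, B changes here as well
  A = A + B     ! so B would have to be reloaded from memory, not kept in a register
END SUBROUTINE ADD_TWICE
```

Because the program does not conform to the standard, no particular output is guaranteed. Without the aliasing assumption the optimizer is free to keep B in a register across both statements, which is why code like this needs the qualifier and why the qualifier costs performance.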
Certain DCL commands and system tuning can improve run-time performance:
$ DEFINE /USER FOR006 RESULTS.LIS
$ RUN MYPROG
$ TYPE/PAGE RESULTS.LIS
For information about system-wide tuning and suggestions for other performance
enhancements on OpenVMS systems, see the HP OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems.
5.2 Analyzing Program Performance
This section describes how you can:
Before you analyze program performance, make sure any errors you might
have encountered during the early stages of program development have
been corrected.
5.2.1 Measuring Performance Using LIB$xxxx_TIMER Routines or Command Procedures
You can use LIB$xxxx_TIMER routines or an equivalent DCL command procedure to measure program performance.
Using the LIB$xxxx_TIMER routines allows you to display timing and related statistics at various points in the program as well as at program completion, including elapsed time, actual CPU time, buffered I/O, direct I/O, and page faults. If needed, you can use other routines or system services to obtain and report other information.
You can measure performance for the entire program by using a DCL
command procedure (see Section 5.2.1.2). Although using a DCL command
procedure does not report statistics at various points in the program,
it can provide information for the entire program similar to that
provided by the LIB$xxxx_TIMER routines.
5.2.1.1 The LIB$xxxx_TIMER Routines
Use the following routines together to provide information about program performance at various points in your program:
Run program timings when other users are not active. Timing results can be skewed by CPU-intensive processes that run on the system while you take your measurements.
Try to run the program under the same conditions each time to provide the most accurate results, especially when comparing execution times of a previous version of the same program. Use the same CPU system (model, amount of memory, version of the operating system, and so on) if possible.
If you do need to change systems, you should measure the time using the same version of the program on both systems, so you know each system's effect on your timings.
For programs that run for less than a few seconds, repeat the timings several times to ensure that the results are not misleading. Overhead functions might influence short timings considerably.
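The advice to repeat short timings can be sketched as follows. This is an illustration only and rests on two assumptions about the OpenVMS Run-Time Library: that LIB$STAT_TIMER accepts a code, a result variable, and the timer context in that order, and that code 2 returns CPU time as an integer count of 10-millisecond ticks. The names REPEAT_TIMING, WORK, and NRUNS are hypothetical. The routines are invoked with CALL, in the same style as Example 5-1, so their return status is not checked here.

```fortran
PROGRAM REPEAT_TIMING
  IMPLICIT NONE
  INTEGER, PARAMETER :: NRUNS = 5        ! hypothetical number of repetitions
  INTEGER, PARAMETER :: CPU_CODE = 2     ! assumed: code 2 requests CPU time
  INTEGER :: TIMER_CONTEXT, CPU_TICKS, BEST_TICKS, RUN

  TIMER_CONTEXT = 0                      ! 0 requests a separate (non-default) timer
  BEST_TICKS = HUGE(BEST_TICKS)

  DO RUN = 1, NRUNS
     CALL LIB$INIT_TIMER (TIMER_CONTEXT)      ! reset this timer to zero
     CALL WORK                                ! section being timed
     CALL LIB$STAT_TIMER (CPU_CODE, CPU_TICKS, TIMER_CONTEXT)
     BEST_TICKS = MIN(BEST_TICKS, CPU_TICKS)
     PRINT *, 'Run ', RUN, ' CPU ticks: ', CPU_TICKS
  END DO

  PRINT *, 'Lowest CPU ticks over all runs: ', BEST_TICKS
END PROGRAM REPEAT_TIMING

! Hypothetical workload standing in for the code to be measured
SUBROUTINE WORK
  IMPLICIT NONE
  DOUBLE PRECISION :: S
  INTEGER :: I
  S = 0.0D0
  DO I = 1, 1000000
     S = S + SQRT(DBLE(I))
  END DO
  ! Use the result so the loop is less likely to be optimized away
  IF (S < 0.0D0) PRINT *, S
END SUBROUTINE WORK
```

Comparing the individual runs against the lowest value gives a feel for how much overhead and system activity contribute to a short timing.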
You can use the LIB$SHOW_TIMER (or LIB$STAT_TIMER) routine to return elapsed time, CPU time, buffered I/O, direct I/O, and page faults:
The HP Fortran program shown in Example 5-1 reports timings for the three different sections of the main program, including accumulated statistics (for a scalar program).
Example 5-1 Measuring Program Performance Using LIB$SHOW_TIMER and LIB$INIT_TIMER

! Example use of LIB$SHOW_TIMER to time an HP Fortran program

   PROGRAM TIMER

   INTEGER TIMER_CONTEXT
   DATA TIMER_CONTEXT /0/

! Initialize default timer stats to 0

   CALL LIB$INIT_TIMER

! Sample first section of code to be timed

   DO I=1,100
     CALL MOM
   ENDDO

! Display stats

   TYPE *,'Stats for first section'
   CALL LIB$SHOW_TIMER

! Zero second timer context

   CALL LIB$INIT_TIMER (TIMER_CONTEXT)

! Sample second section of code to be timed

   DO I=1,1000
     CALL MOM
   ENDDO

! Display stats

   TYPE *,'Stats for second section'
   CALL LIB$SHOW_TIMER (TIMER_CONTEXT)

   TYPE *,'Accumulated stats for two sections'
   CALL LIB$SHOW_TIMER

! Re-Initialize second timer stats to 0

   CALL LIB$INIT_TIMER (TIMER_CONTEXT)

! Sample Third section of code to be timed

   DO I=1,1000
     CALL MOM
   ENDDO

! Display stats

   TYPE *,'Stats for third section'
   CALL LIB$SHOW_TIMER (TIMER_CONTEXT)

   TYPE *,'Accumulated stats for all sections'
   CALL LIB$SHOW_TIMER

   END PROGRAM TIMER

! Sample subroutine performs enough processing so times aren't all 0.0

   SUBROUTINE MOM
   COMMON BOO(10000)
   DOUBLE PRECISION BOO

   BOO = 0.5        ! Initialize all array elements to 0.5
   DO I=2,10000
     BOO(I) = 4.0+(BOO(I-1)+1)*BOO(I)*COSD(BOO(I-1)+30.0)
     BOO(I-1) = SIND(BOO(I)**2)
   ENDDO
   RETURN
   END SUBROUTINE MOM
The LIB$xxxx_TIMER routines use a single default timer when called without an argument. When you call LIB$xxxx_TIMER routines with an INTEGER argument whose initial value is 0 (zero), you enable use of multiple timers.
The LIB$INIT_TIMER routine must be called at the start of the timing. It can be called again at any time to reset (set to zero) the values.
In Example 5-1, LIB$INIT_TIMER is:
The LIB$SHOW_TIMER routine displays the timer values saved by LIB$INIT_TIMER to SYS$OUTPUT (or to a specified routine). Your program must call LIB$INIT_TIMER before LIB$SHOW_TIMER at least once (to start the timing).
Like LIB$INIT_TIMER:
The free-format source file, TIMER.F90, might be compiled and linked as follows:
$ FORTRAN/FLOAT=IEEE_FLOAT TIMER
$ LINK TIMER
When the program is run (on a low-end Alpha system), it displays timing statistics for each section of the program as well as accumulated statistics:
$ RUN TIMER
Stats for first section
 ELAPSED: 0 00:00:02.36 CPU: 0:00:02.21 BUFIO: 1 DIRIO: 0 FAULTS: 23
Stats for second section
 ELAPSED: 0 00:00:22.31 CPU: 0:00:22.09 BUFIO: 1 DIRIO: 0 FAULTS: 0
Accumulated stats for two sections
 ELAPSED: 0 00:00:24.68 CPU: 0:00:24.30 BUFIO: 5 DIRIO: 0 FAULTS: 27
Stats for third section
 ELAPSED: 0 00:00:22.24 CPU: 0:00:21.98 BUFIO: 1 DIRIO: 0 FAULTS: 0
Accumulated stats for all sections
 ELAPSED: 0 00:00:46.92 CPU: 0:00:46.28 BUFIO: 9 DIRIO: 0 FAULTS: 27
$
You might:
Instead of the LIB$xxxx_TIMER routines (specific to the OpenVMS operating system), you might consider modifying the program to call other routines within the program to measure execution time (but not obtain other process information). For example, you might use HP Fortran intrinsic procedures, such as SYSTEM_CLOCK, DATE_AND_TIME, and TIME.
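As a minimal sketch of the intrinsic-based alternative, the following program measures elapsed time around a section of code with SYSTEM_CLOCK. The program name and the workload loop are hypothetical stand-ins for the code you want to measure. Unlike the LIB$xxxx_TIMER routines, this approach reports only elapsed time, not CPU time, I/O counts, or page faults.

```fortran
PROGRAM CLOCK_TIMING
  IMPLICIT NONE
  INTEGER :: COUNT_START, COUNT_END, COUNT_RATE, COUNT_MAX, TICKS, I
  DOUBLE PRECISION :: S, SECONDS

  ! Read the starting count, the tick rate, and the maximum count value
  CALL SYSTEM_CLOCK(COUNT_START, COUNT_RATE, COUNT_MAX)
  IF (COUNT_RATE == 0) STOP 'No system clock available'

  ! Hypothetical workload standing in for the section being timed
  S = 0.0D0
  DO I = 1, 1000000
     S = S + SQRT(DBLE(I))
  END DO

  CALL SYSTEM_CLOCK(COUNT_END)

  ! Allow for the counter wrapping around once between the two readings
  TICKS = COUNT_END - COUNT_START
  IF (TICKS < 0) TICKS = (TICKS + COUNT_MAX) + 1

  SECONDS = DBLE(TICKS) / DBLE(COUNT_RATE)
  PRINT *, 'Sum     = ', S
  PRINT *, 'Elapsed = ', SECONDS, ' seconds'
END PROGRAM CLOCK_TIMING
```

Printing the sum uses the result of the loop, which makes it less likely that the optimizer removes the work being timed. The resolution of the measurement is limited by the tick rate that SYSTEM_CLOCK reports.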