Compaq KAP Fortran/OpenMP
for Tru64 UNIX
User Guide




Chapter 2
How to Run KAP

This chapter describes the commands necessary to run KAP on Tru64 UNIX systems.

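Compaq KAP Fortran/OpenMP can be run in either of two modes: through the kf90 driver (see Section 2.3) or as a standalone preprocessor with the kapf90 command (see Section 2.8).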

2.1 General KAP Information

The following provides information and restrictions:

2.2 Installing Compaq KAP

KAP is installed on the Tru64 UNIX system with the system command setld. See the Compaq KAP Fortran/OpenMP for Tru64 UNIX Installation Guide for details.

2.3 Compiling a Program Using the kf90 Driver

The kf90 command invokes a driver program that automatically calls KAP, the Compaq Fortran compiler, and the linker.

2.3.1 Passing Default KAP Switch Settings to kf90

Because kf90 calls KAP and the Compaq Fortran compiler, you can substitute the kf90 command for the Fortran 90 command. For example, to use kf90 to compile myprog.f90 with the default KAP switch settings, use the command:


kf90 myprog.f90 

The kf90 command runs the KAP preprocessor on myprog.f90, compiles the result with the Compaq Fortran compiler, links the object code into an executable image, and produces the following files:

To see a list of the KAP switches and Compaq Fortran compiler switches passed by kf90, use the -v switch as follows:


kf90 myprog.f90 -v 

An example of the output is:


oursmp> kf90 -v matmul.f timing_fortran.o 
/usr/bin/kapf90 -cmp=./matmul.cmp.f matmul.f -tune=EV4 -nofree 
KAP/Tru64_U_F90    4.4 k340504 20010517    20-Aug-2001   09:27:21 
KAP/Tru64_U_F90  4.4 k340504 20010517 : 0 errors in file matmul.f 
/usr/bin/f90 -fast -v ./matmul.cmp.f timing_fortran.o -tune host -non_shared 
/usr/lib/cmplrs/fort90/decfort90 -platinum -fast -tune host -non_shared 
                                 -I/usr/lib/cmplrs/hpfrtl 
                                 -o /tmp/forAAAaaupma.o ./matmul.cmp.f 
/usr/bin/cc -fast -v -tune host -non_shared /usr/lib/cmplrs/fort90/for_main.o 
                                           /tmp/forAAAaaupma.o 
                                           -O4 timing_fortran.o -qlshpf 
                                           -lUfor -lfor -lFutil -lm -lots -qlc_r 
/usr/lib/cmplrs/cc/ld -g0 -O4 -non_shared /usr/lib/cmplrs/cc/crt0.o 
                              /usr/lib/cmplrs/fort90/for_main.o 
                              /tmp/forAAAaaupma.o timing_fortran.o 
                              -qlshpf -qlc_r -lUfor -lfor -lFutil 
                              -lm -lots -lc 
/usr/lib/cmplrs/cc/ld: 
0.45u 0.39s 0:06 13% 0+28k 305+173io 14pf+0w 28stk+4752mem 

The final resource usage appears in C-shell format at the end.

2.4 Passing KAP Switches to kf90

The -fkapargs switch specifies one or more KAP command-line switches to the preprocessor. For example, to use kf90 to optimize and compile the file myprog.f90 using KAP switches for general optimization, use the command:


kf90 -fkapargs='-roundoff=3 -scalaropt=3 -list=myprog_annotated.lis' myprog.f90 

The following files result:

For descriptions of all KAP command-line switches, see Chapter 4.
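When the list of KAP switches grows, it can be convenient to assemble the -fkapargs argument in a small shell wrapper. The following is a minimal sketch, assuming a POSIX shell. The helper name kapargs is hypothetical and is not part of KAP, and the resulting command is only printed, not executed.

```shell
#!/bin/sh
# Hypothetical helper (not part of KAP): pack KAP switches into a single
# -fkapargs argument so that the shell passes them to kf90 as one word.
kapargs() {
    # "$*" joins all arguments with single spaces (default IFS).
    printf '%s%s\n' '-fkapargs=' "$*"
}

arg=$(kapargs -roundoff=3 -scalaropt=3 -list=myprog_annotated.lis)

# Dry run: print the command instead of executing it.
printf "kf90 '%s' myprog.f90\n" "$arg"
```

Quoting the whole word, as the sketch does, is equivalent for the shell to quoting only the value after the equal sign.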

2.4.1 Passing Compaq Fortran Compiler Switches to kf90

Any command-line switch that is valid for the Compaq Fortran compiler or the linker is valid for the kf90 command. You can specify compiler switches and KAP switches on the same line. For example, to optimize and compile the file myprog.f90 using KAP switches for general optimization and to specify the name of the executable file with the Compaq Fortran compiler switch -o, use the following command:


kf90 -fkapargs='-optimize=5 -roundoff=3 -scalaropt=3' -o myprog.exe myprog.f90 

The following files result:

The kf90 command specifies the Compaq Fortran compiler switches -notransform_loops, -lpthread, and -fast by default. To override any of the individual compiler switches encompassed by -fast, specify them on the kf90 command line. For example, the following command sets the compiler switch -math_library accurate and overrides the default -math_library noaccurate set by -fast:


kf90 -math_library accurate myprog.f90 

For information about the -fast compiler switch, see the Compaq Fortran User Manual for Tru64 UNIX and Linux Alpha Systems.

2.4.2 Additional Information About Using the kf90 Driver

The kf90 command sets the compiler switch -tune host by default. The -tune host switch causes the compiler to optimize to the host architecture. For example, if you want to optimize for the ev5 architecture but are compiling on an ev4 system, you should override the default setting of the -tune switch, as follows:


kf90 -tune ev5 myprog.f90 

The kf90 command specifies the linker switches -lpthread and -non_shared by default. The -non_shared switch causes the image to be linked with archive libraries instead of with shared libraries. To override the -non_shared default, specify -call_shared on the command line, for example:


kf90 -call_shared myprog.f90 


The kf90 driver accepts either Fortran 90 or Fortran 77 source input.

Like the f90 command, the kf90 command assumes by default that source files with an extension of .f90 are free format, and source files with an extension of .f, .for, or .FOR are fixed format. You can override these defaults by using a format-related switch with either the KAP preprocessor or the Compaq Fortran compiler. The format-related compiler switches are -free and -fixed. The corresponding KAP preprocessor switches are -freeformat and -nofreeformat.

Table 2-1 lists combinations of switches and file extensions and the resulting assumption KAP makes about the format of the source file.

Table 2-1 kf90 Assumed Source Format Based on Switch Settings and File Extensions

Switches                        Source File Extension
KAP             F90             .f90     .f, .for, .FOR
default         default         free     fixed
-freeformat     default         free     free
-nofreeformat   default         fixed    fixed
default         -free           free     free
default         -fixed          fixed    fixed
-freeformat     -fixed          KAP issues error message
-nofreeformat   -free           KAP issues error message

For more information about the -[no]freeformat switch, see Section 4.5.7. For more information about the -free and -fixed switches, see the Compaq Fortran User Manual for Tru64 UNIX and Linux Alpha Systems.
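The lookup in Table 2-1 can be sketched as a shell function. This is an illustration only, assuming a POSIX shell. The function name assumed_format and the placeholder word "default" (meaning the switch was not given) are hypothetical, and combinations that the table does not list are reported as unknown rather than guessed.

```shell
#!/bin/sh
# Sketch of Table 2-1. Arguments: KAP switch, compiler switch, file name.
# Pass the word "default" when a switch is not given on the command line.
assumed_format() {
    kap=$1
    f90=$2
    file=$3
    case "$kap,$f90" in
        -freeformat,-fixed|-nofreeformat,-free)
            # Conflicting switches: the table says KAP issues an error.
            echo error
            return 1 ;;
        -freeformat,default|default,-free)
            echo free ;;
        -nofreeformat,default|default,-fixed)
            echo fixed ;;
        default,default)
            # No switches: the file extension decides.
            case "$file" in
                *.f90)           echo free ;;
                *.f|*.for|*.FOR) echo fixed ;;
                *)               echo unknown; return 1 ;;
            esac ;;
        *)
            # Combination not listed in Table 2-1.
            echo unknown
            return 1 ;;
    esac
}

assumed_format default default myprog.f90
assumed_format -nofreeformat default myprog.f90
```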

2.5 Compiling a Program Containing C Preprocessor Directives Using kf90

If your Fortran program contains C preprocessor directives and you do not want to use any additional C preprocessor directives in the kf90 command line, use the Compaq Fortran compiler switch -cpp , as follows:


kf90 -cpp myprog.f90 

The -cpp switch causes the C preprocessor to run on your Fortran program before compilation.

If you want to use C preprocessor directives in the kf90 command line, you must also include the C preprocessor switch -C to avoid errors resulting from C comment lines. For example, in the following kf90 command line, where -Dfoo is a C preprocessor switch, you must include -C:


kf90 -cpp -C -Dfoo myprog.f90 

The kf90 driver does not set the -C switch when you use C preprocessor directives in the command line.

2.6 Optimized Programs

The kf90 command saves the optimized version of your source program in the current directory for use in debugging and profiling. The default file extension of the optimized source depends on the input file extension, as follows:
File Extension of Input         File Extension of Transformed Source
.f90                            .cmp.f90
.f, .for, .FOR                  .cmp.f

The Compaq Fortran compiler uses the file extension of the optimized source file to determine the source format. Compaq Fortran assumes sources with a file extension of .f90 are free format and sources with a file extension of .f, .for, or .FOR are fixed format. You can override the defaults by using the Compaq Fortran compiler switches -free and -fixed. You can override the naming of the optimized program by using the -cmp switch. See the -cmp description in Section 4.9.1.
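The default naming rule in the extension table above can be sketched in a few lines of shell, for example to locate the optimized source in a build script. The helper name cmp_name is hypothetical, and the sketch covers only the default names, not names set with -cmp.

```shell
#!/bin/sh
# Hypothetical helper: default name of the optimized source that kf90
# saves, following the extension table in this section.
cmp_name() {
    case "$1" in
        *.f90) echo "${1%.f90}.cmp.f90" ;;
        *.for) echo "${1%.for}.cmp.f" ;;
        *.FOR) echo "${1%.FOR}.cmp.f" ;;
        *.f)   echo "${1%.f}.cmp.f" ;;
        *)     echo "unrecognized extension: $1" >&2
               return 1 ;;
    esac
}

cmp_name myprog.f90   # prints myprog.cmp.f90
cmp_name matmul.f     # prints matmul.cmp.f
```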

2.7 KAP Command-Line Switches Determined by Compiler Switches

Some Compaq Fortran compiler switches automatically set KAP command switches or alter the default KAP switch settings.

Explicitly calling the compiler switch -assume=accuracy causes KAP to be called with -roundoff=0. Otherwise, the KAP command-line switch -roundoff defaults to -roundoff=3.

Explicitly calling the compiler switch -nof77 causes KAP to be called with the -onetrip command-line switch.

Explicitly calling the compiler switch -noi4 causes KAP to be called with the command-line switches -integer=2 and -logical=2; otherwise, the defaults are -integer=4 and -logical=4.
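The three rules in this section can be summarized as a small shell function. This is a sketch only: the function name implied_kap_switches is hypothetical, and printing the default values explicitly is an assumption made for readability, since this section does not say whether kf90 passes the defaults on the KAP command line.

```shell
#!/bin/sh
# Sketch of the mapping in this section: print the KAP switch settings
# implied by a list of Compaq Fortran compiler switches.
implied_kap_switches() {
    roundoff=3      # default unless -assume=accuracy is given
    integer=4       # default unless -noi4 is given
    logical=4       # default unless -noi4 is given
    onetrip=        # empty unless -nof77 is given
    for sw in "$@"; do
        case "$sw" in
            -assume=accuracy) roundoff=0 ;;
            -nof77)           onetrip=' -onetrip' ;;
            -noi4)            integer=2; logical=2 ;;
        esac
    done
    echo "-roundoff=$roundoff -integer=$integer -logical=$logical$onetrip"
}

implied_kap_switches -noi4 -nof77
```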

2.8 Compiling a Program Using kapf90

Use the following command to execute KAP as a standalone preprocessor:


kapf90 [kap_switch_string] myprog.f90 -cmp=myprog.cmp.f90 -freeformat 

The kapf90 command assumes that the source file input is fixed format by default. Use the Compaq KAP Fortran/OpenMP -freeformat switch to cause KAP to treat source files as free format, as shown in the previous code example. For more information about the -freeformat switch, see Section 4.5.7.

After preprocessing your program, give myprog.cmp.f90 to the Compaq Fortran compiler, as follows:


f90 -fast -tune host -non_shared myprog.cmp.f90 

Note

When you use kapf90 to process a file, you must set the Compaq Fortran compiler and linker switches appropriately. For this reason, Compaq recommends that you use kf90 whenever possible, because kf90 automatically sets the compiler and linker switches correctly.
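The standalone workflow above amounts to two commands that must agree on the name of the optimized source. The following dry-run sketch prints both commands for a free-format .f90 input. The function name kap_two_step is hypothetical, and the switches are simply those shown in this section, not a recommendation for every program.

```shell
#!/bin/sh
# Dry-run sketch: print the two commands of the standalone workflow.
# Assumes a free-format source file with the extension .f90.
kap_two_step() {
    src=$1                       # for example, myprog.f90
    out="${src%.f90}.cmp.f90"    # optimized source handed to f90
    echo "kapf90 $src -cmp=$out -freeformat"
    echo "f90 -fast -tune host -non_shared $out"
}

kap_two_step myprog.f90
```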

2.9 Compiling a Program Containing C Preprocessor Directives Using kapf90

If a Fortran 90 program contains C preprocessor directives, preprocess it with cpp before you process it with kapf90. For example, if your program has C include statements, process it as follows:


cpp -P myprog.f > myprog.i 
kapf90 myprog.i -cmp=myprog.f90 
f90 myprog.f90 

2.10 Using KAP Syntax

Specify switches in lowercase with the syntax -switch[=value]. Do not leave spaces between the switch name and the value. Switches can appear before or after the input file as follows:


kapf90 -inm  myprog.f90  -roundoff=2 -freeformat 

KAP recognizes standard abbreviations for switches. Switches that take a list of names must have the names separated by commas and with no spaces, for example:


-inff=besl.f90,util.f90 
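If the file names come from a shell variable or a script argument list, the comma-separated form can be built as in the following sketch. The helper name join_names is hypothetical; only the -inff syntax itself comes from this section.

```shell
#!/bin/sh
# Hypothetical helper: join names with commas and no spaces, as required
# for KAP switches that take a list of names.
join_names() {
    # Run in a subshell so the IFS change does not leak to the caller.
    ( IFS=,; printf '%s\n' "$*" )
}

echo "-inff=$(join_names besl.f90 util.f90)"
```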

When you pass KAP command-line switches through kf90 with the -fkapargs switch, enclose them in single quotation marks, as follows:


kf90 -fkapargs='-optimize=5  -roundoff=3  -scalaropt=3' -w myprog.f90 

Compaq Fortran compiler switches, such as -w, do not require quotation marks.

2.11 Using File Naming Conventions

Any input file name is valid. If the file name does not have an extension, the extension .f90 is assumed. As KAP processes a Fortran 90 file, it generates three output files: the optimized program file, the optional listing file, and the executable file.

The default output file names are as follows:

<file>.cmp.f90 --- the optimized Fortran 90 program from the kf90 driver
<file>.cmp --- the optimized Fortran 90 program from kapf90
<file>.out --- the annotated KAP listing file
a.out --- the executable file

Other output file names can be specified with the -cmp and -list switches.

When KAP detects an error condition, KAP writes a message to standard error.

2.12 Guidelines for Optimizing With KAP

This section describes how you can get maximum performance in your application programs in the shortest time.

This information can be used with both multiprocessor and single-processor systems, and with both Fortran and C versions of all KAP products. Therefore, the information may contain references to command-line switches or settings that are unavailable or that are different from those in the KAP that you are using.

This section provides separate protocols for small and large programs. Small programs are defined as those that can be compiled and run quickly. Because the cost of each iteration is small, you can take risks. The information presented here further assumes that small programs have a small number of program units.

Large programs are defined as those that take more time to compile and run than it takes for you to check the results. A program can be large either because the source code is very large or because the execution time is long.

2.12.1 Optimizing Small Programs with KAP

Follow these guidelines to optimize small programs:

2.12.2 Optimizing Large Programs with KAP

Follow these guidelines to optimize large programs:

  1. Compile the program without KAP, with minimum compiler optimization, and with all compiler run-time checks enabled. Note the execution time and verify the results. If the program fails at this step, there is not much optimization you can do.
    Some older programs use standard-violating techniques that KAP will not transform safely. If KAP fails because of this problem, there is little optimization you can do.
    If you have the time and you know what the program is supposed to do, you can try to isolate the incorrect code, correct it, and proceed. This action is feasible for large programs only if the problems are easily understood and isolated or if you have enough time to find more intractable problems.
    If the problem code is isolated and runs without KAP optimization, you may be able to run KAP on the rest of the program and leave out any problematic sections. You can also refer to Section 2.15 on KAP problems. You may be able to diagnose and correct some problems, and then run KAP on your program successfully.
  2. Compile without KAP but with maximum compiler optimization. Note the execution time and verify the results. If the program fails, reduce compiler optimization and try again.
  3. Rebuild the fastest correct non-KAP version from step 2 and run it with profiling enabled (for example, with gprof) to identify the program units that take the most time to run.
    Time-intensive units that have many iterative loops and arrays are good candidates for KAP loop optimizations. Go to step 4.
    If these units are not good candidates, lower-payoff optimizations such as inlining may still provide some performance improvement, especially where inlining inside loop nests also allows KAP to perform vectorization optimizations. In this case, go to step 6.
  4. If time-intensive routines were identified as good candidates, run KAP on them with modest KAP optimization (-optimize=2), compile the whole program with the other switches used in the best run from step 2, note the execution time, and verify the results.
    If the program fails, try again with the KAP switch -roundoff=0. If that works, the failure is probably due to a roundoff-sensitive operation. If it still fails with -roundoff=0, try -scalaropt=1.
  5. If step 4 works, repeat with full KAP optimization, with full compiler optimization, and with -roundoff=0 or -scalaropt=1, if needed.
    If the program fails, reduce the setting to a lower KAP optimization level or a lower compiler optimization level, and try again. If you have success at this step, you can also try the suggestions found in Section 2.14.
  6. If there are no routines with arrays and loops, run the whole program with -optimize=0 and -inline_and_copy=aaa,bbb,ccc,.., where aaa, bbb, and so forth, are the most frequently called routines from the profiling run in step 3.
    If this action succeeds, repeat with the -optimize=4 and -inline_and_copy=... switches. If this action fails, try rerunning with -roundoff=0 or -scalaropt=1 or with fewer routines inlined. (See Section 2.15 for an explanation of binary chop.) Also, if you have success at this step, try the suggestions in Section 2.14.
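The retry order in step 4 can be scripted so that each fallback is tried only after the previous setting fails. The following is a minimal sketch, not a definitive procedure. The names try_kap_settings and build_and_verify are hypothetical, build_and_verify stands in for your own build-and-check command, and combining -scalaropt=1 with -roundoff=0 in the third attempt is an assumption, because step 4 does not say whether the two switches are used together.

```shell
#!/bin/sh
# Sketch of the retry order in step 4. build_and_verify is a placeholder
# that you supply: it receives a KAP switch string, builds the program,
# checks the results, and returns 0 on success.
try_kap_settings() {
    for kapargs in \
        '-optimize=2' \
        '-optimize=2 -roundoff=0' \
        '-optimize=2 -roundoff=0 -scalaropt=1'
    do
        if build_and_verify "$kapargs"; then
            echo "passed with: $kapargs"
            return 0
        fi
    done
    echo "all settings failed" >&2
    return 1
}

# Stand-in for a real build: pretend that only settings containing
# -roundoff=0 produce correct results.
build_and_verify() {
    case "$1" in
        *-roundoff=0*) return 0 ;;
        *)             return 1 ;;
    esac
}

try_kap_settings
```

In a real script, build_and_verify would pass its argument to kf90 through -fkapargs, run the program, and compare the output with the verified results from step 1.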

2.12.3 General Optimization Tips

2.13 Improving and Customizing KAP Performance

After you have used the KAP protocol for either small or large programs, you can find ways to fine-tune KAP to fit your application.

This section helps you discover which KAP command-line switches, directives, or assertions can improve KAP performance for a particular application program. It lists goals and program situations that KAP users commonly encounter and suggests possible improvements for each.

Remember that KAP is a tool to optimize Compaq Fortran code. Like any tool, it performs best when you are familiar with the details of how it works and are able to use its switches correctly and advantageously.

Although KAP default switch settings will achieve performance improvement, you can often achieve greater improvement if you understand and use alternate switch settings. Moreover, you can often insert directives or assertions to achieve improved performance.

See Table 2-2 for user actions and specific goals.

Table 2-2 User Actions for Specific Goals
Goal User Action
Have a more informative listing to help answer your questions. Use -lo=otkl or other listing settings of the -listoptions command-line switch.
Recognize more reductions. Increase -roundoff switch setting.
Answer a KAP generated question. Use appropriate assertion.
Eliminate unnecessary last-value assignment. Use !*$* assert no last value needed or -assume without the l switch; or try -save=manual.
Spend less time optimizing deeply nested loops. Reduce -limit and -arclimit or their directives.
Disable inner loop unrolling. Use -unroll=1 or -scalaropt < 2.
Disable outer loop unrolling. Use -roundoff < 3 or -scalaropt < 3.
Prevent a given loop from being optimized. Use !*$* assert do (serial), !*$* assert do prefer (serial), !*$* noconcurrent, or !*$* optimize (0). (Remember to reenable optimization after the serial loop.)
Disable some data dependence checking. Use !*$* assert no recurrence for one loop nest.
Expand (inline) subroutine calls within DO loops. Use -inline, -inline_from_files, or -inline_create and -inline_from_libraries. Or, if the goal is to execute the subroutine body concurrently, try -ipa or !*$* assert concurrent call.
Inline more routines. Increase -inline_depth and -inline_looplevel. (See also the !*$* inline directive.)
Turn off directives and assertions. Use the -nodirectives switch.
Process a program that uses intentional array bounds violation. Use !*$* assert bounds violations.
Use STATIC storage. Insert SAVE statements or use -save=all_adjust.

2.14 Using Additional Performance Improvement Techniques

After you have successfully run KAP on a working program by using either the protocol for small programs or that for large programs, you can try the following procedures to find additional opportunities for optimization within your program:

2.15 Correcting KAP Problems

The following are some problems you may encounter when using KAP and possible fixes and workarounds:

