Compaq KAP C/OpenMP
for Tru64 UNIX
User Guide




Chapter 4
KAP Command-Line Switches

This chapter describes Compaq KAP C command-line switches that allow you to alter KAP defaults.

4.1 Overview of Command-Line Switches

You will frequently be satisfied with the default switch settings of Compaq KAP C. However, you can alter default settings to customize optimizations for a given application program and machine. These alterations include limiting the search space for loop optimization, adjusting the parameters that describe cache memory, and enabling or disabling classes of transformations.

To specify a command-line switch, you can use the long name or short name. If a command-line switch appears more than once on the command line, the last value given is used. Multiple occurrences of an input/output file selection switch are not allowed.

Note

The short names for switches are provided as a convenience, especially for interactive users. However, the short names may not remain unique from one version of KAP to another. Use the long names in situations that require long-term compatibility, such as a canned shell script.

Table 4-1 and Table 4-2 list the command-line switches for the kcc driver and the kapc preprocessor. The first column lists the long name of each switch as well as the functional categories of switches, such as general optimization, parallel processing, and so forth. The remaining columns list related switches, the short name, and the default value of each switch. Switches that have different argument syntax in their regular and negative (no) forms are shown on two lines.

Note

File names are case sensitive on Tru64 UNIX systems, so file-name parameters must match the names of the files wanted.

A hyphen (-) is required before each switch listed in the following tables, but the hyphen is not shown in the tables.

Table 4-1 Command-Line Switches for the kcc Driver
Long Name Related Switch Short Name Default Value
cc=<C_compiler_path>     cc=/usr/bin/cc
cext=<C file extension>     cext=c
ckap=<path to kapc>     ckap=/usr/bin/kapc
ckapargs=<kap_switch_string>      
cpp=<cpp_path>     cpp=/usr/bin/cc
sif=<cpp, kap>,-S     off
tmpdir=<temporary_directory_path>     tmpdir=/tmp/
tune=<architecture>     tune=<current system architecture>
verbose   v nov

Table 4-2 Command-Line Switches for the kapc Preprocessor
Long Name Related Switch Short Name Default Value
General Optimization      
[no]interchange     interchange
namepartitioning=<integer>,<integer> so namepart=<integer>,<integer> nonamepartitioning
natural   nat natural
optimize=<integer>   o=<integer> optimize=5
[no]recursion   [n]rc rc
roundoff=<integer> o, so r=<integer> roundoff=3
scalaropt=<integer> r so=<integer> scalaropt=3
skip   sk nosk
tune=<architecture>     tune=<current system architecture>
Parallel Processing      
chunk scheduling   chunk=1
[no]concurrentize   [n]conc noconcurrentize
minconcurrent=<integer>   mc minconcurrent=1000
scheduling=<list>   sched=<list> scheduling=e
Inlining and IPA      
inline[=<names>]   inl[=<names>] off
noinline[=<names>]   ninl[=<names>]  
ipa[=<names>]   ipa[=<names>] off
noipa[=<names>]   nipa[=<names>]  
inline_and_copy=<names>   inlc=<names> off
inline_create=<file>   incr=<file> off
ipa_create=<file>   ipacr=<file> off
inline_depth=<integer>   ind=<integer> ind=2
ipa_depth=<integer>   ipad=<integer> ipad=10
inline_from_files=<file>,<file> inl inff=<file>,<file> current source file
ipa_from_files=<file>,<file> ipa ipaff=<file>,<file> current source file
inline_from_libraries=<library>,<library> inl infl=<library>,<library> off
ipa_from_libraries=<library>,<library> ipa ipafl=<library>,<library> off
inline_looplevel=<integer>   inll=<integer> inll=2
ipa_looplevel=<integer>   ipall=<integer> ipall=2
inline_manual   inm off
ipa_manual   ipam off
inline_optimize=<integer>     inline_optimize=0
ipa_optimize=<integer>     ipa_optimize=0
Input-Output File Selection      
cmp[=<file>]   cmp[=<file>] See Section 4.6.1
nocmp   ncmp  
list[=<file>]   l[=<file>] See Section 4.6.2
nolist   nl  
Listing      
cmpoptions[=<list>]   cp[=<list>] ncp
nocmpoptions   ncp  
lines=<integer>   ln=<integer> ln=55
listingwidth=<integer>   lw=<integer> lw=132
listoptions=<list>   lo=<list> see Section 4.7.4
suppress=<list>   su=<list> off; see Section 4.7.5
Language      
[no]restrict     restrict
signed     See Section 4.8.2
Advanced Optimization      
addressresolution=<integer> so, r arl=<integer> arl=1
[no]arclimit=<integer> so, r arclm=<integer> arclm=5000
cache_prefetch_line_count=<integer>   cplc=<integer> cplc=0
cacheline=<integer>[,<integer>]   chl=<integer>[,<integer>] chl=64,64
cachesize=<integer>[,<integer>]   chs=<integer>[,<integer>] chs=32,0
dpregisters=<integer>   dpr=<integer> dpr=32
each_invariant_if_growth=<integer> so, r, miifg eiifg=<integer> eiifg=20
fpregisters=<integer>   fpr=<integer> fpr=32
[no]fuse so,o [n]fuse nofuse
fuselevel=<integer> fuse   fuselevel=0
heaplimit=<integer>   heap=<integer> heaplimit=100
hoist_loop_invariants=<integer> so, r hli=<integer> hli=1
limit=<integer>   lm=<integer> lm=50
machine=<list> so, r ma=<list> ma=s
max_invariant_if_growth=<integer> so, r, eiifg miifg=<integer> miifg=500
routine=<rtn_name><switches>   rt=<rtn_name><switches> off
setassociativity=<integer>[,<integer>] so, r sasc=<integer>[,<integer>] sasc=1,1
[no]stdio so, r   nostdio
[no]syntax   sy=<value> nosyntax
tablesize=<integer>   ts=<integer> ts=24000000
unroll=<integer> so, r ur=<integer> ur=4
unroll2=<integer> so, r ur2=<integer> ur2=160
unroll3=<integer> so, r ur3=<integer> ur3=1

4.2 Command-Line Switches for the kcc Driver

The following sections explain the function of each kcc driver switch.

4.2.1 -cc, -nocc, (-cc=/usr/bin/cc)

This switch provides an alternate path to the C compiler or inhibits execution of the C compiler.

4.2.2 -cext, (C file extension)

This switch tells kapc to treat files with the indicated extension as C source files.

4.2.3 -ckap, (-ckap='/usr/bin/kapc')

This switch provides a way to define an alternate path to the kapc preprocessor (translator).

4.2.4 -ckapargs

The -ckapargs switch passes switches to the kapc translator. This switch must precede switches to the kapc translator.

4.2.5 -cpp, (-cpp='/usr/bin/cc')

This switch provides a way to define an alternate path to the C preprocessor before execution of kapc.

4.2.6 -sif, -S, (off)

This switch saves intermediate files. Specifying -sif is equivalent to -sif=cpp,kap, which saves all kapc and C preprocessor intermediate files. Specifying -S is equivalent to -sif=kap and passing -S to the compiler, which saves the assembly-language output. Intermediate file-naming conventions are as follows:

<file>.cpp - cpp output file
K<file>.c - kapc translator output file

The path and switch strings shown above must be enclosed in single or double quotes if they contain white space characters.

4.2.7 -tmpdir, (-tmpdir=/tmp/)

This switch specifies the directory in which temporary files are placed. This switch may also be set by the environment variable TMPDIR.

4.2.8 -tune, (-tune=<current system architecture>)

KAP determines whether the host Alpha architecture is ev4, ev5, or ev6 and then optimizes your program for that architecture by default. In the event you compile a program on one architecture but plan to run it on another, you should override the default by setting -tune equal to the architecture of the target system.

The KAP -tune switch and the C compiler -tune host switch work independently and perform different optimizations. If the switch appears on the command line inside -ckapargs='-tune...', for example:


> kcc myprog.c -ckapargs='-tune=ev6' 

the switch value will be applied only to the KAP translator. However, in the case:


> kcc myprog.c -tune=ev6 

the switch will be applied to both KAP and the C compiler.

4.2.9 -verbose, -v, (-nov)

Prints the passes as they execute with their arguments and their input and output files. Also prints final resource usage in the C-shell time format.

4.3 General Optimization Switches for the kapc Preprocessor

The following sections explain the function of each kapc general optimization switch.

4.3.1 -interchange, -nointerchange, (-interchange)

Use the -interchange switch to enable loop interchanging.

KAP enables loop interchange when -interchange is specified and the -optimize level is at least 1 or the -scalaropt level is 3.

If you specify -nointerchange, KAP disables loop interchange regardless of the -optimize or -scalaropt levels.
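As a rough illustration of the kind of transformation this switch controls, the following hand-written C sketch shows a loop nest before and after interchange. The array size N and the function names are assumptions made for this example; it is not KAP output.

```c
#define N 64

/* Before interchange: the inner loop walks down a column, so successive
   accesses to a[i][j] are N elements apart in memory. */
void scale_column_order(double a[N][N], double s)
{
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            a[i][j] *= s;
}

/* After interchange: the inner loop walks along a row, so successive
   accesses are adjacent in memory (C stores arrays in row-major order). */
void scale_row_order(double a[N][N], double s)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            a[i][j] *= s;
}
```

Both functions compute the same result; only the order in which memory is referenced differs.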

4.3.2 -namepartitioning, -namepart, -nonamepart, (-nonamepartitioning)

The -namepartitioning switch tells KAP to look at distinct array names and limit the number of arrays that appear in a loop to avoid cache thrashing. That is, this switch breaks a loop containing, for example, references to arrays A and B into two loops. One loop references array A and the other loop references array B.
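The splitting described above can be pictured with a small hand-written C sketch. The function names and loop bodies are hypothetical and chosen only for illustration; the sketch assumes that a and b are distinct, non-overlapping arrays, and it is not KAP output.

```c
/* Original loop: arrays a and b are both referenced in the same loop. */
void update_fused(double *a, double *b, int n)
{
    for (int i = 0; i < n; i++) {
        a[i] = a[i] + 1.0;
        b[i] = b[i] * 2.0;
    }
}

/* Partitioned form: one loop references only a, the other only b. */
void update_partitioned(double *a, double *b, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = a[i] + 1.0;
    for (int i = 0; i < n; i++)
        b[i] = b[i] * 2.0;
}
```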

Two arguments (i and j), used in a -namepartitioning=i,j switch, control name partitioning as follows:

If no arguments appear with the -namepartitioning switch, KAP uses its default values of 2 for the minimum and 8 for the maximum number of partitions.

Before KAP can perform name partitioning, you must specify the switch -scalaropt=n where n is greater than or equal to 3.

The -nonamepartitioning switch explicitly prevents name partitioning.

4.3.3 -natural, -nat, -nonatural, -nnat, (-natural)

The -natural switch selects "natural" alignment (for example, double entities start on eight-byte boundaries) instead of non-alignment of data elements.

The -natural switch causes variables and arrays to start on boundaries that correspond to their size.
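To make the idea of natural alignment concrete, the following C sketch inspects where a compiler places a double member inside a structure. The structure and function names are hypothetical. The sketch only reports what the host compiler does; it does not show how KAP itself treats alignment.

```c
#include <stddef.h>

/* Hypothetical record used only for this illustration. */
struct sample {
    char   tag;    /* 1 byte */
    double value;  /* placed at an offset that satisfies its alignment */
};

/* Returns the byte offset of the double member within struct sample. */
size_t value_offset(void)
{
    return offsetof(struct sample, value);
}

/* Returns nonzero if the member sits on a boundary that is a multiple of
   the alignment the compiler requires for double. */
int value_is_aligned(void)
{
    return value_offset() % _Alignof(double) == 0;
}
```

On a system where double has eight-byte alignment, the compiler inserts padding after tag so that value starts on an eight-byte boundary.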

4.3.4 -optimize, -o, (-o=5)

The -optimize switch sets the optimization level, ranging from the least aggressive optimization of 0 to the most aggressive of 5.

Each optimization level is cumulative. For example, -optimize=5 performs everything up to and including that level. Table 4-3 shows the meaning of each of the different optimization levels.

Table 4-3 Optimization Levels
Value Meaning
0 KAP performs only simple program analysis. No loop optimization is performed.
1 KAP performs simple loop optimization. KAP can distribute loops to optimize only a part of a loop.
2 KAP optimizes any loop (and perhaps nested loops) in a loop nest. It performs lifetime analysis to determine when last-value assignment of scalars is necessary. It performs more powerful data dependence tests to find opportunities for optimization.
3 Special techniques are used to break data dependence cycles that otherwise prevent advanced optimizations. Triangular loops are recognized and loop interchanging is performed to improve memory referencing. Special-case data dependence tests are used.
4 Two versions of a loop are generated, if necessary, to break a data dependence arc. Exact data dependence tests are used to allow more opportunities for optimization to be discovered. Special index sets, called wraparound variables, are recognized.
5 Array expansion and loop fusion are enabled.

A higher optimization level allows more sophisticated optimization, along with increased compilation time. Many programs that are written to be easily optimized do not need advanced transformations; with these programs, a lower optimization level will suffice.

4.3.5 -recursion, -rc, -nrc, (-norecursion)

The -recursion switch informs KAP that functions in the source program may be called recursively. (That is, the function calls itself, or it calls another routine that calls it.)

The -recursion switch must be in force in each recursive routine that KAP processes, or unsafe transformations could result.

4.3.6 -roundoff, -r, (-r=3)

The -roundoff switch lets you specify the level of acceptable roundoff errors.

If an arithmetic reduction is accumulated in a different order than in the scalar program, the roundoff error is accumulated differently and the final result may differ from that of the original program. The difference is usually insignificant, but some restructuring transformations performed by KAP must be disabled in order to obtain exactly the same answers as the scalar program.

KAP classifies its transformations by the amount of difference in roundoff that can accumulate, so you can decide what level of roundoff error differences is allowable.
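The effect of accumulation order on roundoff can be seen with a short C sketch. The function names are hypothetical, and the sketch assumes IEEE 754 double arithmetic with the default rounding mode; it illustrates the general phenomenon rather than any specific KAP transformation.

```c
/* Sums x[0..n-1] from first to last. The volatile accumulator forces each
   partial sum to be stored as a double before the next addition. */
double sum_forward(const double *x, int n)
{
    volatile double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Sums the same values from last to first. */
double sum_backward(const double *x, int n)
{
    volatile double s = 0.0;
    for (int i = n - 1; i >= 0; i--)
        s += x[i];
    return s;
}
```

For the values {1.0, 1e16, -1e16}, sum_forward returns 0.0 because the 1.0 is lost when it is added to 1e16, while sum_backward returns 1.0 because the two large terms cancel first.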

Each nonzero roundoff level is cumulative. For example, level 3 performs everything up to and including that level. Table 4-4 shows the meaning of each roundoff level.

Table 4-4 Roundoff Levels
Value Meaning
0 No roundoff-changing transformations are allowed. Loops containing nonarithmetic reductions (such as the largest element of a vector) may still be optimized.
1 Loop interchanging around serial reductions is allowed if -optimize=4. Simplification of expressions from forward substitution or inside trigonometric intrinsic functions returning integer values is performed. Code floating is enabled if -scalaropt is greater than or equal to 2. Loop rerolling is enabled if -scalaropt is greater than or equal to 2.
2 Reciprocal substitution is performed to move an expensive division outside a loop.
3 Floating-point (float or double) induction variables are recognized. Memory management is enabled if -scalaropt=3. Expressions such as A / B / C can be rotated to A / (B * C).

4.3.7 -scalaropt, -so, (-so=3)

The -scalaropt switch sets the level of scalar optimizations that KAP performs. These scalar optimizations include dusty-deck transformations, dead-code elimination, and loop unrolling.

Table 4-5 shows the value and meaning of scalar levels.

Table 4-5 Scalar Levels
Value Meaning
0 No scalar optimizations are performed.
1 Only simple scalar optimizations are performed. These include dead-code elimination, global forward substitution, and dusty-deck IF transformations.
2 The full range of scalar optimization is performed. These include floating invariant IFs out of loops, induction variable recognition, loop rerolling if -roundoff is greater than or equal to 1, loop peeling, loop fusion, and loop unrolling.
3 Memory management is enabled if -roundoff=3.

Unlike the -scalaropt switch, the #pragma _KAP scalaropt directive sets the level of loop-based optimizations only, such as unrolling, and not optimizations such as dead code elimination.

4.3.8 -skip, -sk, -nsk, (-noskip)

The -skip switch tells KAP not to apply optimizing transformations to any routine in the input file. To select which routines are not optimized, see the description of the -routine switch in Section 4.9.16, -routine=<rtn_name><switches>, -rt=<rtn_name><switches>, (off).

4.3.9 -tune, (-tune=<current system architecture>)

kapc determines whether the host Alpha architecture is ev4, ev5, or ev6 and then optimizes your program for that architecture by default. In the event you compile a program on one architecture but plan to run it on another, you should override the default by setting -tune equal to the architecture of the target system.

The kapc -tune switch and the C compiler -tune host switch work independently and perform different optimizations. If the switch appears on the command line inside -ckapargs='-tune...', for example:


> kcc myprog.c -ckapargs='-tune=ev6' 

the switch value will be applied only to the kapc translator. However, in the case:


> kcc myprog.c -tune=ev6 

the switch will be applied to both kapc and the C compiler.

4.4 Parallel Processing Switches for the kapc Preprocessor

The following sections describe the kapc switches you use to control how the multiprocessor version of KAP prepares programs for parallel execution.

4.4.1 -chunk, (-chunk=1)

The -chunk switch modifies, and is used only with, the -scheduling switch. The -chunk switch determines the number of loop iterations that are in a group.

4.4.2 -concurrentize, -conc, -noconcurrentize, (-nconc)

The -concurrentize switch directs KAP to restructure the source code for parallel processing. You can enable or disable parallel execution on a file-by-file basis using KAP pragmas. See Section 5.2, Parallel Processing Assertions for more information.

Parallel execution will disable certain serial optimizations. Programs containing many loops that require synchronization or programs that have loops with small iteration counts might run more slowly when parallelized. In these cases, you should disable parallel execution.

Setting -noconcurrentize disables parallel execution and allows all serial optimizations to take place.

4.4.3 -minconcurrent, -mc, (-mc=1000)

The -minconcurrent switch sets the level of work in a loop above which KAP executes the loop in parallel. The range of values for this switch is all numbers greater than or equal to 0. The higher the minconcurrent value, the more iterations and/or statements the loop body must have to run in parallel.

Executing a loop in parallel incurs overhead that varies with different systems. If a loop has little work, the overhead required to set up parallel execution might make the loop execute more slowly than it would using serial execution.

KAP estimates the amount of work inside a loop by adding the number of operators and the number of operands, excluding the loop index, in each iteration. KAP multiplies this sum by the number of iterations and designates this product as the amount of "work" of the loop. KAP then compares this estimate with the -minconcurrent value. If the loop bounds are constant and the estimated amount of work is greater than the -minconcurrent value, KAP generates parallel code for the loop. Otherwise, the loop executes serially.
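The work estimate described above can be written out as a small C sketch. The function and parameter names are hypothetical, and the operator and operand counts are supplied by the caller because this guide does not spell out exactly how KAP counts them.

```c
/* Estimated work of a loop, following the description above:
   (operators + operands per iteration, excluding the loop index)
   multiplied by the number of iterations. */
long estimated_work(long operators, long operands, long iterations)
{
    return (operators + operands) * iterations;
}

/* Nonzero if the estimate exceeds the -minconcurrent threshold, which is
   the condition the text gives for generating parallel code when the
   loop bounds are constant. */
int exceeds_minconcurrent(long operators, long operands,
                          long iterations, long minconcurrent)
{
    return estimated_work(operators, operands, iterations) > minconcurrent;
}
```

For example, with 2 operators and 3 operands per iteration, a loop of 100 iterations has an estimated work of 500, which is below the default threshold of 1000. The same loop body with 300 iterations has an estimate of 1500, which is above it.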

If the for loop bounds are not known at compilation time, KAP generates an if expression in the parallel pragma. The compiler interprets this parallel pragma as a request to generate a two-version loop; one version is parallel and the other is serial. A run-time check decides whether or not to execute the loop in parallel. To disable the generation of two-version loops throughout a program, use the command-line switch -minconcurrent=0.
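A run-time choice between parallel and serial execution can be sketched with the standard OpenMP if clause. This is a hand-written example: the threshold macro, the function name, and the form of the test are assumptions, and the actual pragma and expression that KAP generates may differ.

```c
/* Hypothetical threshold standing in for the run-time test; the actual
   expression KAP generates is not shown in this guide. */
#define WORK_THRESHOLD 1000

void add_vectors(double *a, const double *b, const double *c, int n)
{
    /* The if clause selects between parallel and serial execution at run
       time. A compiler without OpenMP support ignores the pragma and
       runs the loop serially. */
    #pragma omp parallel for if (n > WORK_THRESHOLD)
    for (int i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```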

Setting the -minconcurrent switch automatically sets the -concurrentize switch.

4.4.4 -scheduling, -sched, (-sched=e)

The -scheduling switch tells KAP the kind of scheduling to use for loop iterations on a multiprocessor system.

The options are:

4.5 Inlining and Interprocedural Analysis (IPA) Switches for the kapc Preprocessor

The following sections explain the function of each kapc switch used in function inlining and interprocedural analysis (IPA). Inlining is the process of replacing a function reference with the text of the function. IPA is the process of inspecting a called function to identify relationships between the function arguments, the function returned value, global data, and the code surrounding the call, in order to identify opportunities for optimization.

Inlining and IPA can be performed in the same KAP run. The only restriction is that the same function may not be in global lists for both inlining and IPA. You can use the inline and IPA pragmas to inline a function in one place and IPA it in another. For additional information about these switches and examples of their use, see Chapter 5 and Chapter 6.
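As a simple picture of what inlining does, the following hand-written C sketch shows a call site before and after the function reference is replaced with the text of the function. The function names are hypothetical and the code is not KAP output.

```c
/* A small function that is a candidate for inlining. */
static double square(double x)
{
    return x * x;
}

/* Before inlining: each iteration makes a function call. */
double sum_of_squares_call(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += square(v[i]);
    return s;
}

/* After inlining: the call is replaced by the function body. */
double sum_of_squares_inlined(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += v[i] * v[i];
    return s;
}
```

With the call removed, the loop body contains only arithmetic, which leaves more room for the loop optimizations described earlier in this chapter.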

4.5.1 -inline, -inl, -noinline, (-ninl), -ipa, -noipa, (-nipa)

The -inline switch provides KAP with a list of functions to inline. The -ipa switch provides KAP with a list of functions to analyze. Additionally, -ipa causes KAP to give information in the annotated listing about appropriate settings for the -ind , -inll , and -ipall switches on a loop-by-loop basis.

If you specify either the -inline or the -ipa switch without an argument list, KAP tries to inline or analyze all the called functions in the inlining (or IPA) universe specified by the -inline_from... and -ipa_from... switches. If you specify a list of routine names, for example -inline=mkcoef,yval, only the routines named are inlined or analyzed.

The -inline and -ipa command-line switches are overridden by the #pragma _KAP inline and #pragma _KAP ipa directives. See Chapter 5, Assertions and Directives and Chapter 6, Inlining and Interprocedural Analysis (IPA) for more information about these pragmas.

A list of routines must be included with -noinline or -noipa. All routines in the inlining/IPA universe are candidates for inlining except the listed ones. See Chapter 6 for more information.

4.5.2 -inline_and_copy, -inlc, (off)

The -inline_and_copy switch functions like the -inline switch, except that if all calls and references to a function are inlined, the text of the routine is not optimized but is copied unchanged to the transformed code file. This switch is intended for use when you are inlining routines from the same file as the call, and has no special effect when the routines being inlined are taken from a library or another source file.

When a subprogram has been inlined everywhere it is used, leaving it unoptimized saves compilation time. When a program involves multiple source files, the unoptimized routine will still be available in case one of the other source files contains a reference to it.

Note

The -inline_and_copy algorithm assumes that all calls and references to the routine precede it in the source file. If the routine is referenced after the text of the routine and if that particular call site cannot be inlined, the unoptimized version of the routine will be invoked.

4.5.3 -inline_create, -incr, (off), -ipa_create, -ipacr, (off)

These switches cause KAP to build a library file containing partially analyzed routines for later inlining/analysis. The library created is used with the -inline_from_libraries and -ipa_from_libraries switches.

When you specify either of these switches, no transformed code file is generated.

Libraries created with -inline_create can be used with either inlining or IPA, because they contain essentially complete descriptions of the functions included. Libraries created with -ipa_create can be used only with IPA, because they do not have the complete text of the functions, just the data relationship information.

You can use any name for the created library. However, for maximum compatibility with the -inline_from_libraries and -ipa_from_libraries switches, Compaq recommends that you use the .klib extension.

4.5.4 -inline_depth, -ind, (-ind=2), -ipa_depth, -ipad, (-ipad=10)

The -inline_depth and -ipa_depth switches set the maximum level of function nesting, that is, calls to functions with calls to functions and so forth, that KAP will attempt to inline or analyze. Higher switch values cause KAP to trace calls and function references further. The values and their meanings are:

The #pragma _KAP [no]inline and #pragma _KAP [no]ipa directives are not affected by -inline_depth or -ipa_depth restrictions.

4.5.5 -inline_from_files, -inff, (current source file)

The -..._from_... switches provide KAP with the locations of functions available for inlining/IPA. The total set of available functions is called the inlining (or IPA) universe.

The -..._from_files switches take the names of source files and directories containing source files. Including a directory, for example, -ipaff=/usr/ipalib, is equivalent to the UNIX notation /usr/ipalib/*.c. Do not use shell wildcard characters in the list of files and directories.

The -..._from_libraries switches take the names of libraries created with the -..._create switches and directories containing such libraries. In directories, the KAP libraries are identified by the .klib extension.

Multiple files/libraries or directories can be given in one -..._from_... switch, separated by commas and enclosed by parentheses. Multiple -..._from_... switches can be specified on the command line.

KAP searches for functions in the provided files and libraries in the order in which they appear on the command line.

4.5.6 -inline_from_libraries, -infl, (off)

See Section 4.5.5, -inline_from_files, -inff, (current source file).

4.5.7 -ipa_from_files, -ipaff, (current source file)

See Section 4.5.5, -inline_from_files, -inff, (current source file).

4.5.8 -ipa_from_libraries, -ipafl, (off)

See Section 4.5.5, -inline_from_files, -inff, (current source file).

4.5.9 -inline_looplevel, -inll, (-inll=2), -ipa_looplevel, -ipall, (-ipall=2)

The -..._looplevel switches enable you to limit inlining to just functions that are referenced in nested loops, where the effects of reduced function call overhead or enhanced optimizations will be multiplied.

The parameter is defined from the most deeply nested function reference. The -inll=1 switch restricts inlining to functions referenced in the deepest loop nest. The -inll=3 switch restricts inlining to those routines referenced at the three deepest levels. The for loop nest level of each function reference is included in the optional calling tree section of the listing file.

The #pragma _KAP [no]inline and #pragma _KAP [no]ipa directives, when enabled, are not affected by the looplevel restrictions.

4.5.10 -inline_manual, -inm, (off), -ipa_manual, -ipam, (off)

These switches cause KAP to recognize the #pragma _KAP [no]inline and #pragma _KAP [no]ipa directives. This allows manual control over which functions are inlined/analyzed at specific call sites.

The default is to ignore these pragmas. When any inlining or IPA switch is included on the command line, the inline or ipa pragmas, respectively, are enabled. The -inline_manual and -ipa_manual switches are provided so the pragmas can be enabled without activating the automatic inlining or IPA algorithms. Because #pragma _KAP [no]inline and #pragma _KAP [no]ipa are not otherwise affected by the -inline=, -ipa=, -inline_depth, and -..._looplevel command-line switches, you can use them with command-line control to select functions or call sites that the regular selection algorithm would reject.

See Chapter 5, Assertions and Directives and Chapter 6, Inlining and Interprocedural Analysis (IPA) for more information about the inline and ipa pragmas.

4.5.11 -inline_optimize, (-inline_optimize=0), -ipa_optimize, (-ipa_optimize=0)

The switches -inline_optimize and -ipa_optimize help you to optimize large programs by causing KAP to set other switches depending on the value you specify. The values and meanings are:

4.6 Input-Output File Selection Switches for the kapc Preprocessor

The following sections explain the function of each kapc switch that affects KAP input-output file selection.

4.6.1 -cmp, -nocmp, -ncmp, (<file>.cmp.c), (<file>.cmp)

The -cmp=<file> switch lets you assign a different file name for the optimized C program.

The Compaq C compiler will only process files with the extension .c. Thus, you should not override the default by using any other extension. Note that the kcc command will create the default name <file>.cmp.c while explicit user invocation of the kapc command will create the default name <file>.cmp.

The optimized source file is placed in the current directory.

To disable generation of the optimized C output file, enter -nocmp on the command line.

4.6.2 -list, -l, -nolist, -nl, (-list=<file>.out)

The -list=<filename> switch provides a way to name the generated annotated listing file.

Specifying -list with no file name will cause the listing file to be written to <file>.out, where <file> is the input file name with any trailing .c stripped off. For example, if the input file is myprog.c, the output file would be myprog.out.

To disable generation of the listing file, enter -nolist on the command line.

4.7 Listing Switches for the kapc Preprocessor

The following sections explain the function of each kapc switch concerning the listing file or the optional listing information available in the transformed code file.

The transformed code is recorded in the transformed code file regardless of whether you request a listing file.

See Chapter 8 for examples of the different types of KAP listing output.

4.7.1 -cmpoptions, -cp, -nocmpoptions, (-ncp)

The -cmpoptions switch specifies optional additional information for inclusion in the transformed output file. The only additional information currently selectable is special line-number comments. These are enabled with -cmpoptions=i , which inserts special numbers that reference original code.

Special line numbers are # line directives that may appear in the transformed program file in order to reference line numbers of the source code. The line in the transformed code that immediately follows a # line comment is either the transformed version of the line in the source code that is referenced, or a line that KAP inserted before the referenced line. The name of the source file from the command line is included, in the form it had on the KAP command line.

In the following unrolled loop, the for in the source code was on line 7, and the assignment was on line 8:


# line  7  "-./csource/unr5.c"
      for ( i = i1 + 1; i<=n; i+=3 ) { 
      a[i] = b[i] / a[i-1]; 
# line  8  "-./csource/unr5.c" 
      a[i+1] = b[i+1] / a[i]; 
# line  8  "-./csource/unr5.c" 
      a[i+2] = b[i+2] / a[i+1]; 
# line  8  "-./csource/unr5.c"
                } 

4.7.2 -lines, -ln, (-ln=55)

The -lines switch paginates and sets the number of lines per page for printing.

The -lines=0 switch tells KAP to paginate only at subroutine boundaries.

4.7.3 -listingwidth, -lw, (-lw=132)

The -listingwidth switch sets the maximum line length for the listing file.

This switch setting affects the format of the loop summary table, which is printed with the -listoptions=l switch, and the KAP switches table (-lo=k).

The default value, 132, is optimal for most line printers. The alternative, 80, is more convenient for looking at the listing file on most terminals. No other values are allowed.

4.7.4 -listoptions, -lo, (off)

The -listoptions switch tells KAP what information to include in the listing files:
Value Prints
c Calling tree at the end of the program listing
k KAP switches active within the program unit
l Loop-by-loop optimization table
n Program unit names, as processed, to the error file
p Compilation performance statistics
s Summary of the optimizations performed

4.7.5 -suppress, -su, (off)

The -suppress switch tells KAP C what diagnostic information to suppress:
Value Effect
e Suppresses error messages
w Suppresses warning messages

4.8 Language Switches for the kapc Preprocessor

This section provides information about kapc language switches.

4.8.1 -restrict, -norestrict, (-restrict)

The -restrict switch allows KAP to parse the C programming language qualifiers restrict and _restrict . This language feature can help KAP better optimize loops that contain subscripted objects.

The -norestrict switch disables parsing of the restrict and _restrict qualifiers.

4.8.2 -signed, (on)

The -signed switch changes char symbols to signed char. This switch is sometimes necessary when porting code from other platforms whose C compiler defaults char to signed char.
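The reason the signedness of char matters when porting can be seen in a short C sketch. The function names are hypothetical. The explicit signed char and unsigned char types stand in for what plain char means under each compiler default, and the conversion to signed char is implementation-defined in C, although two's-complement systems yield the value shown in the comment.

```c
/* Interprets a bit pattern as a signed char. For the pattern 0xFF this
   yields -1 on two's-complement systems. */
int as_signed_char(unsigned char bits)
{
    return (signed char)bits;
}

/* Interprets the same bit pattern as an unsigned char. For the pattern
   0xFF this yields 255. */
int as_unsigned_char(unsigned char bits)
{
    return bits;
}
```

Code that compares a char against a negative value, or that uses a char as an array index, can behave differently depending on which interpretation the compiler applies.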

4.9 Advanced Optimization Switches for the kapc Preprocessor

These kapc switches control, or provide information for, transformations that are machine-specific or program-specific. They are provided to allow the advanced user to experiment with obtaining the maximum optimization of a specific application code.

Some of these switches set parameters that KAP uses to optimize memory hierarchy usage.

Knowing how much data can be kept in fast memory (cache or arithmetic registers), and the costs of moving data in the memory hierarchy, enables better optimization of memory reference patterns. The -scalaropt=3 and -roundoff=3 switches are required for memory management to be enabled.

4.9.1 -addressresolution, -arl, (-arl=1)

The -addressresolution switch tells KAP what level of data aliasing might be present in the program. Data aliasing is the use of multiple names for the same memory location. When there might be multiple ways for the same variable to be referenced, KAP is more cautious about transforming the code in ways that might change the order in which variables and arrays are used.

The associated pragma #pragma _KAP arl(n) has the same meaning. The switch is equivalent to a pragma at the beginning of the file, and is thus overridden by other pragmas later in the file.

The meanings of the individual levels are:

4.9.2 -arclimit, -arclm, -noarclimit, (-arclimit=5000)

The -arclimit switch sets the size of the dependence arc data structure that KAP uses to perform data dependence analysis. This data structure is dynamically allocated on a loop-nest-by-loop-nest basis. See Appendix A, Data-Dependence Analysis for a description of data-dependence analysis.

The formula that you use to estimate the number of dependence arcs for a given loop nest is as follows:


dependence_array_size = max (#_of_statements * 4,  arclimit value) 

This is an estimate because KAP assumes that each statement would have, in the worst case, four dependence arcs.
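
The formula can be written out as a small C function. This is only a transcription of the estimate as stated above; the function and parameter names are hypothetical and do not correspond to anything inside KAP.

```c
#include <assert.h>

/* Estimated dependence array size for one loop nest:
   max(number of statements * 4, arclimit value). */
long dependence_array_size(long num_statements, long arclimit)
{
    assert(num_statements >= 0 && arclimit >= 0);
    long worst_case_arcs = num_statements * 4;  /* four arcs per statement */
    return worst_case_arcs > arclimit ? worst_case_arcs : arclimit;
}
```

For a loop nest of 100 statements and the default -arclimit=5000, the worst-case arc count is 400, so the arclimit value dominates the estimate.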

If a loop contains too many dependence relationships and cannot be represented in the dependence data structure, KAP will give up optimization of the loop.

When the Loop Information Table is included in the listing file (-listoptions=l), any loop that was too complex for the dependence data structure to hold the information will be marked as too many stmts/DD arcs. Increasing the -arclimit value may enable KAP to optimize the loop. If -arclimit is already at its maximum value, you can try simplifying the loop or dividing it into smaller loops.

The maximum -arclimit value allowed is 5000. If you specify a value greater than 5000, KAP will default to 5000 in its allocation of the data-dependence array.

Note

Most users do NOT need to change this value.

4.9.3 -cache_prefetch_line_count, -cplc, (-cplc=0)

The -cache_prefetch_line_count switch gives the number of additional lines prefetched into the cache during a cache miss.

4.9.4 -cacheline, -chl, (-chl=64,64)

The -cacheline switch tells KAP the width of the memory channel in bytes between cache and main memory.

When two arguments are specified, the first argument gives the width of the memory channel between the primary cache and the secondary cache, and the second argument gives the width of the memory channel between the secondary cache and main memory. Omitting the second argument, or specifying it as 64 (the default), tells KAP not to optimize secondary cache usage.

4.9.5 -cachesize, -chs, (-chs=32,0)

The -cachesize switch tells KAP the size in kilobytes of the cache memory.

When two arguments are specified, the first argument gives the size of the primary cache, and the second argument gives the size of the secondary cache. Omitting the second argument, or specifying it as 0 (the default), tells KAP not to optimize secondary cache usage.

The default values depend on the -tune switch and the Alpha architecture of the system. When -tune=ev6 , the default values for -chs are 32,0.

4.9.6 -dpregisters, -dpr, (-dpr=32)

The -dpregisters switch specifies the number of double-precision registers each processor has.

4.9.7 -each_invariant_if_growth, -eiifg, (-eiifg=20)

When a loop contains an if statement whose condition does not change from one iteration to another, the same test must be repeated for every iteration. The code can often be made more efficient by "floating" the if outside the loop and putting the then and else sections into their own loops.

This gets more complicated when there is other code in the loop, because a copy of it must be included in both the then and else loops, as shown in the following example:


for ( i = ...) {
    section-1
    if ( ) {
        section-2
    }
    else {
        section-3
    }
    section-4
}

Becomes:


if ( ) {
    for ( i = ...) {
        section-1
        section-2
        section-4
    }
}
else {
    for ( i = ...) {
        section-1
        section-3
        section-4
    }
}

When sections 1 and 4 are large, the extra code generated can slow a program down through cache contention, extra paging, and so on, more than the reduced number of if tests speed it up.
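
A concrete version of the transformation, written by hand for illustration, is sketched below. The function names and the contents of the four sections are hypothetical; the point is only that both forms compute the same result, while the floated form tests the invariant condition once and duplicates sections 1 and 4.

```c
#include <assert.h>

/* Original form: the loop-invariant test on scale runs every iteration. */
void update_in_loop(int n, int scale, double *a, const double *b)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        a[i] = b[i];          /* section-1 */
        if (scale) {
            a[i] *= 2.0;      /* section-2 */
        } else {
            a[i] += 1.0;      /* section-3 */
        }
        a[i] = -a[i];         /* section-4 */
    }
}

/* Floated form: one test, two loops, sections 1 and 4 duplicated. */
void update_floated(int n, int scale, double *a, const double *b)
{
    assert(n >= 0);
    if (scale) {
        for (int i = 0; i < n; i++) {
            a[i] = b[i];      /* section-1 */
            a[i] *= 2.0;      /* section-2 */
            a[i] = -a[i];     /* section-4 */
        }
    } else {
        for (int i = 0; i < n; i++) {
            a[i] = b[i];      /* section-1 */
            a[i] += 1.0;      /* section-3 */
            a[i] = -a[i];     /* section-4 */
        }
    }
}
```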

The -each_invariant_if_growth switch sets a limit on the number of lines of executable code in sections 1 and 4. Below this limit, KAP tries to float an invariant if outside the loop.

The total amount of additional code generated in a program unit through invariant- if floating can be limited with the -max_invariant_if_growth switch.

The allowed values for the -each_invariant_if_growth switch are 0 to 5000.

4.9.8 -fpregisters, -fpr, (-fpr=32)

The -fpregisters switch specifies the number of single-precision registers, such as ordinary floating point, that each processor has.

4.9.9 -fuse, (-nofuse)

The -fuse switch tells KAP to perform loop fusion.

Loop fusion is a conventional compiler optimization that transforms two adjacent loops into a single loop. Data dependence tests allow fusion of more loops than standard techniques allow.

Before KAP can perform loop fusion, you must specify the -scalaropt=2 or -optimize=5 switch.
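
The effect of loop fusion can be sketched by hand as follows. The two functions are hypothetical illustrations, not KAP output: the first contains two adjacent loops over the same range, and the second shows the single loop that fusion would produce. Fusion is legal here because iteration i of the second loop depends only on the value that iteration i of the first loop stores.

```c
#include <assert.h>

/* Two adjacent loops over the same index range. */
void separate_loops(int n, double *a, double *c, const double *b)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + 1.0;
    }
    for (int i = 0; i < n; i++) {
        c[i] = a[i] * 2.0;
    }
}

/* Fused form: one pass, and a[i] is reused while it is still in a register. */
void fused_loop(int n, double *a, double *c, const double *b)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        a[i] = b[i] + 1.0;
        c[i] = a[i] * 2.0;
    }
}
```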

4.9.10 -fuselevel, (-fuselevel=0)

The -fuselevel switch further controls the level of loop fusion. Whenever you set -fuselevel, KAP automatically sets -fuse.

The possible settings for this option are:

4.9.11 -heaplimit, -heap, (-heaplimit=100)

KAP may require large amounts of memory in order to process your source code. The -heaplimit switch specifies the maximum size, in megabytes, to which the KAP heap can grow. If this limit is reached, KAP will stop processing your source code and try to exit with an "out of memory" error message.

If you choose a -heaplimit setting that is greater than the amount of memory that your system has available, KAP may run out of memory before it reaches the -heaplimit .

KAP relies on the operating system to warn when the process is about to run out of memory before the problem occurs. Using -heaplimit makes a graceful exit more likely.

4.9.12 -hoist_loop_invariants, -hli, (-hli=1)

The -hoist_loop_invariants switch controls code hoisting of loop-invariant expressions from loops. Note that this switch is independent of the switches -each_invariant_if_growth and -max_invariant_if_growth, which control the floating of invariant-IFs out of loops.
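
As an illustration of code hoisting in general (a hand-written sketch with hypothetical names, not a statement of what KAP does at any particular setting), the expression x * y + 1.0 below does not change inside the loop, so it can be computed once before the loop starts.

```c
#include <assert.h>

/* Before hoisting: the invariant expression is evaluated every iteration. */
void scale_unhoisted(int n, double x, double y, double *a)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        a[i] = a[i] * (x * y + 1.0);
    }
}

/* After hoisting: the invariant expression is evaluated once. */
void scale_hoisted(int n, double x, double y, double *a)
{
    assert(n >= 0);
    const double factor = x * y + 1.0;   /* loop-invariant value */
    for (int i = 0; i < n; i++) {
        a[i] = a[i] * factor;
    }
}
```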

The possible settings for this switch are:

4.9.13 -limit, -lm, (-lm=50)

In order to reduce compile time, KAP estimates how long it spends analyzing each loop nest construct. If a loop is too deeply nested, KAP ignores the outer loop and recursively visits the inner loops. The -limit switch is a rough dial to control what KAP thinks is too deeply nested.

Larger loop nest limits might allow more optimization for deeply nested loop structures, but might take more compilation time. The limit does not correspond to the for loop nest level; rather, it is an estimate of the number of loop orderings that can be generated from the loop nest. The -limit switch resets this internal limit.

Note

Most users do NOT need to change this value.

4.9.14 -machine, -ma, -nomachine, -nma, (-ma=s)

The -machine switch lets you set characteristics for the system on which KAP output runs.

Use any combination of the following switch settings, except do not specify switches s and n simultaneously:
s Tells KAP to prefer optimization of a for loop that generates stride-1 (contiguous) references over one that generates non-stride-1 operands. Some computers perform better if consecutive references are contiguous in memory.
n Tells KAP to prefer optimization of a for loop that generates non-stride-1 array access over stride-1 array access.

This is suitable for machine architectures that have special interleaved memory hardware where non-stride-1 array access provides the best performance.

o Tells KAP not to parallelize innermost loops when optimizing but to parallelize only outermost loops.

This capability is available to prevent parallelization of applications with small inner loop bounds, thereby reducing overhead costs. When the loop bounds are unknown at compile time, KAP might generate parallel concurrent code for innermost loops, a practice that might be inefficient for the actual loop bounds.

To disable all the switches, enter -nomachine on the command line.
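
The difference between stride-1 and non-stride-1 references, which the s and n settings refer to, can be seen in the following sketch. It assumes a matrix stored row by row in a flat array; the function names are hypothetical. Both functions return the same sum, but the first touches consecutive memory locations in its inner loop while the second jumps cols elements at a time.

```c
#include <assert.h>

/* Inner loop over j: consecutive elements, stride-1 access. */
double sum_stride1(const double *m, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    double total = 0.0;
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            total += m[i * cols + j];
        }
    }
    return total;
}

/* Inner loop over i: each reference is cols elements from the last. */
double sum_strided(const double *m, int rows, int cols)
{
    assert(rows >= 0 && cols >= 0);
    double total = 0.0;
    for (int j = 0; j < cols; j++) {
        for (int i = 0; i < rows; i++) {
            total += m[i * cols + j];
        }
    }
    return total;
}
```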

4.9.15 -max_invariant_if_growth, -miifg, (-miifg=500)

When a loop contains an if statement whose condition does not change from one iteration to another (loop-invariant), the same test must be repeated for every iteration. The code can often be made more efficient by floating the if outside the loop and putting the then and else sections into their own loops.

This gets more complicated when there is other code in the loop, because a copy of it must be included in both the then and else loops. The -max_invariant_if_growth switch allows you to limit the total number of additional lines of code generated in each program unit through "invariant-if" restructuring.

The allowed values for the -max_invariant_if_growth switch are 0 to 50000.

4.9.16 -routine=<rtn_name><switches>, -rt=<rtn_name><switches>, (off)

The -routine switch allows you to specify other switches that apply to specific routines within the source file that KAP processes. The only switches that -routine can specify are as follows:

-each_invariant_if_growth
-max_invariant_if_growth
-optimize
-roundoff
-scalaropt
-skip
-unroll
-unroll2
-unroll3

For example, the command to exclude KAP optimizations for routine sub1 of myprog.c is:


> kcc myprog.c -ckapargs='-routine=sub1 -skip' 

The syntax of a KAP command with the -routine switch requires that -routine and the switches it specifies come at the end of the command line after the C source file, for example:


kapc [-<switches>] source_file.c 
-routine=<rtn_name>[,<rtn_name>...] 
-<switches_for_rtn_names> 
... 

Note

If the -routine switch and the switches it specifies are not at the end of the command line after the source file, KAP generates the following error message:


Command line error -- An input file has not been specified on the 
command line. 
KAP -- Command Line Syntax Error Detected. 

You can specify switches that apply to all routines in the source file after kcc or kapc. Of course, <rtn_name> must be a routine in source_file.c. Finally, switches for each instance of <rtn_name> must come from the preceding bulleted list. In particular, the -skip switch tells KAP not to process the associated routine.

For example, consider the following command line:


kapc -scalaropt program.c -routine=sub_1 -roundoff=2 -optimize=3 

This command invokes KAP and passes the -scalaropt switch to all program units in file program.c including sub_1 . Furthermore, program unit sub_1 is processed with both the -roundoff and -optimize switches.

Using the -routine switch implies that directives equivalent to the specified switches are asserted only while processing particular routines. The effect is the same as if the implied directives were inserted at the top of the associated routines.

Using the -routine switch also makes the resulting KAP command contain two halves. The first half looks like any other KAP command because it contains KAP switches different from -routine and a source file name. The second half is different because it contains one or more -routine switches, each with associated routines and switches for the routines selected from the preceding bulleted list.

For example, consider the following command line:


kapc -cachesize=8,0 -syntax=a my_program.c \
-routine=sub_1,sub_2,sub_3 -roundoff=2 -optimize=3 -routine=sub_4 -unroll 

Next is an explanation of the two halves:

  1. This command invokes KAP and passes the -cachesize=8,0 and -syntax=a switches to all program units in file my_program.c . The program units include sub_1, sub_2, sub_3, and sub_4 .
  2. Program units sub_1, sub_2, and sub_3 are processed with both the -roundoff and -optimize switches, while routine sub_4 is processed with the -unroll switch. Of course, the three switches -roundoff, -optimize, and -unroll are in the preceding bulleted list.

Finally, the usual rules for shortening the names of switches also apply to the -routine switch. For example, the following KAP command fragments produce identical results:

-routine=subroutine_a -optimize=3 -unroll=4

4.9.17 -setassociativity, -sasc, (-sasc=1,1)

The -setassociativity switch provides information on the mapping of physical addresses in main memory to cache pages in the Level 1 and Level 2 cache.

The first integer describes the set associativity of the Level 1 cache. The second integer describes the set associativity of the Level 2 cache.

A setting of n means that a page can appear in any of n places in the cache. For instance, a setting of 1 means that a page in main memory can be placed in only one place on the cache. If the cache page is already in use, its contents will have to be rewritten or flushed in order to copy the newly accessed page into the cache.
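
The mapping can be sketched with the conventional set-index calculation used for set-associative caches. This is a general illustration, not a description of any specific Alpha cache: the function names are hypothetical, and the sketch works in units of cache lines where the text above speaks of pages.

```c
#include <assert.h>

/* Number of sets in a cache of cache_bytes, with line_bytes per line
   and `ways` possible places for each line (the set associativity). */
unsigned long cache_set_count(unsigned long cache_bytes,
                              unsigned long line_bytes,
                              unsigned long ways)
{
    assert(line_bytes > 0 && ways > 0);
    return cache_bytes / (line_bytes * ways);
}

/* Set that a given address maps to. Two addresses with the same set
   index compete for the same `ways` places in the cache. */
unsigned long cache_set_index(unsigned long address,
                              unsigned long cache_bytes,
                              unsigned long line_bytes,
                              unsigned long ways)
{
    unsigned long sets = cache_set_count(cache_bytes, line_bytes, ways);
    assert(sets > 0);
    return (address / line_bytes) % sets;
}
```

With a 32 KB cache, 64-byte lines, and a set associativity of 1, addresses 0 and 32768 map to the same set, so loading one evicts the other. With a set associativity of 2 they still share a set, but both can reside in the cache at once.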

4.9.18 -stdio, (-nostdio)

The -stdio switch enables strength reduction of certain functions in the stdio.h header file and requires -scalaropt=3 . Programs that call functions such as printf or fput extensively will see improved I/O performance with this switch.

4.9.19 -syntax, -sy, (-nosyntax)

The -syntax switch lets you select the dialect of C that KAP will accept. The settings are:

a --- Checks for strict compliance with ANSI standard C. Extensions are flagged with warning messages.
d --- Specifies Compaq C.
k --- Accepts Kernighan & Ritchie C.

Note: -nosyntax implies the default C language dialect of Compaq C (that is, -sy=d ).

The -standard compiler switch settings affect the -syntax switch settings as follows:

4.9.20 -tablesize, -ts, (-ts=24000000)

The value specified in the -tablesize switch is compared to the mathematical product of the number of statements and the number of variables referenced in a given program unit. When the product is greater than the tablesize value, a "program-too-large" message is issued stating the required tablesize.
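
The comparison described above amounts to the following check. The function name is hypothetical, and how KAP counts statements and variables internally is not specified here.

```c
#include <assert.h>

/* Returns 1 when statements * variables exceeds the tablesize value,
   the condition under which the program-too-large message is issued. */
int exceeds_tablesize(long long statements, long long variables,
                      long long tablesize)
{
    assert(statements >= 0 && variables >= 0);
    return statements * variables > tablesize;
}
```

For example, with the default -ts=24000000, a program unit with 5000 statements and 5000 referenced variables gives a product of 25000000 and exceeds the limit, while 4000 statements and 5000 variables (20000000) does not.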

Note that you should review your process resource limits with the limit command before adjusting the -tablesize switch. Use the C shell command unlimit or, for example, a command such as limit stacksize 32768 to increase all, or specific, resource limits.

4.9.21 -unroll, -ur, (-ur=4), -unroll2, -ur2, (-ur2=160), -unroll3, -ur3, (-ur3=1)

The -unroll, -unroll2, and -unroll3 switches control how KAP unrolls inner loops.

Loop execution is often more efficient when the loops are unrolled. KAP unrolls the loop until either the loop has been unrolled the number of times given in the -unroll switch, or the amount of "work" in each iteration reaches the value given by the -unroll2 switch.

The switch -ur=0 means to use default values to unroll.

The switch -ur=1 means no unrolling.

The -unroll2=n switch sets the upper limit for unrolling. If the estimate of work is greater than n, then the loop will not be unrolled.

The default, n=160, means a maximum work of 160 in an unrolled iteration. For example, a loop with a work estimate of 150 can be unrolled, while a loop with a work estimate of 170 is not unrolled.

Work is estimated by counting operands and operators in a loop. The amount of work in each loop iteration is shown in the loop table in the annotated listing.

The -unroll3=n switch sets the lower limit for unrolling. If the estimate of work is less than n, then the loop will not be unrolled.

The default, n=1, means a minimum work of 1 in an unrolled iteration. If you choose a higher value, such as 20, a loop with a work estimate of 30 can still be unrolled, while a loop with a work estimate of 10 is not unrolled.

The -scalaropt=2 switch is required to enable loop unrolling.

Note

If you use kapc with the Compaq C compiler optimization switch set to -O5 , you should turn off loop unrolling by setting -unroll=1 .

Outer loop unrolling is a part of memory management and is not controlled by these switches.

There are two ways to control loop unrolling. The first is to set the maximum number of iterations that can be unrolled; the second is to set the maximum amount of work to be done in an unrolled iteration. KAP will unroll as many iterations as possible while keeping within both these limits, up to a maximum of 100 iterations. No warning is given if you request more than 100 unrolled iterations.

By increasing or decreasing the maximum iteration workload, you can control the amount of work that ends up in each loop iteration, as long as the number of unrolled iterations does not exceed the unroll limit. The workload is estimated by adding operations, including subscripts and assignments; scalars, not including the loop index; and if statements. Loops with function calls are weighted more heavily and are never unrolled. The following example demonstrates the workload limit. Assume that -unroll=3 and -unroll2=24 are the switch settings.


for ( i=0; i<n; i++ ) {
    a[i] = b[i]+c[i];
}

The amount of work in this loop is 5. With these settings, the loop would be unrolled three times, because that is the maximum allowed by the -unroll limit, and the resulting weight (3 x 5 = 15) is less than the -unroll2 limit of 24.

If you set the -unroll2 limit to 10, the loop would be unrolled twice because unrolling the original loop three times would produce a loop with workload of 15, which would exceed the -unroll2 limit. The result would be the following:


for ( i = 0; i<=n - 2; i+=2 ) {
    a[i] = b[i] + c[i];
    a[i+1] = b[i+1] + c[i+1];
}
for ( ; i<n; i++ ) {
    a[i] = b[i] + c[i];
}
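
The interaction of the limits can be sketched as a small function. This is a reading of the rules stated in this section, not KAP's actual algorithm: the function name is hypothetical, the work estimate is taken as an input, and the special setting -ur=0 (use default values) is not modeled.

```c
#include <assert.h>

/* Number of iterations placed in one unrolled loop body, given the work
   estimate for a single iteration and the three unroll settings.
   A result of 1 means the loop is not unrolled. */
int unroll_factor(int work, int unroll, int unroll2, int unroll3)
{
    assert(work > 0 && unroll >= 1);
    if (work < unroll3 || work > unroll2) {
        return 1;                       /* outside the work limits */
    }
    int by_work = unroll2 / work;       /* copies that fit under -unroll2 */
    int factor = unroll < by_work ? unroll : by_work;
    if (factor > 100) {
        factor = 100;                   /* stated maximum of 100 iterations */
    }
    return factor < 1 ? 1 : factor;
}
```

For the loop above (work of 5), unroll_factor(5, 3, 24, 1) gives 3, and lowering -unroll2 to 10 gives 2, matching the two cases described in the text.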

The -unroll3=n switch sets the lower limit for unrolling. If there are fewer than n units of work in the loop (the same units as -unroll2), the loop will not be unrolled. The amount of work in each loop iteration is shown in the loop table in the annotated listing. This switch value should be left at 1, the default. A value less than the default could result in a program that executes more slowly.

