This chapter presents additional information about the KAP command switches and inline directives used to inline subroutines and functions, or to perform Interprocedural Analysis (IPA).
Inlining is the process of replacing a subroutine CALL or function reference with the text of the routine. This eliminates the overhead of the call, and can assist other optimizations by making relationships between arguments, returned values, and the surrounding code easier to find.
IPA is the process of inspecting called subroutines and functions for information on relationships between arguments, returned values, and global data. IPA can provide many of the benefits of inlining, but without replacing the CALL or function reference.
The rest of this chapter covers the inlining and IPA command-line
switches and directives, related command switches, examples of their
use, and information about program constructs that inhibit inlining.
Inlining and IPA are symmetrical from the command-line standpoint ---
there are parallel sets of commands and directives for them. In many
places in this chapter the term "inlining" applies to both
inlining and IPA.
7.1 Inlining and IPA Command-Line Switches
There are two phases to inlining --- defining the universe of inlinable routines and selecting which routines in that universe to inline or analyze. The -inline_from... and -ipa_from... switches define the universe of inlinable routines. The -inline, -ipa, -..._looplevel, and -inline_depth switches select which of the available routines are to be inlined/analyzed. The -inline_create and -ipa_create switches set up collections of routines for inclusion in later KAP runs.
All of the inlining and IPA command switches are listed in the following sections. The short forms of their names are in brackets.
Many of these switches have arguments that are lists of routine names or file names. The elements of these lists may be separated by either commas or colons. Multiple element lists must be enclosed in parentheses.
There are four switches, as follows:
    -inline_from_files=<list>        [-inff]
    -inline_from_libraries=<list>    [-infl]
    -ipa_from_files=<list>           [-ipaff]
    -ipa_from_libraries=<list>       [-ipafl]
Where <list> is one or more of the following: a source file name, a library file name, or a directory, separated by commas or colons. The default is the current source file.
You can distinguish different types of files by their extensions. For example, -inline_from_files=xj.f,yy.f,../mrtn would look for routines in the Fortran 90 source files xj.f and yy.f, and in Fortran 90 source files in the directory ../mrtn. Including the directory ../mrtn in the -..._from_files switches can be thought of as shorthand for the notation ../mrtn/*.f. Do not use wildcard characters in a -inline_from_... list.
The -..._libraries versions of these switches take as their arguments lists of subprogram libraries and directories containing such libraries.
KAP recognizes the type of file from its extension, or lack of one, as follows:
If multiple -inline_from... [-ipa_from...] switches are given, their lists are concatenated to get a bigger universe.
Routine name references are resolved by a search in the order that files appear in -inline_from... or -ipa_from... switches on the command line. Libraries are searched in their original lexical order. Multiple -inline_from... and -ipa_from... lists are searched in the order that they appear on the command line.
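As a rough illustration of how the universe switches combine, the following dry-run sketch assembles a command line that draws routines from two source files, a directory, and a library. It only prints the command. The driver name KAP is taken from the examples later in this chapter and may differ on your system; main.f and old.klib are hypothetical file names.

```shell
#!/bin/sh
# Dry-run sketch: print a KAP command line instead of running it.
# Assumptions: the driver is invoked as "KAP" (override with $KAP);
# main.f and old.klib are hypothetical file names.
universe_cmd() {
    src=$1
    # The lists are searched in the order they appear on the command line:
    # here the source files and directory first, then the library.
    echo "${KAP:-KAP}" \
        -inline_from_files=xj.f,yy.f,../mrtn \
        -inline_from_libraries=old.klib \
        -inline "$src"
}

universe_cmd main.f
```

Swapping the two -inline_from... switches would change which definition is found first when the same routine name occurs in both places.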
7.1.2 Library Creation
Use the following switches to create a preprocessed library:
    -inline_create=<library name>    [-incr]
    -ipa_create=<library name>       [-ipacr]
To specify an existing library file to inline from, use the -inline_from_libraries= or the -ipa_from_libraries= switch.
The default source for routines to put into the library is the current source file. If -inline_from... or -ipa_from... is specified, the routines in the listed files are the ones put into the library. This provides a method to combine or expand libraries --- just include the old library(ies) in an -inline_from_libraries or -ipa_from_libraries switch, along with an -inline_from_files or -ipa_from_files switch giving source files containing any new subroutines and functions.
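The combining step described above might look like the following dry-run sketch. It prints the command rather than running it. The driver name KAP follows this chapter's examples, and every file name (new.klib, old.klib, extra.f, main.f) is hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: rebuild a library from an old library plus new sources.
# Assumptions: driver name "KAP" (override with $KAP); all file names are
# hypothetical and supplied by the caller.
combine_library_cmd() {
    new_lib=$1
    old_lib=$2
    new_src=$3
    input=$4
    echo "${KAP:-KAP}" \
        "-inline_create=$new_lib" \
        "-inline_from_libraries=$old_lib" \
        "-inline_from_files=$new_src" \
        "$input"
}

combine_library_cmd new.klib old.klib extra.f main.f
```

The sketch writes to a new library name instead of reusing the old one, on the assumption that reading and overwriting the same file in a single run is best avoided.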
Routines are included in libraries in the order in which they appear in the input file(s). This is to make sure that if multiple routines with the same name are in the same source file, the one chosen for inlining will be the one that you expect from the algorithm under the -inline_from... switch.
A library created with -inline_create will work for inlining or IPA, because it is just partially reduced source code, but a library made with -ipa_create may not appear in an -inline_from_libraries= list. Attempting to do so is flagged with a warning message.
If no library name is given, the name used is file.klib , where file is the input file name with any trailing .f, .for, or .ftn stripped off.
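The naming rule can be sketched as a small shell function. This is only an illustration of the rule as stated; how KAP treats file names with other extensions is not covered here, so the sketch simply leaves such names intact (an assumption).

```shell
#!/bin/sh
# Sketch of the default library name rule: strip a trailing .f, .for, or
# .ftn from the input file name, then append .klib.
# Assumption: names with any other ending are used unchanged.
default_klib_name() {
    case $1 in
        *.for) base=${1%.for} ;;
        *.ftn) base=${1%.ftn} ;;
        *.f)   base=${1%.f} ;;
        *)     base=$1 ;;
    esac
    echo "$base.klib"
}

default_klib_name subfil.f     # prints subfil.klib
default_klib_name solver.for   # prints solver.klib
```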
When creating a library, only one -inline_create (-ipa_create) switch may be given. That is, only one library may be created per KAP run. If the library file existed prior to running KAP, it is overwritten.
When -inline_create (-ipa_create) is specified on the command line, no transformed code file will be generated.
See the description of the -inline_from_libraries and -ipa_from_libraries switches for information about using libraries created with these switches.
If no -inline (-ipa) switch is given, the default will be to include all the routines from the inlining universe in the library, if possible. If -inline=<name list> or -ipa=<name list> is specified, only the named routines will be included in the library. See Section 7.5 for a list of conditions that can prevent a routine from being inlined.
An example of creating a library and inlining from it is included in Section 7.4.
7.1.3 Naming Specific Routines
The following specifies names of particular routines to inline:
    -inline[=name[,name...]]    [-inl=]
    -ipa[=name[,name...]]       [-ipa=]
The default is all routines in the universe specified by any -inline_from... (-ipa_from...) switches, subject to the -inline_looplevel (-ipa_looplevel) setting.
Inlining and IPA are off by default, that is, if you do not specify inlining (IPA) switches and no inlining (IPA) directives are found in the source code, no inlining (IPA) will take place.
If you omit -inline (-ipa) from the command line, automatic selection of routines to inline is disabled. You can perform manual selection of routines to inline (analyze) with the -inline_manual (-ipa_manual) switches and the inline and IPA directives.
If you specify -inline (-ipa) on the command line without a list of routine names, then all routines in the inlining (IPA) universe are eligible, subject to the -inline_looplevel (-ipa_looplevel) and -inline_depth values.
If you specify -inline (-ipa) on the command line with a list of routine names, then only the routines that are included in the list are eligible, subject to the -inline_looplevel (-ipa_looplevel) and -inline_depth values. The list items can be separated by commas or colons.
The following switches have no versions, but they must have arguments as shown:
    -noinline=name[,name..]    [-ninl=]
    -noipa=name[,name..]       [-nipa=]
These switches enable the automatic inlining (IPA) algorithms in the same way that -inline (-ipa) does when given without arguments, but the routines listed are the ones NOT to be inlined (analyzed). That is, all subroutines and functions except the named ones are eligible.
A list of routine names is required.
You cannot specify both -inline and -noinline (-ipa and -noipa) on the same command line.
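A wrapper script could catch this conflict before invoking KAP. The function below is a hypothetical pre-flight check, not part of KAP; it inspects only the long and short switch spellings listed in this section.

```shell
#!/bin/sh
# Hypothetical pre-flight check for a wrapper script (not part of KAP):
# reject argument lists that mix -inline with -noinline, or -ipa with
# -noipa. Only the spellings listed in this section are recognised.
check_inline_switches() {
    has_inline=0
    has_noinline=0
    has_ipa=0
    has_noipa=0
    for arg in "$@"; do
        case $arg in
            -inline|-inline=*|-inl=*) has_inline=1 ;;
            -noinline=*|-ninl=*)      has_noinline=1 ;;
            -ipa|-ipa=*)              has_ipa=1 ;;
            -noipa=*|-nipa=*)         has_noipa=1 ;;
        esac
    done
    if [ "$has_inline" -eq 1 ] && [ "$has_noinline" -eq 1 ]; then
        echo "error: -inline and -noinline cannot be combined" >&2
        return 1
    fi
    if [ "$has_ipa" -eq 1 ] && [ "$has_noipa" -eq 1 ]; then
        echo "error: -ipa and -noipa cannot be combined" >&2
        return 1
    fi
    return 0
}

check_inline_switches -noinline=setup,mxmr prog.f && echo "ok"
check_inline_switches -inline=setup -noinline=mxmr prog.f || echo "rejected"
```

Switches such as -inline_from_files= are deliberately not treated as -inline, since they only define the universe.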
If all call sites of a subroutine or function are to be inlined, the following variant of the -inline switch may be of interest:
    -inline_and_copy[=name[,name..]]    [-inlc=]
The -inline_and_copy command-line switch functions like the -inline switch, except that if all references in the source file to a function are inlined, the text of the function is copied to the transformed code file unchanged. This is intended for use when the functions being inlined are in the same file as the function reference, and has no special effect when the routines being inlined are being taken from a library or another source file.
7.1.4 DO Loop Level
The -inline_looplevel=<n> [-inll] and -ipa_looplevel=<n> [-ipall] switches set a minimum DO loop nest level for CALL/function reference expansion. They enable you to limit inlining and IPA to just those routines that are referenced in nested loops, where the reduced call overhead or enhanced optimization will be multiplied.
The argument is defined from the most deeply nested leaf of the call tree. The default, 10, allows inlining (IPA) for the 10 deepest nest levels, for example:
    PROGRAM MAIN
       ..
       CALL A  --->  SUBROUTINE A
                        ..
                        DO
                           DO
                              CALL B  --->  SUBROUTINE B
                                               DO
                                                  DO
                                                     CALL C  --->  SUBROUTINE C
                                                  ENDDO
                                               ENDDO
                           ENDDO
                        ENDDO
The CALL B is inside a doubly nested loop, and would be more profitable to expand than the CALL A . The CALL C is quadruply nested, so inlining C would yield the biggest gain of the three.
The argument is defined from the most deeply nested CALL or function reference:
The -inline_depth switch (-inline_depth=<n> [-ind]) sets the maximum level of subprogram nesting (CALLs in routines that are CALLed) that KAP will attempt to inline. Higher values cause KAP to trace CALLs and function references further. The values and their meanings are as follows:
1--10 --- Inline routines to this depth.
0 --- Use the default value.
-1 --- Inline only routines that do not contain CALLs or function references.
Recursive inlining can be quite expensive in compilation time. You must
exercise discretion in its use.
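For quick reference in a build script, the value table above can be encoded as a lookup. This helper is purely illustrative and is not part of KAP; values outside the listed range are reported as undocumented, with no claim about what KAP does with them.

```shell
#!/bin/sh
# Illustrative lookup (not part of KAP): map an -inline_depth value to the
# meaning given in the list above.
describe_inline_depth() {
    case $1 in
        -1)
            echo "inline only routines that contain no CALLs or function references" ;;
        0)
            echo "use the default depth" ;;
        [1-9]|10)
            echo "inline routines to depth $1" ;;
        *)
            echo "value $1 is outside the documented range" >&2
            return 1 ;;
    esac
}

describe_inline_depth -1
describe_inline_depth 4
```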
7.1.6 Manual Control
The -inline_manual [-inm] and -ipa_manual [-ipam] switches cause KAP to recognize the KAP !*$* [no]inline and !*$* [no]ipa directives. This allows manual control over which routines are inlined/analyzed at which call sites.
The default is to ignore these directives. They are enabled when any inlining/IPA command-line switch is given on the command line. When -inline_manual or -ipa_manual is included on the command line, the directives are enabled without enabling the automatic inlining (IPA) algorithms. Because !*$* [no]inline and !*$* [no]ipa are not restricted by the -inline, -ipa, -..._looplevel, and -inline_depth command-line switches, they can be used either with or without command-line controlled inlining.
7.2 Inlining and IPA Directives
This section describes the following KAP directives:
    !*$* [no]inline [here|routine|global] [(name[,name..])]
    !*$* [no]ipa [here|routine|global] [(name[,name..])]
The !*$* inline and !*$* ipa directives tell KAP to inline or analyze the named routines. The !*$* noinline and !*$* noipa directives tell KAP not to inline or analyze the named routines. These directives combine next-line, entire routine, and global (entire program) scope. If none of the optional elements are included, all routines referenced on the next line of code that are in the inlining/analyzing universe are inlined (analyzed) on that line.
These directives are disabled by default. They are enabled when any inlining or IPA switch, respectively, is given on the command line. They can be enabled without activating the automatic inlining/IPA selection algorithms with the -inline_manual and -ipa_manual command-line switches. They are not restricted by the other inlining and IPA command-line switches, and can be used instead of, or in addition to, command-line controlled inlining.
The optional names are routine names. If any routines are named in the directive, it applies only to them. If NO routine names are given, the directive applies to ALL routines. The parentheses around the routine names are not required if the list of routine names is empty.
If a !*$* inline or !*$* ipa names a routine not in the corresponding universe, a warning message is issued, and the directive is ignored.
7.3 Listing File Support
The following section describes the metric used for the optional
calling tree.
7.3.1 -listoptions=c
The optional calling tree includes the loop nest depth level of each CALL/function reference. The metric used is the convention of the -inline_looplevel and -ipa_looplevel switches --- the farthest-out leaf is 1, and higher values trace back to the main program.
7.4 Inlining/IPA Examples
The following code examples demonstrate a few of the possibilities for using the features described in this chapter. Because KAP undergoes constant enhancement, the code that your version of KAP produces may not be identical to that of these examples. The temporary variable names, in particular, can change without significantly altering the transformed code.
Unless otherwise noted, the following examples were run with -optimize=0 -scalaropt=0 to show the inlining more clearly. If nonzero values are specified, the routines are first inlined or analyzed, and then the regular serial transformations are applied.
7.4.1 Inlining Example --- Same Source File
The following example demonstrates inlining both with -inline=setup , meaning only the subroutine setup will be inlined, and with -inline , meaning both subroutines are inlined. The KAP output includes optimized versions of both routines, in addition to the expanded main program. Setting -inline_looplevel>2 is required, because one CALL is in a loop and one is not.
Source file:

      PROGRAM TSTEXP
      REAL A(200,200),B(200,200),C(200,200)
      CALL SETUP(B,200)
      CALL SETUP(C,200)
      DO 100 N = 25,200,25
         CALL MXMR(N,A,B,C)
         WRITE(*,900) N,A(7,13)
  100 CONTINUE
  900 FORMAT(1X,'For N=',I5,', A(7,13):',F12.4)
      END

      SUBROUTINE SETUP(E,N)
      REAL E(N,N)
      DO 10 I=1,N
      DO 10 J=1,N
         E(I,J) = MOD( I + 7*J, 10) /10.0
   10 CONTINUE
      RETURN
      END

      SUBROUTINE MXMR(N,A,B,C)
      REAL A(200,200),B(200,200),C(200,200)
      DO 1000 I=1,N
      DO 1000 J=1,N
         A(I,J) = 0.0
         DO 1000 K=1,N
            A(I,J)=A(I,J)+B(I,K)*C(K,J)
 1000 CONTINUE
      RETURN
      END
The -inline=setup switch generates the following main program:
      PROGRAM TSTEXP
      REAL A(200,200),B(200,200),C(200,200)
      INTEGER II1, II2, II3, II4
      DO 2 II1=1,200
      DO 2 II2=1,200
         B(II1,II2) = MOD (II1 + 7 * II2, 10) / 10.0
    2 CONTINUE
      DO 3 II3=1,200
      DO 3 II4=1,200
         C(II3,II4) = MOD (II3 + 7 * II4, 10) / 10.0
    3 CONTINUE
      DO 100 N=25,200,25
         CALL MXMR(N,A,B,C)
         WRITE(*,900) N,A(7,13)
  100 CONTINUE
  900 FORMAT(1X,'For N=',I5,', A(7,13):',F12.4)
      END
The -inline switch generates the following output:
      PROGRAM TSTEXP
      REAL A(200,200),B(200,200),C(200,200)
      INTEGER II1, II2, II3, II4, II5, II6, II7
      DO 3 II4=1,200
      DO 3 II5=1,200
         B(II4,II5) = MOD (II4 + 7 * II5, 10) / 10.0
    3 CONTINUE
      DO 4 II6=1,200
      DO 4 II7=1,200
         C(II6,II7) = MOD (II6 + 7 * II7, 10) / 10.0
    4 CONTINUE
      DO 100 N=25,200,25
         DO 2 II1=1,N
         DO 2 II2=1,N
            A(II1,II2) = 0.0
            DO 2 II3=1,N
               A(II1,II2) = A(II1,II2) + B(II1,II3) * C(II3,II2)
    2    CONTINUE
         WRITE (*, 900) N, A(7,13)
  100 CONTINUE
  900 FORMAT(1X,'For N=',I5,', A(7,13):',F12.4)
      END
The following example demonstrates the creation of a library and inlining routines from it; a two-step process:
Step 1: Create the library.
      SUBROUTINE MKCOEF (COEF,N)
      REAL COEF(N)
      DO 99 I = 1,N
         COEF(I) = 1.0/I
   99 CONTINUE
      RETURN
      END

      REAL FUNCTION YVAL (X, COEF, N)
      REAL COEF(N), X, SUM
      SUM = 0.0
      DO 99 I=1,N
         SUM = SUM + COEF(I) * SIN(I*X)
   99 CONTINUE
      YVAL = SUM
      RETURN
      END
If the file subfil.f contains the previous two routines, then executing the KAP command, KAP -inline_create subfil.f will create a library file subfil.klib with the two routines, and a listing file subfil.out that contains only a list of routines and whether or not each was saved in the library:
    subroutine MKCOEF -- saved
    function YVAL -- saved
Step 2: Inline the routines into a calling program.
      PROGRAM LIBCR
      PARAMETER (NC = 15)
      PARAMETER (PI = 3.141593)
      REAL COEF(NC), YVAL, Y(2000)
      CALL MKCOEF (COEF, NC)
      DO 900 I = 1,2000
         Y(I) = YVAL( I*0.001*PI, COEF, NC)
  900 CONTINUE
      J=1
      DO 910 I=1,2000,10
         PRINT *, (Y(J),J=I,I+9)
  910 CONTINUE
      END
If the file sqwv.f contains the main program LIBCR, then running the command KAP -inline -infl=subfil.klib -inll=2 -o=0 -r=0 -so=0 sqwv.f will put the following into the file sqwv.cmp:
      PROGRAM LIBCR
      PARAMETER (NC = 15)
      PARAMETER (PI = 3.141593)
      REAL COEF(NC), YVAL, Y(2000)
      SAVE J
      REAL RR1, RR2, RR3
      INTEGER II1, II2
      DO 3 II2=1,NC
         COEF(II2) = 1.0 / II2
    3 CONTINUE
      DO 900 I=1,2000
         RR1 = I * 0.001 * PI
         RR2 = 0.0
         DO 2 II1=1,NC
            RR2 = RR2 + COEF(II1) * SIN (II1 * RR1)
    2    CONTINUE
         RR3 = RR2
         Y(I) = RR3
  900 CONTINUE
      J=1
      DO 910 I=1,2000,10
         PRINT *, (Y(J),J=I,I+9)
  910 CONTINUE
      END
In the previous example, all other optimizations were turned off to show the expansion more clearly. If you specify nonzero values for the -optimize, -scalaropt, and -roundoff switches, KAP first inlines the routines, then performs the optimizations in the usual manner.
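The two steps of the library example can be collected into one script. The sketch below is a dry run that prints both commands without executing them, with -inline and -infl written as separate switches. The driver name KAP and the file names subfil.f, subfil.klib, and sqwv.f are the ones used in the example; whether your installation invokes the driver under that name is an assumption.

```shell
#!/bin/sh
# Dry-run sketch of the two-step library workflow. Commands are printed,
# not executed. Assumption: the driver is invoked as "KAP" (override with
# $KAP).
library_workflow() {
    kap=${KAP:-KAP}
    # Step 1: build subfil.klib from subfil.f (default library name).
    echo "$kap -inline_create subfil.f"
    # Step 2: inline from that library into sqwv.f with the other
    # optimizations switched off.
    echo "$kap -inline -infl=subfil.klib -inll=2 -o=0 -r=0 -so=0 sqwv.f"
}

library_workflow
```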
7.4.3 IPA Example
In the following example, the variable N always has the same value, so the same IF branch will always be taken. This information is hidden behind a subroutine call, however, so KAP normally will not try to perform dead-code elimination to simplify the block IF in the first routine. When the -ipa=setn command-line switch is specified, KAP will inspect the named subroutine for information on the relationship of its arguments and returned value and the surrounding code. Once the CALLed routine is examined, KAP global forward substitution and dead-code elimination transformations (see Chapter 8) can delete the unused code.
If a subroutine or function cannot be inlined, or if you do not want to inline it, it can often still be analyzed for its effects on the calling routine.
The following example was run with the default values for -optimize and -scalaropt :
      CALL SETN ( N )
      IF ( N.GT.10 ) THEN
         X = 1.
      ELSE
         X = 2.
      ENDIF
      ...
      SUBROUTINE SETN (N)
      INTEGER N
      N = 12
      RETURN
      END
The example becomes the following:
      CALL SETN (N)
      X = 1.
      ...

Only the CALL and the simplified IF block are shown.
7.4.4 Recursive Inlining Examples
The -inline_depth switch sets the maximum level of subprogram nesting that KAP will attempt to inline. Higher values cause KAP to trace CALLs and function references further.
Consider the following simplified example:
      PROGRAM EXDDEM
      REAL A,B,C,D,E,F,G
      CALL S1 (A,B)
      CALL S2 (C,D,E,F)
      CALL S3 (G)
      PRINT *,A,B,C,D,E,F,G
      END

      SUBROUTINE S1 (W,X)
      REAL W,X
      W=1.0
      CALL S4(X)
      RETURN
      END

      SUBROUTINE S2 (Q,R,S,T)
      REAL Q,R,S,T
      Q = 2.0
      CALL S1 (R,S)
      CALL S3 (T)
      RETURN
      END

      SUBROUTINE S3 (U)
      REAL U
      U = 137.0
      RETURN
      END

      SUBROUTINE S4 (V)
      REAL V
      V = 2.7
      RETURN
      END
When run with -inline and -inline_depth=4 , all the subroutines are inlined, including calls to calls to calls, and the main program becomes:
      PROGRAM EXDDEM
      REAL A,B,C,D,E,F,G
      EXTERNAL S4
      A = 1.0
      B = 2.7
      C = 2.0
      D = 1.0
      E = 2.7
      F = 137.0
      G = 137.0
      PRINT *, A, B, C, D, E, F, G
      END
When run with -inline and -inline_depth=1 , meaning inline only one routine deep, all the CALLs in the main program and subroutines are expanded, but CALLs in inlined routines are not. The main program becomes:
      PROGRAM EXDDEM
      REAL A,B,C,D,E,F,G
      EXTERNAL S4
      A = 1.0
      CALL S4 (B)
      C = 2.0
      CALL S1 (D,E)
      CALL S3 (F)
      G = 137.0
      PRINT *, A, B, C, D, E, F, G
      END
When run with -inline and -inline_depth=-1 (inline only routines that do not contain CALLs or FUNCTION references), the main program becomes:
      PROGRAM EXDDEM
      REAL A,B,C,D,E,F,G
      CALL S1 (A,B)
      CALL S2 (C,D,E,F)
      G = 137.0
      PRINT *, A, B, C, D, E, F, G
      END
In this last case, only SUBROUTINEs S3 and S4 could be inlined. Repeated runs with -inline_depth=1 can be used to inline additional levels of routines.
7.4.5 Manual Inlining Example
Manual inlining and IPA allow you greater control over the routines that are inlined/analyzed at CALL sites. They use directives (!*$* inline and !*$* ipa) that are placed into the source code. The directives are normally ignored by KAP, but are enabled when an inlining or IPA switch, respectively, is given on the command line.
The following example is based on the Recursive Inlining example from Section 7.4.4. It was run with default switches, except for -inline_manual .
The directives used have different scopes. Routine S1 is inlined everywhere it is used, routine S3 is inlined only in the main program, and routine S4 is inlined only in routine S1 , which is then inlined elsewhere.
With the default value for -scalaropt , forward substitution places the assigned values into the PRINT statement and dead-code elimination deletes the now-unneeded inlined assignments:
!*$* INLINE GLOBAL (S1)
!*$* INLINE ROUTINE (S3)
      PROGRAM EXDTST
      REAL A,B,C,D,E,F,G
      CALL S1 (A,B)
      CALL S2 (C,D,E,F)
      CALL S3 (G)
      PRINT *,A,B,C,D,E,F,G
      END

      SUBROUTINE S1 (W,X)
      REAL W,X
      W=1.0
!*$* INLINE
      CALL S4(X)
      RETURN
      END

      SUBROUTINE S2 (Q,R,S,T)
      REAL Q,R,S,T
      Q = 2.0
      CALL S1 (R,S)
      CALL S3 (T)
      RETURN
      END

      SUBROUTINE S3 (U)
      REAL U
      U = 137.0
      RETURN
      END

      SUBROUTINE S4 (V)
      REAL V
      V = 2.7
      RETURN
      END
Becomes:
KAP/Tru64_U_F90 4.4 k340504 20010517 01-Sep-2001 09:31:22
!*$* INLINE GLOBAL ( S1 )
!*$* INLINE ROUTINE ( S3 )
      PROGRAM EXDTST
      REAL A, B, C, D, E, F, G
      SAVE G, F, E, D, C, B, A
      EXTERNAL S4
      CALL S2 (C,D,E,F)
      PRINT *, 1., 2.7, C, D, E, F, 137.
      END

KAP/Tru64_U_F90 4.4 k340504 20010517 01-Sep-2001 09:31:22
      SUBROUTINE S1 (W, X)
      REAL W, X
      W = 1.
!*$* INLINE
      X = 2.7
      RETURN
      END

KAP/Tru64_U_F90 4.4 k340504 20010517 01-Sep-2001 09:31:22
      SUBROUTINE S2 (Q, R, S, T)
      REAL Q, R, S, T
      EXTERNAL S4
      Q = 2.
      R = 1.
      S = 2.7
      CALL S3 (T)
      RETURN
      END
Subroutines S3 and S4 are unchanged.
7.4.6 Notes on Inlining and IPA
A routine is inlined automatically only if it passes all of the command-line criteria (-inline=, -inline_looplevel, -inline_depth).
The !*$* [no]inline and !*$* [no]ipa directives, when enabled, override the inlining/IPA command-line switches.
A !*$* inline global directive without a name list tells KAP to inline every routine it can, regardless of the -inline , -inline_depth , and -inline_looplevel settings. A !*$* noinline global directive tells KAP not to inline anything, regardless of the -inline, -inline_depth, and -inline_looplevel settings. The !*$* inline and !*$* ipa directives are disabled by default; they are enabled when any inlining or IPA command-line switch is specified.
When a library is specified with -inline_from_libraries , routines may be taken from that library for inlining into the source code. No attempt is made to inline routines from the source file into routines from the library. For example, if the main program calls routine BB, which is in the library, and BB calls routine DD, which is in the source file, then BB can be inlined into the main program, but KAP will not attempt to inline DD into the text from library routine BB.
A library created with -inline_create will work for inlining or IPA, because it is just partially reduced source code. However, a library created with -ipa_create may not appear in an -inline_from_libraries= list. Attempting to do so is flagged with a warning message.
Inlining and IPA are slow, memory-intensive activities. Specifying large values for -inline_looplevel and -inline_depth (inline all available routines everywhere they are used) for a large set of inlinable routines for a large source file can absorb significant system resources. For most programs, specifying small values for -inline_looplevel and -inline_depth and/or a small number of routines with -inline= can provide most of the benefits of inlining. The same applies for the IPA switches.
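A restrained command line along those lines might be assembled as in the dry-run sketch below. It prints the command only. The driver name KAP, the file name tstexp.f, and the default values 2 and 1 are illustrative assumptions, not recommendations from this manual.

```shell
#!/bin/sh
# Dry-run sketch: a conservative inlining command line that names the
# routines explicitly and keeps the loop level and depth small.
# Assumptions: driver name "KAP" (override with $KAP); the defaults of
# 2 and 1 below are illustrative only.
conservative_inline_cmd() {
    routines=$1
    src=$2
    looplevel=${3:-2}
    depth=${4:-1}
    echo "${KAP:-KAP} -inline=$routines -inline_looplevel=$looplevel -inline_depth=$depth $src"
}

conservative_inline_cmd mxmr tstexp.f
```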
7.5 Conditions Inhibiting Inlining/IPA
This section lists conditions that inhibit the inlining of subroutines and functions, whether from a library or source file. Many constructs that prevent inlining will also stop or restrict IPA.
See Section 7.4.6 for other notes on the use of the -inline and -ipa command-line switches and directives.
Conditions that inhibit inlining: