Preface |
Chapter 1 |
1
|
Getting Started
|
1.1
|
Compaq Fortran Programming Environment
|
1.2
|
Commands to Create and Run an Executable Program
|
1.3
|
Creating and Running a Program Using a Module and Separate Function
|
1.3.1
|
Commands to Create the Executable Program
|
1.3.2
|
Running the Sample Program
|
1.3.3
|
Debugging the Sample Program
|
1.4
|
f90 or fort Command and Related Software Components
|
1.4.1
|
f90 or fort Driver Program
|
1.4.2
|
cpp, fpp, and Other Preprocessors
|
1.4.3
|
Compaq Fortran Compiler
|
1.4.4
|
Other Compilers
|
1.4.5
|
Linker (ld)
|
1.5
|
Program Development Stages and Tools
|
1.6
|
Compaq Fortran and Standards It Conforms To
|
Chapter 2 |
2
|
Compiling and Linking Compaq Fortran Programs
|
2.1
|
f90 Command: Files and Options
|
2.1.1
|
File Suffixes and Source Forms
|
2.1.2
|
Format of the f90 and fort Commands
|
2.1.3
|
Creating and Using Module Files
|
2.1.3.1
|
Creating Module Files
|
2.1.3.2
|
Using Module Files
|
2.1.4
|
INCLUDE Statement and Using Include Files
|
2.1.5
|
Output Files: Executable, Object, and Temporary
|
2.1.5.1
|
Naming Output Files
|
2.1.5.2
|
Temporary Files
|
2.1.6
|
Using Multiple Input Files: Effect on Output Files
|
2.1.7
|
Examples of the f90 and fort Commands
|
2.1.7.1
|
Compiling and Linking Multiple Files
|
2.1.7.2
|
Retaining an Object File and Preventing Linking
|
2.1.7.3
|
Compiling Fortran 95/90 and C Source Files and Linking an Object File
|
2.1.7.4
|
Renaming the Output File
|
2.1.7.5
|
Specifying an Additional Linker Library
|
2.1.7.6
|
Requesting Additional Optimizations
|
2.1.8
|
Using Listing Files
|
2.2
|
Driver Programs and Passing Options to cc and ld
|
2.2.1
|
make Facility
|
2.2.2
|
Options Passed to the cc Driver or ld Linker
|
2.3
|
Compiler Limits, Diagnostic Messages, and Error Conditions
|
2.3.1
|
Compiler Limits
|
2.3.2
|
Compiler Diagnostic Messages and Error Conditions
|
2.3.3
|
Linker Diagnostic Messages and Error Conditions
|
2.4
|
Compilation Control: Statements and Directives
|
2.5
|
Linking Object Libraries
|
2.5.1
|
Specifying Additional Object Libraries
|
2.5.2
|
Specifying Types of Object Libraries
|
2.5.3
|
Specifying Shared Object Libraries
|
2.6
|
Creating Shared Libraries
|
2.6.1
|
Creating a Shared Library with a Single f90 Command
|
2.6.2
|
Creating a Shared Library with f90 and ld Commands
|
2.6.3
|
Choosing How to Create a Shared Library
|
2.6.4
|
Shared Library Restrictions
|
2.6.5
|
Installing Shared Libraries (TU*X ONLY)
|
Chapter 3 |
3
|
f90 and fort Command-Line Options
|
3.1
|
Overview of Command-Line Options
|
3.2
|
f90 and fort Command Categories and Options
|
3.3
|
-align keyword --- Data Alignment
|
3.4
|
-annotations keyword --- Place Optimization Information in Source Listing
|
3.5
|
-arch keyword --- Specify Type of Code Instructions Generated
|
3.6
|
-assume buffered_io --- Buffered Output
|
3.7
|
-assume byterecl --- Units for Unformatted File Record Length
|
3.8
|
-assume cc_omp --- Enable Conditional Compilation for OpenMP
|
3.9
|
-assume dummy_aliases --- Dummy Variable Aliasing
|
3.10
|
-assume gfullpath --- Source File Path for Debugging
|
3.11
|
-assume minus0 --- Standard Semantics for Minus Zero
|
3.12
|
-assume noaccuracy_sensitive, -fp_reorder --- Reorder Floating-Point Calculations
|
3.13
|
-assume noprotect_constants --- Remove Protection from Constants
|
3.14
|
-assume nosource_include --- INCLUDE File Search
|
3.15
|
-assume nounderscore --- Underscore on External Names
|
3.16
|
-assume no2underscores --- Two Underscores on External Names
|
3.17
|
-assume pthreads_lock --- Thread Lock Selection for Parallel Execution
|
3.18
|
-automatic, -static --- Local Variable Allocation
|
3.19
|
-c --- Inhibit Linking and Retain Object File
|
3.20
|
-call_shared, -non_shared, -shared --- Shared Library Use
|
3.21
|
-ccdefault keyword --- Carriage Control for Terminals
|
3.22
|
-check arg_temp_created --- Check for Copy of Temporary Arguments
|
3.23
|
-check bounds, -C, -check_bounds --- Boundary Run-Time Checking
|
3.24
|
-check format --- Format Mismatches at Run Time
|
3.25
|
-check nopower --- Allow Special Floating-Point Expressions
|
3.26
|
-check omp_bindings --- OpenMP Fortran API Binding Rules Checking
|
3.27
|
-check output_conversion --- Truncated Format Mismatches at Run Time
|
3.28
|
-check overflow --- Integer Overflow Run-Time Checking
|
3.29
|
-check underflow --- Floating-Point Underflow Run-Time Checking
|
3.30
|
-convert keyword --- Unformatted Numeric Data Conversion
|
3.31
|
-cpp and Related Options --- Run C Preprocessor
|
3.31.1
|
-M --- Request cpp Dependency Lists for make
|
3.31.2
|
-P --- Retain cpp Intermediate Files
|
3.31.3
|
-Wp,-xxx --- Pass Specified Option to cpp
|
3.32
|
-Dname, -Dname=def, -Dname="string" --- Define Symbol Names
|
3.33
|
-d_lines --- Debugging Statement Indicator, Column 1
|
3.34
|
-double_size 128, -double_size 64 --- Double Precision Data Size
|
3.35
|
-error_limit num, -noerror_limit --- Limit Error Messages
|
3.36
|
-extend_source --- Line Length for Fixed-Format Source
|
3.37
|
-f66, -66, -nof77, -onetrip, -1 --- Use FORTRAN 66 Semantics
|
3.38
|
-f77, -nof66 --- Use FORTRAN 77 Semantics
|
3.39
|
-f77rtl --- Use Fortran 77 Run-Time Behavior
|
3.40
|
-fast --- Set Options to Improve Run-Time Performance
|
3.41
|
-feedback file, -gen_feedback, -cord --- Create and Use Feedback Files
|
3.42
|
-fixed, -free --- Fortran Source Format
|
3.43
|
-fpconstant --- Handling of Floating-Point Constants
|
3.44
|
-fpen --- Control Arithmetic Exception Handling and Reporting
|
3.44.1
|
Hints on Using These Options
|
3.45
|
-fpp --- Run Fortran Preprocessor
|
3.46
|
-fprm keyword --- Control Floating-Point Rounding Mode
|
3.47
|
-fuse_xref --- Cross-Reference Information for Compaq FUSE
|
3.48
|
-g0, -g1, -g2 or -g, -g3, -ladebug --- Traceback and Symbol Table Information
|
3.49
|
-granularity keyword --- Control Shared Memory Access to Data
|
3.50
|
-hpf, -hpf num, and Related Options --- Compile HPF Programs for Parallel Execution
|
3.50.1
|
-assume bigarrays --- Run-Time Checking for Distributed Small Array Dimensions
|
3.50.2
|
-assume nozsize --- Omit Zero-Sized Array Checking
|
3.50.3
|
-nearest_neighbor, -nearest_neighbor num, or -nonearest_neighbor --- Nearest Neighbor Optimization
|
3.50.4
|
-show hpf --- Show HPF Parallelization Information
|
3.50.5
|
-hpf_target --- Message Passing Protocol for Parallel Programs
|
3.51
|
-I --- Remove Directory from Include Search Path
|
3.52
|
-Idir --- Add Directory for Module and Include File Search
|
3.53
|
-i2, -i4, -i8, -integer_size num --- Integer and Logical Data Size
|
3.54
|
-inline keyword, -noinline --- Control Procedure Inlining
|
3.55
|
-intconstant --- Handling of Integer Constants
|
3.56
|
-K --- Keep Temporary Files
|
3.57
|
-L --- Remove ld Directory Search Path
|
3.58
|
-Ldir --- Add Directory to ld Search Path
|
3.59
|
-lstring --- Add Library Name to ld Search
|
3.60
|
-machine_code
|
3.61
|
-math_library keyword --- Fast or Accurate Math Library Routines
|
3.62
|
-mixed_str_len_arg --- Specify Length of Character Arguments
|
3.63
|
-module directory --- Specify Directory for Creating Module Files
|
3.64
|
-mp --- Enable Parallel Processing Using Directed Decomposition
|
3.65
|
-names keyword --- Case Control of Source and External Names
|
3.66
|
-noaltparam --- Alternative PARAMETER Syntax
|
3.67
|
-nofor_main --- Allow Non-Fortran Main Program
|
3.68
|
-nohpf_main, -nowsf_main --- Compile HPF Global Routine for Nonparallel Main Program
|
3.69
|
-noinclude --- Omit Standard Directory Search for INCLUDE Files
|
3.70
|
-norun --- Do Not Run the Compiler
|
3.71
|
-o output --- Name Output File
|
3.72
|
-O0, -O1, -O2, -O3, -O4 or -O, -O5 --- Specify Optimization Level
|
3.73
|
-om --- Request Nonshared Object Optimizations
|
3.74
|
-omp --- Enable OpenMP Parallel Processing Using Directed Decomposition
|
3.75
|
-pad_source --- Pad Short Source Records with Spaces
|
3.76
|
-pipeline --- Activate Software Pipelining Optimization
|
3.77
|
-p0, -p1 or -p, and -pg --- Profiling Support
|
3.78
|
-real_size number, -r8, -r16 --- Floating-Point Data Size
|
3.79
|
-recursive --- Request Recursive Execution
|
3.80
|
-reentrancy keyword --- Control Use of Threaded Run-Time Library
|
3.81
|
-S --- Create Assembler File
|
3.82
|
-show keyword, -machine_code --- Control Listing File Content
|
3.83
|
-source_listing --- Create a Source Listing File
|
3.84
|
-speculate keyword --- Speculative Execution Optimization
|
3.85
|
-std, -std90, -std95 --- Perform Fortran Standards Checking
|
3.86
|
-synchronous_exceptions --- Report Exceptions More Precisely
|
3.87
|
-syntax_only --- Do Not Create Object File
|
3.88
|
-threads, -pthread --- Link Using Threaded Run-Time Library
|
3.89
|
-transform_loops --- Activate Loop Transformation Optimizations
|
3.90
|
-tune keyword --- Specify Alpha Processor Implementation
|
3.91
|
-U --- Activate Case Sensitivity
|
3.92
|
-Uname --- Undefine Preprocessor Symbol Name
|
3.93
|
-u
|
3.94
|
-unroll num --- Specify Number for Loop Unroll Optimization
|
3.95
|
-V --- Create Listing File
|
3.96
|
-v --- Verbose Command Processing Display
|
3.97
|
-version, -what --- Show Compaq Fortran Version Information
|
3.98
|
-vms --- OpenVMS Fortran Compatibility
|
3.99
|
-Wl,-xxx --- Pass Specified Option to ld
|
3.100
|
-warn keyword, -u, -nowarn, -w, -w1 --- Warning Messages and Compiler Checking
|
3.101
|
-warning_severity keyword --- Elevate Severity of Warning Messages
|
3.102
|
-what
|
3.103
|
-wsf
|
Chapter 4 |
4
|
Using the Ladebug Debugger
|
4.1
|
Overview of Ladebug and dbx Debuggers
|
4.2
|
Compaq Fortran Options for Debugging
|
4.3
|
Running the Debugger
|
4.3.1
|
Creating the Executable Program and Running the Debugger
|
4.3.1.1
|
Invoking Ladebug
|
4.3.1.2
|
Invoking dbx
|
4.3.2
|
Debugger Commands and Breakpoints
|
4.3.3
|
Ladebug Limitations
|
4.4
|
Sample Program and Debugging Session
|
4.5
|
Summary of Debugger Commands
|
4.6
|
Displaying Variables
|
4.6.1
|
Compaq Fortran Module Variables
|
4.6.2
|
Compaq Fortran Common Block Variables
|
4.6.3
|
Compaq Fortran Derived-Type Variables
|
4.6.4
|
Compaq Fortran Record Variables
|
4.6.5
|
Compaq Fortran Pointer Variables
|
4.6.5.1
|
Fortran 95/90 Pointers
|
4.6.5.2
|
CRAY-Style Pointers
|
4.6.6
|
Compaq Fortran Array Variables
|
4.6.6.1
|
Array Sections
|
4.6.6.2
|
Assignment to Arrays
|
4.6.7
|
Complex Variables
|
4.6.8
|
Compaq Fortran Data Types
|
4.7
|
Expressions in Debugger Commands
|
4.7.1
|
Fortran Operators
|
4.7.2
|
Procedures
|
4.8
|
Debugging Mixed-Language Programs with Ladebug
|
4.9
|
Debugging a Program that Generates an Exception
|
4.10
|
Locating Unaligned Data
|
4.10.1
|
Locating Unaligned Data With Ladebug
|
4.10.2
|
Locating Unaligned Data With dbx
|
4.11
|
Using Alternate Entry Points
|
4.12
|
Debugging Optimized Programs
|
Chapter 5 |
5
|
Performance: Making Programs Run Faster
|
5.1
|
Efficient Compilation and the Software Environment
|
5.1.1
|
Install the Latest Version of Compaq Fortran and Performance Products
|
5.1.2
|
Compile Using Multiple Source Files and Appropriate f90 Options
|
5.1.3
|
Process Shell Environment and Related Influences on Performance
|
5.2
|
Using the time Command to Measure Performance
|
5.3
|
Using Profiling Tools
|
5.3.1
|
Program Counter Sampling (prof)
|
5.3.2
|
Call Graph Sampling (gprof)
|
5.3.3
|
Basic Block Counting (pixie and prof)
|
5.3.4
|
Source Line CPU Cycle Use (prof and pixie)
|
5.3.5
|
Creating and Using Feedback Files and Optionally cord
|
5.3.6
|
Atom Toolkit
|
5.4
|
Data Alignment Considerations
|
5.4.1
|
Causes of Unaligned Data and Ensuring Natural Alignment
|
5.4.2
|
Checking for Inefficient Unaligned Data
|
5.4.3
|
Ordering Data Declarations to Avoid Unaligned Data
|
5.4.3.1
|
Arranging Data Items in Common Blocks
|
5.4.3.2
|
Arranging Data Items in Derived-Type Data
|
5.4.3.3
|
Arranging Data Items in Compaq Fortran Record Structures
|
5.4.4
|
Options Controlling Alignment
|
5.5
|
Using Arrays Efficiently
|
5.5.1
|
Accessing Arrays Efficiently
|
5.5.2
|
Passing Array Arguments Efficiently
|
5.6
|
Improving Overall I/O Performance
|
5.6.1
|
Use Unformatted Files Instead of Formatted Files
|
5.6.2
|
Write Whole Arrays or Strings
|
5.6.3
|
Write Array Data in the Natural Storage Order
|
5.6.4
|
Use Memory for Intermediate Results
|
5.6.5
|
Enable Implied-DO Loop Collapsing
|
5.6.6
|
Use of Variable Format Expressions
|
5.6.7
|
Efficient Use of Record Buffers and Disk I/O
|
5.6.8
|
Specify RECL
|
5.6.9
|
Use the Optimal Record Type
|
5.6.10
|
Reading from a Redirected Standard Input File
|
5.7
|
Additional Source Code Guidelines for Run-Time Efficiency
|
5.7.1
|
Avoid Small Integer and Small Logical Data Items
|
5.7.2
|
Avoid Mixed Data Type Arithmetic Expressions
|
5.7.3
|
Use Efficient Data Types
|
5.7.4
|
Avoid Using Slow Arithmetic Operators
|
5.7.5
|
Avoid Using EQUIVALENCE Statements
|
5.7.6
|
Use Statement Functions and Internal Subprograms
|
5.7.7
|
Code DO Loops for Efficiency
|
5.8
|
Optimization Levels: the -On Option
|
5.8.1
|
Optimizations Performed at All Optimization Levels
|
5.8.2
|
Local (Minimal) Optimizations
|
5.8.2.1
|
Common Subexpression Elimination
|
5.8.2.2
|
Integer Multiplication and Division Expansion
|
5.8.2.3
|
Compile-Time Operations
|
5.8.2.4
|
Value Propagation
|
5.8.2.5
|
Dead Store Elimination
|
5.8.2.6
|
Register Usage
|
5.8.2.7
|
Mixed Real/Complex Operations
|
5.8.3
|
Global Optimizations
|
5.8.4
|
Additional Global Optimizations
|
5.8.4.1
|
Loop Unrolling
|
5.8.4.2
|
Code Replication to Eliminate Branches
|
5.8.5
|
Automatic Inlining
|
5.8.5.1
|
Interprocedure Analysis
|
5.8.5.2
|
Inlining Procedures
|
5.8.6
|
Software Pipelining
|
5.8.7
|
Loop Transformation
|
5.9
|
Other Options Related to Optimization
|
5.9.1
|
Setting Multiple Options with the -fast Option
|
5.9.2
|
Controlling the Number of Times a Loop Is Unrolled
|
5.9.3
|
Controlling the Inlining of Procedures
|
5.9.4
|
Requesting Optimized Code for a Specific Processor Generation
|
5.9.5
|
Requesting the Speculative Execution Optimization
|
5.9.6
|
Request Nonshared Object Optimizations
|
5.9.7
|
Arithmetic Reordering Optimizations
|
5.9.8
|
Dummy Aliasing Assumption
|
Chapter 6 |
6
|
Parallel Compiler Directives and Their Programming Environment
|
6.1
|
OpenMP Fortran API Compiler Directives
|
6.1.1
|
Command-Line Option and Directives Format
|
6.1.1.1
|
Directive Prefixes
|
6.1.1.2
|
Directive Prefixes for Conditional Compilation
|
6.1.2
|
Summary Descriptions of OpenMP Fortran API Compiler Directives
|
6.1.3
|
Parallel Processing Thread Model
|
6.1.4
|
Privatizing Named Common Blocks: THREADPRIVATE Directive
|
6.1.5
|
Controlling Data Scope Attributes
|
6.1.6
|
Parallel Region: PARALLEL and END PARALLEL Directives
|
6.1.7
|
Worksharing Constructs
|
6.1.7.1
|
DO and END DO Directives
|
6.1.7.2
|
SECTIONS, SECTION, and END SECTIONS Directives
|
6.1.7.3
|
SINGLE and END SINGLE Directives
|
6.1.8
|
Combined Parallel/Worksharing Constructs
|
6.1.8.1
|
PARALLEL DO and END PARALLEL DO Directives
|
6.1.8.2
|
PARALLEL SECTIONS and END PARALLEL SECTIONS Directives
|
6.1.9
|
Synchronization Constructs
|
6.1.9.1
|
ATOMIC Directive
|
6.1.9.2
|
BARRIER Directive
|
6.1.9.3
|
CRITICAL and END CRITICAL Directives
|
6.1.9.4
|
FLUSH Directive
|
6.1.9.5
|
MASTER and END MASTER Directives
|
6.1.9.6
|
ORDERED and END ORDERED Directives
|
6.1.10
|
Specifying Schedule Type and Chunk Size
|
6.2
|
Compaq Fortran Parallel Compiler Directives
|
6.2.1
|
Command-Line Option and Directives Format
|
6.2.1.1
|
Directive Prefixes
|
6.2.2
|
Summary Descriptions of Compaq Fortran Parallel Compiler Directives
|
6.2.3
|
Parallel Processing Thread Model
|
6.2.4
|
Privatizing Named Common Blocks: TASKCOMMON or INSTANCE Directives
|
6.2.5
|
Controlling Data Scope Attributes
|
6.2.6
|
Parallel Region: PARALLEL and END PARALLEL Directives
|
6.2.7
|
Worksharing Constructs
|
6.2.7.1
|
PDO and END PDO Directives
|
6.2.7.2
|
PSECTIONS, SECTION, and END PSECTIONS Directives
|
6.2.7.3
|
SINGLE PROCESS and END SINGLE PROCESS Directives
|
6.2.8
|
Combined Parallel/Worksharing Constructs
|
6.2.8.1
|
PARALLEL DO and END PARALLEL DO Directives
|
6.2.8.2
|
PARALLEL SECTIONS and END PARALLEL SECTIONS Directives
|
6.2.9
|
Synchronization Constructs
|
6.2.9.1
|
BARRIER Directive
|
6.2.9.2
|
CRITICAL SECTION and END CRITICAL SECTION Directives
|
6.2.10
|
Specifying a Default Chunk Size
|
6.2.11
|
Specifying a Default Schedule Type
|
6.2.12
|
Terminating Loop Execution Early: PDONE Directive
|
6.3
|
Decomposing Loops for Parallel Processing
|
6.3.1
|
Steps in Using Directed Decomposition
|
6.3.2
|
Resolving Dependences Manually
|
6.3.2.1
|
Resolving Dependences Involving Temporary Variables
|
6.3.2.2
|
Resolving Loop-Carried Dependences
|
6.3.2.3
|
Loop Alignment
|
6.3.2.4
|
Code Replication
|
6.3.2.5
|
Loop Distribution
|
6.3.2.6
|
Restructuring a Loop into an Inner and Outer Nest
|
6.3.2.7
|
Dependences Requiring Locks
|
6.3.3
|
Coding Restrictions
|
6.3.4
|
Manual Optimization
|
6.3.4.1
|
Interchanging Loops
|
6.3.4.2
|
Balancing the Workload
|
6.4
|
Environment Variables for Adjusting the Run-Time Environment
|
6.5
|
Calls to Programs Written in Other Languages
|
6.6
|
Compiling, Linking, and Running Parallelized Programs on SMP Systems
|
6.7
|
Debugging Parallelized Programs
|
6.7.1
|
Debugger Limitations for Parallelized Programs
|
6.7.2
|
Debugging Parallel Regions
|
6.7.3
|
Debugging Shared Variables
|