Software Product Description
___________________________________________________________________

PRODUCT NAME: HP Fortran Version 5.5A for Tru64 UNIX Alpha Systems    SPD 37.54.16

September 2003

DESCRIPTION

HP Fortran Version 5.5A for Tru64 UNIX Alpha Systems contains both the HP Fortran 95/90 Version 5.5A software and the HP Fortran 77 Version 5.5A software as well as the Compaq Extended Math Library (CXML) Version 5.20.

In this Software Product Description (SPD), HP Fortran refers to HP Fortran 95/90 unless a specific reference to the 95/90 or 77 product is needed to distinguish between the two software products.

HP Fortran is an implementation of the Fortran programming language that supports the FORTRAN 66, FORTRAN 77, Fortran 90, and Fortran 95 standards. HP Fortran 95/90 and HP Fortran 77 fully support the following standards:

o ANSI X3.9-1966 (FORTRAN 66)
o ANSI X3.9-1978 (FORTRAN 77)
o ISO 1539-1980(E) (FORTRAN 77)
o MIL-STD-1753
o FIPS-69-1 (HP Fortran meets the requirements of this standard by conforming to the ANSI Standard and by including a flagger. The flagger optionally produces diagnostic messages for compile-time elements that do not conform to the Full-Level ANSI Fortran Standard.)

HP Fortran 95/90 supports all of the standards that HP Fortran 77 supports plus the following new standards:

o ANSI X3.198-1992 (Fortran 90)
o ISO/IEC 1539-1:1997(E) (Fortran 95)

HP FORTRAN

HP Fortran 95/90 fully supports the multivendor OpenMP Fortran Version 1.1 Specification, including support for directed parallel processing using OpenMP directives on shared memory multiprocessor systems.

HP Fortran 95/90 supports most statement, library, and directive extensions from the HPF-2 specification (High Performance Fortran Language Specification, Version 2.0, January 31, 1997).
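As a minimal sketch of the directive-based parallel processing mentioned above, the following program marks one loop with an OpenMP work-sharing directive. The program and variable names are illustrative only and are not taken from the product documentation. When the source is compiled without OpenMP processing enabled, the !$OMP lines are ordinary comments and the loop runs serially with the same result.

```fortran
! Illustrative sketch only: names are hypothetical, not from the SPD.
program omp_sum_sketch
  implicit none
  integer, parameter :: n = 1000
  integer :: i
  real(kind=8) :: a(n), total

  do i = 1, n
     a(i) = real(i, kind=8)
  end do

  total = 0.0d0
  ! The loop iterations are divided among the threads; each thread
  ! accumulates a private partial sum that is combined at the end.
!$OMP PARALLEL DO PRIVATE(i) SHARED(a) REDUCTION(+:total)
  do i = 1, n
     total = total + a(i)
  end do
!$OMP END PARALLEL DO

  ! The sum of 1..n is n*(n+1)/2, which is exactly representable here.
  if (total /= real(n*(n+1)/2, kind=8)) stop 1
  print *, 'sum =', total
end program omp_sum_sketch
```

The REDUCTION clause is used so that the threads do not update the shared accumulator concurrently.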

HP Fortran supports extensions to the ISO and ANSI standards, including a number of extensions defined by HP Fortran for the various HP Fortran platforms (operating system/architecture pairs). In addition to HP Tru64 UNIX Alpha systems, HP Fortran platforms include:

o Compaq Visual Fortran for Intel systems under Windows NT, Windows 2000, Windows Me, Windows 98, and Windows 95
o HP Fortran for Linux Alpha systems
o HP Fortran and HP Fortran 77 for OpenVMS Alpha systems
o Compaq Fortran 77 for OpenVMS VAX systems

Major additions to the FORTRAN 77 standard introduced by the Fortran 90 standard include:

o Array operations
o Improved facilities for numeric computation
o Parameterized intrinsic data types
o User-defined data types
o Facilities for modular data and procedure definitions
o Pointers
o The concept of language evolution
o Support for DATE_AND_TIME intrinsic for obtaining dates using a four-digit year format

HP Fortran contains full support for the Fortran 95 standard, including the following features:

o FORALL statement and construct
o Automatic deallocation of ALLOCATABLE arrays
o DIM argument to MAXLOC and MINLOC
o PURE user-defined subprograms
o ELEMENTAL user-defined subprograms (a restricted form of a pure procedure)
o Pointer initialization (initial value)
o The NULL intrinsic to nullify a pointer
o Derived-type structure initialization
o CPU_TIME intrinsic subroutine
o KIND argument to CEILING and FLOOR intrinsics
o Nested WHERE constructs, masked ELSEWHERE statement, and named WHERE constructs
o Comments allowed in namelist input
o Generic identifier in END INTERFACE statements
o Minimal width field editing using a numeric edit descriptor with 0 width
o Detection of Obsolescent and/or Deleted features listed in the Fortran 95 standard. HP Fortran flags these obsolescent and deleted features, but fully supports them.
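Several of the Fortran 95 features listed above can be seen together in a short standard-conforming sketch. The module, type, and procedure names (f95_sketch_mod, node, clamp01) are hypothetical and chosen only for illustration.

```fortran
! Illustrative sketch only: all names are hypothetical.
module f95_sketch_mod
  implicit none
  type node
     integer :: value = 0                    ! derived-type default initialization
     type(node), pointer :: next => null()   ! pointer initialization with NULL
  end type node
contains
  ! ELEMENTAL: written for a scalar, applicable element by element to arrays.
  elemental function clamp01(x) result(y)
    real, intent(in) :: x
    real :: y
    y = min(max(x, 0.0), 1.0)
  end function clamp01
end module f95_sketch_mod

program f95_sketch
  use f95_sketch_mod
  implicit none
  integer :: i, j
  integer :: a(3,3), loc(3), cls(4)
  real :: v(4), t0, t1
  type(node) :: n

  call cpu_time(t0)

  ! FORALL: assign every element from its index pair.
  forall (i = 1:3, j = 1:3) a(i,j) = 10*i + j

  ! DIM argument to MAXLOC: column position of the maximum in each row.
  loc = maxloc(a, dim=2)
  if (any(loc /= 3)) stop 1

  ! The elemental function applied to a whole array.
  v = clamp01((/ -0.5, 0.25, 0.75, 2.0 /))

  ! WHERE construct with a masked ELSEWHERE and a final ELSEWHERE.
  where (v <= 0.0)
     cls = -1
  elsewhere (v >= 1.0)
     cls = 1
  elsewhere
     cls = 0
  end where
  if (any(cls /= (/ -1, 0, 0, 1 /))) stop 1

  ! Default initialization leaves the pointer component disassociated.
  if (n%value /= 0) stop 1
  if (associated(n%next)) stop 1

  call cpu_time(t1)
  print *, 'checks passed, cpu seconds used:', t1 - t0
end program f95_sketch
```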

HP Fortran includes the following features and enhancements:

o Full support for 64-bit address space, including 64-bit static space
o Support for linking against static and shared libraries
o Support for creating shareable code to be put into a shared library
o Support for stack-based storage
o Support for dynamic memory allocation
o Support for reading and writing binary data files in nonnative formats, including IEEE[TM] (little-endian and big-endian), VAX, IBM[TM] System/360, and CRAY[TM] integer and floating point formats
o User control over IEEE floating point exception handling, reporting, and resulting values
o Control for memory boundary alignment of items in COMMON and fields in structures and warnings for unaligned data
o Directives to control listing page titles and subtitles, object file identification field, COMMON and record field alignment, and some attributes of COMMON blocks
o Ability to CALL an external function subprogram
o 7200 Character Statement Length
o Free form unlimited line length
o Mixing Subroutines/Functions in Generic Interfaces
o Composite data declarations using STRUCTURE, END STRUCTURE, and RECORD statements, and access to record components through field references
o Explicit specification of storage allocation units for data types such as:
    INTEGER*4
    LOGICAL*4
    REAL*4
    REAL*8
    COMPLEX*8
o Support for 64-bit signed integers using INTEGER*8 and LOGICAL*8
o Support for 128-bit floating-point real numbers (reals) using REAL*16 and COMPLEX*32
o A set of data types:
  - BYTE
  - LOGICAL*1, LOGICAL*2, LOGICAL*4, LOGICAL*8
  - INTEGER*1, INTEGER*2, INTEGER*4, INTEGER*8
  - REAL*4, REAL*8, REAL*16
  - COMPLEX*8, COMPLEX*16, DOUBLE COMPLEX, COMPLEX*32
  - POINTER (CRAY style)
o Data statement style initialization in type declaration statements
o AUTOMATIC and STATIC statements
o Bit constants to initialize LOGICAL,
  REAL, and INTEGER values and participate in arithmetic and logical expressions
o Built-in functions %LOC, %REF, and %VAL
o VOLATILE statement
o Bit manipulation functions
o Binary, hexadecimal, and octal constants and Z and O format edit descriptors applicable to all data types
o I/O unit numbers that can be any nonnegative INTEGER*4 value
o Variable amounts of data can be read from and written to "STREAM" files, which contain no record delimiters
o ENCODE and DECODE statements
o ACCEPT, TYPE, and REWRITE input/output statements
o DEFINE FILE, UNLOCK, and DELETE statements
o USEROPEN subroutine invocation at file OPEN time
o Support for I/O records larger than 2.1 gigabytes in variable-length unformatted files
o Support for reading nondelimited character strings as input for character NAMELIST items
o Debug statements in source
o Generation of a source listing file with optional machine code representation of the executable source
o Generation of a symbolic assembly language representation of the executable source that can be assembled
o Variable format expressions in a FORMAT statement
o Optional run-time bounds checking of array subscripts and character substrings
o 31-character identifiers that can include dollar sign ($) and underscore (_)
o Support for executing in-line assembler code using the ASM intrinsics
o Support for the supercomputer intrinsics POPCNT, POPPAR, LEADZ, TRAILZ, and MULT_HIGH
o Language elements that support the various extended range and extended precision floating point architectural features:
  - 32-bit IEEE S_floating data type, with an 8-bit exponent and 24-bit mantissa, which provides a range of 1.17549435E-38 (normalized) to 3.40282347E38 (the IEEE denormalized limit is 1.40129846E-45) and a precision of typically 7 decimal digits
  - 64-bit IEEE T_floating data type, with an 11-bit exponent and 53-bit mantissa, which provides a range of
    2.2250738585072013D-308 (normalized) to 1.7976931348623158D308 (the IEEE denormalized limit is 4.94065645841246544D-324) and a precision of typically 15 decimal digits
  - 128-bit IEEE extended Alpha X_floating data type, with a 15-bit exponent and a 113-bit mantissa, which provides a range of approximately 6.48Q-4966 to 1.18Q4932 and a precision of typically 33 decimal digits
o Command line control for:
  - The size of default INTEGER, REAL, and DOUBLE PRECISION data items
  - The levels and types of optimization to be applied to the program
  - The directories to search for INCLUDE files
  - Inclusion or suppression of various compile-time warnings
  - Inclusion or suppression of run-time checking for various I/O and computational errors
  - Control over whether compilation terminates after a specific number of errors has been found
  - Choosing whether executing code will be thread-reentrant
o Internal procedures can be passed as actual arguments to procedures
o Kind types for all of the hardware-supported data types:
  - For 1-, 2-, 4-, and 8-byte LOGICAL data:
      LOGICAL (KIND=1)
      LOGICAL (KIND=2)
      LOGICAL (KIND=4)
      LOGICAL (KIND=8)
  - For 1-, 2-, 4-, and 8-byte INTEGER data:
      INTEGER (KIND=1)
      INTEGER (KIND=2)
      INTEGER (KIND=4)
      INTEGER (KIND=8)
  - For 4-, 8-, and 16-byte REAL data:
      REAL (KIND=4)
      REAL (KIND=8)
      REAL (KIND=16)
  - For single precision, double precision, and quad-precision COMPLEX data:
      COMPLEX (KIND=4)
      COMPLEX (KIND=8)
      COMPLEX (KIND=16)
o The following features found in Compaq Visual Fortran:
  - # Constants: constants using other than base 10
  - C Strings: NULL terminated strings
  - Conditional Compilation And Metacommand Expressions ($define, $undefine, $if, $elseif, $else, $endif)
  - $FREEFORM, $NOFREEFORM, $FIXEDFORM: source file format
  - $INTEGER, $REAL: selects size
  - $FIXEDFORMLINESIZE: line length for fixed form source
  - $STRICT,
    $NOSTRICT: F90 conformance
  - $PACK: structure packing
  - $ATTRIBUTES ALIAS: external name for a subprogram or common block
  - $ATTRIBUTES C, STDCALL: calling and naming conventions
  - $ATTRIBUTES VALUE, REFERENCE: calling conventions
  - \ Descriptor: prevents writing an end-of-record mark
  - Ew.dDe and Gw.dDe Edit Descriptors: similar to Ew.dEe and Gw.dEe
  - $DECLARE and $NODECLARE (same as IMPLICIT NONE)
  - $ATTRIBUTES EXTERN: variable allocated in another source file
  - $ATTRIBUTES VARYING: variable number of arguments
  - $ATTRIBUTES ALLOCATABLE: allocatable array
  - Mixing Subroutines/Functions in Generic Interfaces
  - $MESSAGE: output message during compilation
  - $LINE (same as C's #line)
  - INT1 converts to one byte integer by truncating
  - INT2 converts to two byte integer by truncating
  - INT4 converts to four byte integer by truncating
  - COTAN returns cotangent
  - DCOTAN returns double precision cotangent
  - IMAG returns the imaginary part of complex number
  - IBCHNG reverses value of bit
  - ISHA shifts arithmetically left or right
  - ISHC performs a circular shift
  - ISHL shifts logically left or right
o Support for directed decomposition for parallel processing on shared memory multiprocessor systems using source code directives from either OpenMP (!$OMP) or HP Fortran (!$PAR):
  - PARALLEL and END PARALLEL directives to define parallel regions
  - DO and END DO directives to define parallel work constructs
  - PARALLEL and SECTIONS directives to define parallel work constructs
  - PRIVATE and SHARED attributes to describe data local or global to the threads of execution
  - CRITICAL section directive to define a guarded region where one thread executes at a time
  - TASK COMMON or THREADPRIVATE directives to allow each thread to have a local copy of a COMMON block
  - Environment variables to control resource utilization at run-time
  - Library routines to query and adjust the run-time parallel environment
  - Nested OpenMP
    parallel regions
  - NUM_THREADS clause
o A number of High Performance Fortran (HPF) features, including:
  - The data parallel FORALL statement and construct
  - Execution model procedure prefixes:
      EXTRINSIC(HPF)
      EXTRINSIC(HPF_LOCAL)
      EXTRINSIC(HPF_SERIAL)
  - PURE procedure prefix to specify a lack of procedure side effects
  - The following HPF data alignment and distribution directives:
      ALIGN
      DISTRIBUTE
      INHERIT
      PROCESSORS
      TEMPLATE
      SHADOW
      ON (in conjunction with INDEPENDENT loops)
  - Many HPF-2 approved extensions, including:
    * HPF_LOCAL routines, and all HPF_LOCAL_LIBRARY routines except LOCAL_BLKCNT, LOCAL_LINDEX, and LOCAL_UINDEX but none of the approved extensions to HPF_LOCAL_LIBRARY routines
    * HPF_SERIAL routines
    * ON directive within INDEPENDENT loops
    * RESIDENT directive used in conjunction with INDEPENDENT loops
    * Mapping of derived type components
    * Pointers to mapped objects
    * Shadow width declarations
  - HPF intrinsic procedures and library routines:
    * NUMBER_OF_PROCESSORS and PROCESSORS_SHAPE
    * Reduction functions
    * Combining scatter functions
    * Parallel prefix and suffix functions
    * Sorting functions
    * System inquiry intrinsics
    * Computational intrinsics
    * Mapping inquiry subroutines
    * Bit manipulation functions
    * Array reduction functions
    * Array combining scatter functions
    * Array parallel prefix and suffix functions
    * Array sorting functions GRADE_UP, GRADE_DOWN
  - HPF INDEPENDENT directive
  - HPF SEQUENCE and NOSEQUENCE directives

HP Fortran 77 contains the following extensions to the FORTRAN 77 standard:

o Support for recursive subprograms
o IMPLICIT NONE statements
o INCLUDE statement
o NAMELIST-directed I/O
o DO WHILE and ENDDO statements
o Use of exclamation point (!)
  for end of line comments
o Generation of Cross Reference Listings
o Support for NTT Technical Requirement TR550001, Multivendor Integration Architecture (MIA) Version 1.1, Division 2, Part 3-2, Programming Language FORTRAN
o Support for automatic arrays
o Support for the SELECT CASE - CASE - CASE DEFAULT - END SELECT statements
o Support for the EXIT and CYCLE statements and for construct names on DO - END DO statements
o Reporting of unused and uninitialized variables
o Support for DATE_AND_TIME intrinsic for obtaining dates using a four-digit year format

HP Fortran provides a multiphase optimizer that is capable of performing optimizations across entire programs. Specific optimizations performed by both HP Fortran 95/90 and HP Fortran 77 include:

o Constant folding
o Optimizations of arithmetic IF, logical IF, and block IF-THEN-ELSE
o Global common subexpression elimination
o Removal of invariant expressions from loops
o Global allocation of general registers across program units
o In-line expansion of statement functions and routines
o Optimization of array addressing in loops
o Value propagation
o Deletion of redundant and unreachable code
o Loop unrolling
o Thorough dependence analysis
o Software pipelining to rearrange instructions between different unrolled loop iterations
o Optimized interface to intrinsic functions
o Loop transformation optimizations that apply to array references within loops, including:
  - Loop blocking
  - Loop distribution
  - Loop fusion
  - Loop interchange
  - Loop scalar replacement
  - Outer loop unrolling
o Speculative code scheduling, where a conditionally executed instruction is moved to a position before a test instruction and executed unconditionally. This reduces instruction latency stalls.
  (Performance may be reduced somewhat, because the run-time system must dismiss exceptions caused by speculative instructions.)

Specific optimizations performed by HP Fortran 95/90 include:

o Array temporary elimination
o A number of HPF-specific optimizations, including:
  - Message vectorization
  - Nearest-neighbor optimizations for improved communication performance of constructs typically seen in PDE solvers
  - Parallelism of reductions
  - Run-time preprocessing of loops for improved performance of irregular data access code
  - Many other communication-based optimizations

Both HP Fortran 95/90 and HP Fortran 77 are shareable, re-entrant compilers that operate under the HP Tru64 UNIX operating system. They globally optimize source programs while taking advantage of the native instruction set and the HP Tru64 UNIX virtual memory system.

COMPAQ EXTENDED MATH LIBRARY (CXML)

Compaq Extended Math Library (CXML) is a set of mathematical subprograms that are optimized for HP architectures. Included subprograms cover the areas of:

o Basic Linear Algebra
o Linear System and Eigenproblem Solvers
o Sparse Linear System Solvers
o Sorting
o Random Number Generation
o Signal Processing

The Basic Linear Algebra library includes the industry-standard Basic Linear Algebra Subprograms (BLAS) Level 1, Level 2, and Level 3. Also included are subprograms for BLAS Level 1 Extensions, Sparse BLAS Level 1, and Array Math Functions (VLIB).

The Linear System and Eigenproblem Solver library provides the complete LAPACK V3 package developed by a consortium of university and government laboratories. LAPACK is an industry-standard subprogram package offering an extensive set of linear system and eigenproblem solvers. LAPACK uses blocked algorithms that are better suited to most modern architectures, particularly ones with memory hierarchies. LAPACK will supersede LINPACK and EISPACK for most users.

The Sparse Linear System library provides both direct and iterative sparse linear system solvers. The direct solver package supports symmetric and symmetrically structured matrices with real or complex elements. The iterative solver package contains a basic set of storage schemes, preconditioners, and iterative solvers. The design of this package is modular and matrix-free, allowing future expansion and easy modification by users.

The Signal Processing library provides a basic set of signal processing functions. Included are one-, two-, and three-dimensional Fast Fourier Transforms (FFT), group FFTs, Cosine/Sine Transforms (FCT/FST), Convolution, Correlation, and Digital Filters.

Many CXML subprograms are optimized for the supported hardware platforms. Optimization techniques include traditional optimizations such as loop unrolling and loop reordering. CXML subprograms also provide efficient management of the hierarchical memory system, using techniques such as the following:

o Reuse of data within registers to minimize memory accesses
o Efficient cache management
o Use of blocked algorithms that minimize translation buffer misses and unnecessary paging

Since CXML routines can be called from all languages that support the HP Tru64 UNIX calling standard, the library provides optimized computation for applications written in these languages. Where appropriate, most subprograms are available in both real and complex versions, as well as in both single and double precision. CXML for HP Tru64 Alpha supports IEEE floating-point format.

Parallel Library Support for Symmetric Multiprocessing

CXML also supports symmetric multiprocessing (SMP) for improved performance. Key BLAS Level 2 and 3 routines have been modified to execute in parallel if run on SMP hardware.
Also modified for this purpose are:

o LAPACK GETRF and POTRF routines
o Sparse iterative solvers
o Direct sparse solvers
o FFT routines

These parallel routines along with the other serial routines are supplied in an alternative library. The user can choose to link with either the parallel or the serial library, depending on whether SMP support is required, since each library contains the complete set of routines.

Basic Linear Algebra Subprograms

Linear algebra operations are fundamental to many mathematical applications, and several libraries of linear algebra subprograms exist throughout the computer industry. The CXML BLAS library contains the most commonly used linear algebra subprograms. The CXML linear algebra library contains five groups of subprograms at three levels:

o Basic Linear Algebra Subprograms (BLAS) Level 1
o BLAS Level 1 Extensions
o BLAS Level 1 Sparse Extensions
o BLAS Level 2
o BLAS Level 3

BLAS Level 1 (Scalar/Vector and Vector/Vector Operations)

BLAS Level 1 provides a set of elementary vector functions, operating on one or two vectors. These are typically very small routines, and they make less efficient use of the computing resources of modern computer architectures than the Level 2 and 3 operations.
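To make the scale of these Level 1 operations concrete, the following sketch gives plain Fortran reference versions of three of them: an inner product, a scalar times a vector plus a vector, and the generation of a Givens rotation. The routine names (ref_dot, ref_axpy, ref_givens) are hypothetical stand-ins written for this illustration; they are not the CXML routines, and the Givens helper is a simplified, unscaled form.

```fortran
! Illustrative sketch only: hypothetical reference routines, not CXML code.
program blas1_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: x(3), y(3), c, s, r

  x = (/ 1.0_dp, 2.0_dp, 3.0_dp /)
  y = (/ 4.0_dp, 5.0_dp, 6.0_dp /)

  ! Inner product: 1*4 + 2*5 + 3*6 = 32
  if (ref_dot(x, y) /= 32.0_dp) stop 1

  ! Scalar times a vector plus a vector: y = 2*x + y = (6, 9, 12)
  call ref_axpy(2.0_dp, x, y)
  if (any(y /= (/ 6.0_dp, 9.0_dp, 12.0_dp /))) stop 1

  ! Givens rotation that maps (3, 4) onto (5, 0)
  call ref_givens(3.0_dp, 4.0_dp, c, s, r)
  if (abs(r - 5.0_dp) > 1.0e-12_dp) stop 1
  if (abs(-s*3.0_dp + c*4.0_dp) > 1.0e-12_dp) stop 1

  print *, 'BLAS Level 1 sketch checks passed'
contains
  function ref_dot(x, y) result(d)
    real(dp), intent(in) :: x(:), y(:)
    real(dp) :: d
    integer :: i
    d = 0.0_dp
    do i = 1, size(x)
       d = d + x(i)*y(i)
    end do
  end function ref_dot

  subroutine ref_axpy(a, x, y)
    real(dp), intent(in) :: a, x(:)
    real(dp), intent(inout) :: y(:)
    integer :: i
    do i = 1, size(x)
       y(i) = a*x(i) + y(i)
    end do
  end subroutine ref_axpy

  ! Unscaled form: production codes guard against overflow in a*a + b*b.
  subroutine ref_givens(a, b, c, s, r)
    real(dp), intent(in) :: a, b
    real(dp), intent(out) :: c, s, r
    r = sqrt(a*a + b*b)
    if (r == 0.0_dp) then
       c = 1.0_dp
       s = 0.0_dp
    else
       c = a/r
       s = b/r
    end if
  end subroutine ref_givens
end program blas1_sketch
```

Each routine performs only a few floating-point operations per element loaded from memory, which is why operations at this level cannot reuse data in registers or cache as effectively as the higher levels.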

CXML provides the 15 standard BLAS Level 1 operations:

o The index of the element of a vector having maximum absolute value
o The sum of the absolute values of the elements of a vector
o Inner product of two real vectors
o Scalar plus the extended precision inner product of two real vectors
o Conjugated inner product of two complex vectors
o Unconjugated inner product of two complex vectors
o Square root of the sum of squares (norm) of the elements of a vector
o Scalar times a vector plus a vector
o Copy one vector to another
o Apply a Givens rotation
o Apply a modified Givens plane rotation
o Generate elements for a Givens plane rotation
o Generate elements for a modified Givens plane rotation
o Product of a vector times a scalar
o Swap the elements of two vectors

BLAS Level 1 Extensions (Vector/Vector Operations)

When developing mathematical algorithms using the BLAS Level 1, scientists and engineers found that several additional constructs were used on a regular basis. These constructs are well known throughout the computer industry as BLAS Level 1 Extensions.
CXML contains 13 BLAS Level 1 Extension operations:

o Index of element having the minimum absolute value
o Index of element having the maximum value
o Index of element having the minimum value
o Largest value of the elements of a vector
o Smallest value of the elements of a vector
o Largest absolute value of the elements of a vector
o Smallest absolute value of the elements of a vector
o Sum of the values of the elements of a vector
o Set all elements of a vector equal to a scalar
o Constant times a vector set to another vector (y = a x)
o Euclidean norm with no intermediate scaling
o Sum of the squares of the elements of a vector
o Constant times a vector plus a vector set to another vector (z = a x + y)

BLAS Level 1 Sparse Extensions (Vector/Vector Operations)

This group of operations is similar to the BLAS Level 1 routines, but is designed to work on sparse vectors (vectors in which most of the elements are zero). Six of the routines are from industry standard Sparse BLAS 1, and the remaining three are enhancements. The nine sparse BLAS Level 1 operations are:

o Scalar times a sparse vector plus a vector
o Sum of a sparse vector and a full vector
o Inner product of a sparse vector and a full vector
o Gather a sparse vector from a full vector
o Gather a sparse vector from the scaled elements of a full vector
o Gather a sparse vector from a full vector and zero corresponding elements of full vector
o Apply Givens rotation to a sparse vector and a full vector
o Scatter a sparse vector into a full vector
o Scale and scatter a sparse vector into a full vector

BLAS Level 2 (Matrix/Vector Operations)

The BLAS Level 2 codes make more effective use of the data in the registers, reducing the number of register loads and stores required. In addition, loop unrolling techniques are used to minimize cache misses and page faults.
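The register reuse described above can be sketched with plain Fortran reference versions of a matrix/vector product and a rank-1 update. The names ref_gemv and ref_ger are hypothetical and written only for this illustration; they are not the CXML routines and apply none of the unrolling the library uses.

```fortran
! Illustrative sketch only: hypothetical reference routines, not CXML code.
program blas2_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp) :: a(2,3), x(3), y(2), u(2)

  ! A = [1 2 3; 4 5 6], stored column by column
  a = reshape((/ 1.0_dp, 4.0_dp, 2.0_dp, 5.0_dp, 3.0_dp, 6.0_dp /), (/ 2, 3 /))
  x = (/ 1.0_dp, 1.0_dp, 1.0_dp /)
  y = (/ 1.0_dp, 1.0_dp /)

  ! Matrix/vector product: y = 2*A*x + 1*y = (13, 31)
  call ref_gemv(2.0_dp, a, x, 1.0_dp, y)
  if (any(y /= (/ 13.0_dp, 31.0_dp /))) stop 1

  ! Rank-1 update: A = A + u*x^T, so a(2,3) becomes 6 + 2*1 = 8
  u = (/ 1.0_dp, 2.0_dp /)
  call ref_ger(1.0_dp, u, x, a)
  if (a(2,3) /= 8.0_dp) stop 1

  print *, 'BLAS Level 2 sketch checks passed'
contains
  subroutine ref_gemv(alpha, a, x, beta, y)
    real(dp), intent(in) :: alpha, beta, a(:,:), x(:)
    real(dp), intent(inout) :: y(:)
    real(dp) :: t
    integer :: i, j
    y = beta*y
    do j = 1, size(a, 2)
       t = alpha*x(j)              ! loaded once, reused down the whole column
       do i = 1, size(a, 1)        ! unit-stride access to column j
          y(i) = y(i) + t*a(i,j)
       end do
    end do
  end subroutine ref_gemv

  subroutine ref_ger(alpha, u, v, a)
    real(dp), intent(in) :: alpha, u(:), v(:)
    real(dp), intent(inout) :: a(:,:)
    real(dp) :: t
    integer :: i, j
    do j = 1, size(a, 2)
       t = alpha*v(j)
       do i = 1, size(a, 1)
          a(i,j) = a(i,j) + u(i)*t
       end do
    end do
  end subroutine ref_ger
end program blas2_sketch
```

The inner loops run down a column because Fortran stores arrays in column-major order, so consecutive iterations touch adjacent memory.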

The BLAS Level 2 subprograms use the following types of operations:

o Matrix/vector products
o Rank-1 and rank-2 matrix updates
o Solutions of triangular systems of equations

Six types of matrices are supported by these BLAS Level 2 routines:

o General
o General band
o Symmetric/Hermitian
o Symmetric/Hermitian band
o Triangular
o Triangular band

BLAS Level 3 (Matrix/Matrix Operations)

The BLAS Level 3 routines operate at a level that makes the most efficient use of machine resources. CXML optimizes these routines by partitioning matrices into blocks and computing matrix/matrix operations on each block. This approach avoids excessive memory accesses by providing full reuse of data while each block is in the cache or the registers. BLAS Level 3 routines provide this kind of blocking for three basic types of operations:

o Matrix/matrix products
o Rank-k and rank-2k updates of a symmetric matrix
o Solving triangular systems of equations with multiple right-hand sides

Three types of matrices are supported by these BLAS Level 3 routines:

o General
o Symmetric/Hermitian
o Triangular

A set of additional matrix-matrix routines is provided:

o Add two matrices
o Subtract one matrix from another
o Transpose a matrix, in-place or out-of-place

Array Math Functions

The Array Math Functions provide a set of basic math functions that operate on arrays of numbers rather than on scalars. On vector and superscalar architectures, such functions have a performance advantage over a loop of scalar operations.
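The difference between the two calling styles can be sketched with the standard Fortran SQRT intrinsic, which is elemental. This is only an analogy for the array-at-a-time interface described above: it does not call any CXML routine, and all names in it are illustrative.

```fortran
! Illustrative sketch only: uses the standard SQRT intrinsic, not a CXML routine.
program array_math_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 8
  real(dp) :: x(n), y_loop(n), y_array(n)
  integer :: i

  do i = 1, n
     x(i) = 0.25_dp * real(i, dp)
  end do

  ! Scalar formulation: one function evaluation per loop iteration.
  do i = 1, n
     y_loop(i) = sqrt(x(i))
  end do

  ! Array formulation: a single operation over the whole array, leaving
  ! the implementation free to pipeline or vectorize the evaluations.
  y_array = sqrt(x)

  if (maxval(abs(y_array - y_loop)) > 1.0e-12_dp) stop 1
  print *, 'array and scalar results agree'
end program array_math_sketch
```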
The library includes the following array functions for double precision numbers:

o Sine of array
o Cosine of array
o Cosine and sine of array
o Exponent of array
o Logarithm of array
o Square root of array
o Reciprocal of array

LAPACK Library Contents

LAPACK is a library of linear algebra subprograms intended to solve a wide range of problems, including dense systems of linear equations, linear least squares problems, eigenvalue problems, and singular value problems. It is also useful in doing computations such as matrix factorizations and estimations of condition numbers.

The CXML LAPACK library provides the complete LAPACK V3 package, compiled, tested, and ready to use. Combined with the optimized BLAS Level 3 routines, the CXML LAPACK provides optimal performance on all supported platforms. LAPACK should be used in place of LINPACK and EISPACK, because it is more efficient, accurate, and robust.

LAPACK supports both real and complex, single and double precision data.
It operates on the following types of matrices:

o Bidiagonal
o General band
o General unsymmetric
o General tridiagonal
o Hermitian
o Hermitian, packed storage
o Upper Hessenberg, generalized problem
o Upper Hessenberg
o Orthogonal
o Orthogonal, packed storage
o Symmetric/Hermitian positive definite band
o Symmetric/Hermitian positive definite
o Symmetric/Hermitian positive definite, packed storage
o Symmetric/Hermitian positive definite tridiagonal
o Symmetric band
o Symmetric, packed storage
o Symmetric tridiagonal
o Symmetric
o Triangular band
o Triangular, generalized problem
o Triangular, packed storage
o Triangular
o Trapezoidal
o Unitary
o Unitary, packed storage

LAPACK provides the following operations:

o Triangular factorization
o Unblocked triangular factorization
o Solve a system of linear equations (based on triangular factorization)
o Compute the inverse (based on triangular factorization)
o Compute a split Cholesky factorization of a symmetric/Hermitian positive definite band matrix
o Unblocked computation of inverse
o Estimate condition number
o Refine initial solution returned by solver
o Perform QR factorization without pivoting
o Unblocked QR factorization
o Solve linear least squares problem (based on QR factorization)
o Solve the linear equality constrained least squares (LSE) problem
o Solve the Gauss-Markov linear model problem
o Perform LQ factorization without pivoting
o Unblocked LQ factorization
o Solve underdetermined linear system (based on LQ factorization)
o Generate a real orthogonal or complex unitary matrix as a product of Householder matrices
o Unblocked generation of real orthogonal or unitary matrix
o Multiply a matrix by a real orthogonal or complex unitary matrix by applying a product of Householder matrices
o Unblocked version of multiplication of a matrix by a real orthogonal or
  complex unitary matrix by applying a product of Householder matrices
o Reduce a square matrix to upper Hessenberg form
o Unblocked version of square matrix reduction
o Reduce a symmetric matrix to real symmetric tridiagonal form
o Reduce a band matrix to bidiagonal form
o Unblocked version of symmetric matrix reduction
o Reduce a rectangular matrix to bidiagonal form
o Reduce a band symmetric/Hermitian matrix to tridiagonal form
o Reduce a symmetric/Hermitian-definite banded generalized eigenproblem to standard form
o Compute various norms of a complex Hermitian tridiagonal matrix
o Compute eigenvalues and optional Schur factorization or eigenvectors using QR algorithm
o Compute selected eigenvectors by inverse iteration
o Compute eigenvectors from Schur factorization
o Compute eigenvectors using the Pal-Walker-Kahan variant of the QL or QR algorithm
o For a pair of N-by-N real nonsymmetric matrices, compute the generalized eigenvalues, the real Schur form, the left and/or right Schur vectors, and the left and/or right generalized eigenvectors
o Solve the generalized nonsymmetric eigenproblem Ax = lambda Bx
o Solve the generalized definite banded eigenproblem Ax = lambda Bx
o Solve the generalized symmetric/Hermitian-definite banded eigenproblem
o Solve the symmetric eigenproblem using divide-and-conquer algorithm
o Compute singular values and, optionally, singular vectors using the QR algorithm
o Compute the generalized (quotient) singular value decomposition
o Compute the generalized singular value decomposition (GSVD) on the M-by-N matrix A and P-by-N matrix B
o Solve a generalized linear regression model problem

Sparse System Solver Subprograms

The CXML Sparse System Solver library contains a set of subprograms that can be used to solve sparse linear systems of equations. Two packages providing direct and iterative methods are supported.

Direct Method Sparse Solver Package:

The direct solver package includes routines to solve symmetric and symmetrically structured matrices with real or complex elements. For symmetric matrices, these routines can solve both positive definite and indefinite systems. The direct solver package routines use a row major, upper triangular storage format. Routines are provided to do the following:

o Initialize the solver
o Define the structure of the matrix
o Reorder the matrix
o Factor the matrix
o Compute the solution vector of the system
o Return statistics about the phases of the solving process

The package permits the factorization of a matrix to be used to compute solutions for additional right-hand sides, and for the reordering of a matrix to be used to solve additional systems with the same structure but different values for the matrix elements.

Iterative Method Sparse Solver Package:

For the iterative method, the library provides a modular set of storage schemes, preconditioners, and solvers. These solvers and preconditioners are easily accessed through an integrated driver routine. Six iterative sparse solvers for real, double precision data are supplied:

o Preconditioned conjugate gradient method
o Preconditioned least squares conjugate gradient method
o Preconditioned biconjugate method
o Preconditioned conjugate gradient squared method
o Preconditioned generalized minimum residual method
o Preconditioned transpose free QMR method

Routines for three storage schemes are provided, or the user can develop routines to employ a custom storage scheme. The supplied storage schemes include:

o Symmetric diagonal
o Unsymmetric diagonal
o General storage by rows

Three preconditioners are supplied, which can be selectively applied to the data. Users can also supply custom preconditioners.
The preconditioners supplied include:

o Diagonal
o Polynomial (Neumann)
o Incomplete LU with zero diagonals added

Sorting Subprograms

Two sort subprograms using the Quicksort algorithm and two general
purpose radix sort subprograms are provided, as follows:

o Sort elements of a vector using the Quicksort algorithm
o Sort an indexed vector of data using the Quicksort algorithm
o Sort data using a radix sort algorithm
o Sort an indexed vector of data using a radix sort algorithm

All of the above sorts operate on data stored in memory.

Random Number Subprograms

CXML provides four random number generator subprograms:

o Produce a vector of uniform [0,1], long-period random numbers using
  the L'Ecuyer multiplicative method. Two auxiliary input routines are
  provided to allow this subprogram to be called from within a
  parallel section of a program.
o Produce a vector of N(0,1), normally distributed random numbers. Two
  auxiliary input routines are provided to allow this subprogram to be
  called from within a parallel section of a program.
o Produce single precision random numbers using a linear
  multiplicative algorithm
o Produce single precision random numbers using a Lehmer
  multiplicative generator

Signal Processing Subprograms

The CXML Signal Processing library contains a set of subprograms in
four basic areas of signal processing:

o Fast Fourier Transforms (FFT)
o Fast Cosine and Fast Sine Transforms (FCT and FST)
o Convolution and correlation
o Digital filters

Fast Fourier Transforms and Cosine and Sine Transforms

CXML provides one-dimensional, two-dimensional, three-dimensional, and
group FFT routines and one-dimensional FCT/FST routines. Each routine
is supplied in two forms:

o The first form computes the transform in one unit operation. This is
  convenient for programs requiring speed on only one or a few
  operations.
o The second form is provided for programs requiring speed on repeated
  operations. With this form, each routine is subdivided into three
  routines. One routine builds the rotation factors, a second routine
  applies them to perform the transform, and a third routine
  deallocates any virtual memory allocated in the first routine. Thus,
  for repeated operations, the rotation factors need to be built only
  once.

Convolution and Correlation

CXML provides routines for computing one-dimensional discrete
convolutions and correlations. These routines can process both
periodic and nonperiodic data.

Digital Filters

CXML provides support for one-dimensional, nonrecursive digital
filtering. Based on Kaiser's Sinh-Bessel algorithm, these routines
allow programming of bandpass, bandstop, low-pass, and high-pass
filters.

Cray SciLib Portability Support

SCIPORT is an HP implementation of V7 of the Cray Research scientific
numerical library, SciLib. SCIPORT provides 64-bit single-precision
and 64-bit integer interfaces to underlying CXML routines for Cray
users porting programs to Alpha systems running HP Tru64 UNIX. SCIPORT
also provides equivalent versions of almost all Cray Math Library and
CF77 (Cray Fortran 77) Math intrinsic routines.

In order to be completely source code compatible with SciLib, the
SCIPORT library calling sequence supports 64-bit integers passed by
reference. However, internally, SCIPORT uses 32-bit integers.
Consequently, some run-time uses of SciLib may not be supported by
SCIPORT.
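
The practical effect of accepting 64-bit integers while working in 32 bits internally can be pictured with a short Python sketch. It is an illustration only, not SCIPORT code: the function names are hypothetical, and the wraparound shown is an assumed model of plain truncation, since this description does not say how SCIPORT treats out-of-range values.

```python
import ctypes

# Range of a signed 32-bit integer.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def fits_in_int32(value: int) -> bool:
    """Return True if a 64-bit integer argument is representable in 32 bits."""
    return INT32_MIN <= value <= INT32_MAX


def narrow_to_int32(value: int) -> int:
    """Model plain truncation of a 64-bit integer to its low 32 bits.

    Assumption for illustration: real library behavior for out-of-range
    values is not documented here and may differ.
    """
    return ctypes.c_int32(value).value


def checked_length(n: int) -> int:
    """Hypothetical caller-side guard for an integer argument such as a
    vector length, applied before calling a ported routine."""
    if not fits_in_int32(n):
        raise OverflowError(f"{n} does not fit in a 32-bit integer")
    return n
```

A ported program whose integer arguments always stay within the signed 32-bit range would pass such a guard unchanged; only arguments beyond that range are at risk.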

SCIPORT provides the following:

o 64-bit versions of all Cray SciLib single-precision BLAS Level 1,
  Level 2, and Level 3 routines
o All Cray SciLib LAPACK routines
o All Cray SciLib Special Linear System Solver routines
o All Cray SciLib Signal Processing routines
o All Cray SciLib Sorting and Searching routines

These routines are completely interchangeable with their Cray SciLib
counterparts up to the run-time limit on integer size and, with the
exception of the ORDERS routine, require no program changes to
function correctly. Owing to endian differences of machine
architecture, special consideration must be given when the ORDERS
routine is used to sort multibyte character strings.

Run-Time Library Redistribution

The HP Fortran kit may include updated Run-Time Library shareable
images. HP grants the user a nonexclusive royalty-free worldwide right
to reproduce and distribute the executable version of the Run-Time
Library (the "RTLs"), provided that the user does all of the
following:

o Distributes the RTLs only in conjunction with and as a part of the
  user's software application product that is designed to operate in
  the HP Tru64 UNIX environment.
o Does not use the name, logo, or trademarks of HP to market the
  user's software application product.
o Includes the copyright notice of HP Fortran on the user's product
  disk label and/or on the title page of the documentation for the
  software application product.
o Agrees to indemnify, hold harmless, and defend HP from and against
  any claims or lawsuits, including attorney's fees, that arise or
  result from the use or distribution of the software application
  product.

Except as expressly provided herein, HP grants no implied or express
license under any of its patents, copyrights, trade secrets,
trademarks, or any license or other proprietary interests and rights.

For HP Tru64 UNIX Alpha systems, the RTL images are designated as:

o libfor.a, libfor.so
o libUfor.a, libUfor.so
o libFutil.a, libFutil.so
o libshpf.a, libshpf.so (HP Fortran 95/90 only)
o libhpfcmpi.a, libhpfcmpi.so (HP Fortran 95/90 only)
o libhpfsmpi.a, libhpfsmpi.so (HP Fortran 95/90 only)
o for_msg.cat
o libcxml_ev4.a, libcxml_ev4.so
o libcxml_ev5.a, libcxml_ev5.so
o libcxml_ev6.a, libcxml_ev6.so
o libcxmlp_ev4.a, libcxmlp_ev4.so (HP Fortran 95/90 only)
o libcxmlp_ev5.a, libcxmlp_ev5.so (HP Fortran 95/90 only)
o libcxmlp_ev6.a, libcxmlp_ev6.so (HP Fortran 95/90 only)

HARDWARE REQUIREMENTS

Processors Supported:

Any Alpha system that is capable of running HP Tru64 UNIX Version 4.0
or newer.

Disk Space Requirements:

Disk space required for HP Fortran installation:

   Root file system:     /       0 MB
   Other file systems:   /usr   71 MB
                         /tmp    1 MB
                         /var    0 MB

Disk space required for HP Fortran use (permanent):

   Root file system:     /       0 MB
   Other file systems:   /usr   67 MB
                         /var    0 MB

To install the Compaq Extended Math Library (CXML), an additional 75
MB (or more) is needed in the /usr file system.

These sizes are approximate; actual sizes may vary depending on the
user's system environment, configuration, and software options.

Memory Requirements:

For systems used to compile a program for parallel execution with the
-hpf flag, a minimum of 64 MB of physical memory is recommended.

SOFTWARE REQUIREMENTS

o HP Tru64 UNIX Operating System Version 4.0, 5.0A, or 5.1
  (SPD 41.61.xx)
o HP Tru64 UNIX Developers' Toolkit Version 4.0, 5.0A, or 5.1A
  (SPD 44.36.xx) for HP Tru64 UNIX Version 4.0, 5.0A, or 5.1A,
  respectively.

SOFTWARE LICENSING INFORMATION

This software is furnished only under license. For more information
about licensing terms and policies of HP, contact your local HP
office.

LICENSE MANAGEMENT FACILITY SUPPORT

HP Fortran supports the License Management Facility of HP.

License units for HP Fortran are allocated on an Unlimited System Use
plus Concurrent Use basis.

Each Concurrent Use license allows any one individual at a time to use
the layered product.

For more information on the License Management Facility, refer to the
HP Tru64 UNIX Operating System Software Product Description
(SPD 41.61.xx) or to the HP Tru64 UNIX Operating System documentation
set.

OPTIONAL SOFTWARE

o Compaq MPI (see /techservers/software/cmpi.html at the
  http://www.hp.com site) is required to link and execute High
  Performance Fortran programs compiled for parallel execution using
  the -hpf option, for shared-memory or Memory Channel Cluster target
  machines. For AlphaServer SC target machines, no additional software
  is required.
o Compaq Kap Fortran/OpenMP V4.4 for HP Tru64 UNIX (for HP Fortran
  95/90 compiler; see SPD 45.72.13).
o The use of directed parallel processing (such as the OpenMP
  directives) requires HP Tru64 UNIX Version 4.0D or higher.

GROWTH CONSIDERATIONS

The minimum hardware/software requirements for any future version of
this product may be different from the requirements for the current
version.

DISTRIBUTION MEDIA

Media and documentation for HP Fortran are available on the HP
Software Product Library CD-ROM set for HP Tru64 UNIX Products
(QA-054AA-H8) or a CD-ROM containing only the HP Fortran for HP Tru64
UNIX Alpha Systems product (QA-MV2AA-H8).

Documentation in printed format can be ordered separately (see the HP
Fortran "read before installing" cover letter or the online release
notes).

SOFTWARE WARRANTY

This software is provided by HP with a 90-day conformance warranty in
accordance with the HP warranty terms applicable to the license
purchase.

The above information is valid at time of release.
Please contact your local HP office for the most up-to-date
information.

ORDERING INFORMATION

Software Licenses:

   Unlimited System Use:   QL-MV2A*-AA
   Personal Use:           QL-MV2AM-2B
   Concurrent Use:         QL-MV2AM-3B
   Concurrent 5 Pack:      QL-MV2AM-3C
   Concurrent 10 Pack:     QL-MV2AM-3D

Software Documentation:

   HP Fortran 95/90 Documentation:   QA-MV2AA-GZ
   HP Fortran 77 Documentation:      QA-MV2AB-GZ

Software Product Services:   QT-MV2*-**

* Denotes variant fields. For additional information on available
  licenses, services, and media, refer to the appropriate price book.

A variety of service options are available from HP. For more
information, contact your local HP office.

COPYRIGHT INFORMATION

Copyright © 2003 Hewlett-Packard Development Company, L.P.

The information contained herein is subject to change without notice.
The only warranties for HP products and services are set forth in the
express warranty statements accompanying such products and services.
Nothing herein should be construed as constituting an additional
warranty. HP shall not be liable for technical or editorial errors or
omissions contained herein.

Proprietary computer software. Valid license from HP required for
possession, use or copying. Consistent with FAR 12.211 and 12.212,
Commercial Computer Software, Computer Software Documentation, and
Technical Data for Commercial Items are licensed to the U.S.
Government under vendor's standard commercial license.