HP OpenVMS Systems Documentation

OpenVMS VAX RTL Mathematics (MTH$) Manual

2.3 Vector Versions of Existing Scalar Routines

Vector forms of many MTH$ routines are provided to support vectorized compiled applications. Vector versions of key F-floating, D-floating, and G-floating scalar routines use the vector hardware while returning results that are bit-for-bit identical to those of their scalar counterparts. Many of the scalar algorithms have been redesigned to ensure identical results and good performance for both the vector and scalar versions of each routine.

You can call the vector MTH$ routines directly if your program is written in VAX MACRO. If you are a Fortran programmer, specify the Fortran intrinsic function name only. The Fortran compiler will then determine whether the vector or scalar version of a routine should be used.

2.3.1 Exceptions

You should not attempt to recover from an MTH$ vector exception. After an MTH$ vector exception, the vector routines cannot continue execution, and nonexceptional values might not have been computed.

2.3.2 Underflow Detection

In general, if a vector instruction results in the detection of both a floating overflow and a floating underflow, only the overflow will be signaled.

Some scalar routines check to see if a user has enabled underflow detection. For each of those scalar routines, there are two corresponding vector routines: one that always enables underflow checking and one that never enables underflow checking. (In the latter case, underflows produce a result of zero.) The Fortran compiler always chooses the vector version that does not signal underflows, unless the user specifies the /CHECK=UNDERFLOW qualifier. This ensures that the check is performed but does not impair vector performance for those not interested in underflow detection.
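The difference between the two vector variants can be pictured with a small Python model. This is only a sketch of the behavior described above, not the library's implementation: `vexp_model` is a hypothetical name, the F-floating underflow threshold is an assumption of the model, and Python's arithmetic is not VAX arithmetic.

```python
import math

# Assumed smallest positive F-floating magnitude (0.5 * 2**-127).
# This constant is an assumption of the model, not a value from this manual.
F_FLOATING_MIN = 2.0 ** -128


def vexp_model(values, signal_underflow):
    """Model the two variants: signal an underflow, or replace it with zero."""
    results = []
    for x in values:
        y = math.exp(x)
        if y < F_FLOATING_MIN:
            if signal_underflow:
                # Models the variant that enables underflow signaling.
                raise FloatingPointError(f"floating underflow for argument {x}")
            # Models the variant that never signals: the result is zero.
            y = 0.0
        results.append(y)
    return results


print(vexp_model([0.0, -100.0], signal_underflow=False))  # [1.0, 0.0]
```

Calling the same model with `signal_underflow=True` raises an exception for the second element instead of returning zero.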

2.3.3 Vector Routine Name Format

Use one of the formats in Table 2-3 to call (from VAX MACRO) a vector math routine that enables underflow signaling. (The E in the routine name means enabled underflow signaling.)

**Table 2-3 Vector Routine Format --- Underflow Signaling Enabled**
Format	Type of Routine
MTH$VxSAMPLE_E_Ry_Vz	Real valued math routine
MTH$VCxSAMPLE_E_Ry_Vz	Complex valued math routine
OTS$SAMPLEq_E_Ry_Vz	Power routine or complex multiply and divide

Use one of the formats in Table 2-4 to call (from VAX MACRO) a vector math routine that does not enable underflow signaling.

**Table 2-4 Vector Routine Format --- Underflow Signaling Disabled**
Format	Type of Routine
MTH$VxSAMPLE_Ry_Vz	Real valued math routine
MTH$VCxSAMPLE_Ry_Vz	Complex valued math routine
OTS$SAMPLEq_Ry_Vz	Power routine or complex multiply and divide

In the preceding formats, the following conventions are used:

x	The letter A (or blank) for F-floating, D for D-floating, G for G-floating.

y	A number between 0 and 11 (inclusive). Ry means that the scalar registers R0 through Ry will be used by the routine SAMPLE. You must save these registers.

z	A number between 0 and 15 (inclusive). Vz means that the vector registers V0 through Vz will be used by the routine SAMPLE. You must save these registers.

q	Two letters denoting the base and power data type, as follows:

	RR	F-floating base raised to an F-floating power
	RJ	F-floating base raised to a longword power
	DD	D-floating base raised to a D-floating power
	DJ	D-floating base raised to a longword power
	GG	G-floating base raised to a G-floating power
	GJ	G-floating base raised to a longword power
	JJ	Longword base raised to a longword power
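
The register suffixes in these formats can be decoded mechanically. The following Python sketch is illustrative only and is not part of the run-time library. The function name `parse_vector_name` and the layout of the returned dictionary are assumptions made for this example. It reads only the facility prefix and the `_E`, `Ry`, and `Vz` suffixes, and does not attempt to decode the data-type letter or the two-letter power code.

```python
import re

# Suffix conventions of Tables 2-3 and 2-4:
#   <facility>$<stem>[_E]_R<y>_V<z>
_NAME = re.compile(r"^(MTH|OTS)\$(\w+?)(_E)?_R(\d+)_V(\d+)$")


def parse_vector_name(name):
    """Return the register-save requirements encoded in a vector routine name."""
    match = _NAME.match(name)
    if match is None:
        raise ValueError(f"not a vector routine name: {name!r}")
    facility, stem, enabled, y_text, z_text = match.groups()
    y, z = int(y_text), int(z_text)
    if not 0 <= y <= 11:
        raise ValueError("y must be between 0 and 11")
    if not 0 <= z <= 15:
        raise ValueError("z must be between 0 and 15")
    return {
        "facility": facility,
        "stem": stem,
        "underflow_signaled": enabled is not None,
        "scalar_registers": [f"R{i}" for i in range(y + 1)],
        "vector_registers": [f"V{i}" for i in range(z + 1)],
    }


info = parse_vector_name("MTH$VEXP_R3_V6")
print(info["scalar_registers"])       # ['R0', 'R1', 'R2', 'R3']
print(len(info["vector_registers"]))  # 7
```

A helper of this kind could be used to generate a register-save checklist before writing the MACRO call sequence. The routine names themselves should still be taken from Appendix B.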

2.3.4 Calling a Vector Math Routine

You can call the vector MTH$ routines directly if your program is written in VAX MACRO.

Note

If you are a Compaq Fortran programmer, do not specify the MTH$ vector routines explicitly. Specify the Fortran intrinsic function name only. The Fortran compiler determines whether the vector or scalar version of a routine should be used.

In the following examples, keep in mind that vector real arguments are passed in V0, V1, and so on, and vector real results are returned in V0. Vector complex arguments, on the other hand, are passed in register pairs: V0 and V1, V2 and V3, and so on. Vector complex results are returned in V0 and V1.

Argument Type	Registers Used to Pass Arguments	Registers Used to Return Results
Vector real arguments	V0, V1,...	V0
Vector complex arguments	V0 and V1, V2 and V3,...	V0 and V1

Example 1

The following example shows how to call the vector version of MTH$EXP. Assume that you do not want underflows to be signaled, and you need to use the current contents of all vector and scalar registers after the invocation. Before you can call the vector routine from VAX MACRO, perform the following steps.

1. Find EXP in the column of scalar names in Appendix B to determine:
   - The full vector routine name: MTH$VEXP_R3_V6
   - How the routine is invoked (CALL or JSB): JSB
   - The scalar registers that must be saved: R0 through R3 (as specified by R3 in MTH$VEXP_R3_V6)
   - The vector registers that must be saved: V0 through V6 (as specified by V6 in MTH$VEXP_R3_V6)
   - The vector registers used to hold the input arguments: V0
   - The vector registers used to hold the output arguments: V0
   - Whether there is a vector version that signals underflow (not needed in this example)
2. Save the scalar registers R0, R1, R2, and R3.
3. Save the vector registers V0, V1, V2, V3, V4, V5, and V6.
4. Save the vector mask register VMR.
5. Save the vector count register VCR.
6. Load the vector length register VLR.
7. Load the vector register V0 with the argument for MTH$EXP.
8. JSB to MTH$VEXP_R3_V6.
9. Store the result in memory.
10. Restore all scalar and vector registers except for V0. (The results of the call to MTH$VEXP_R3_V6 are stored in V0.)

The following MACRO program fragment shows this example. Assume that:

V0 through V6 and R0 through R3 have been saved.
R4 points to a vector of 60 input values.
R6 points to the location where the results of MTH$VEXP_R3_V6 will be stored.
R5 contains the stride in bytes.

Note that MTH$VEXP_R3_V6 denotes an F-floating data type because there is no letter between V and E in the routine name. (For further explanation, refer to Section 2.3.3.) The stride (the distance in bytes between successive elements) must be a multiple of 4 because each F-floating value requires 4 bytes.

MTVLR   #60                 ; Load VLR
MOVL    #4, R5              ; Stride
VLDL    (R4), R5, V0        ; Load V0 with the actual arguments
JSB     G^MTH$VEXP_R3_V6    ; JSB to MTH$VEXP
VSTL    V0, (R6), R5        ; Store the results
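
To make the stride concrete, the Python sketch below models the byte offsets that a strided load or store touches. It does not emulate the vector hardware. `element_offsets` is a hypothetical helper, and the only facts taken from the text are the vector length (60), the 4-byte F-floating element size, and the requirement that the stride be a multiple of the element size.

```python
def element_offsets(vector_length, stride_bytes, element_size):
    """Byte offsets, relative to the base address, of each element accessed."""
    if stride_bytes % element_size != 0:
        raise ValueError("stride must be a multiple of the element size")
    return [index * stride_bytes for index in range(vector_length)]


# Example 1: 60 contiguous F-floating values (4 bytes each, stride 4).
offsets = element_offsets(60, 4, 4)
print(offsets[:3])  # [0, 4, 8]
print(offsets[-1])  # 236
```

With a stride of 8 and the same element size, the helper yields offsets 0, 8, 16, and so on, which corresponds to every second F-floating value in the array.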

Example 2

The following example demonstrates how to call the vector version of OTS$POWDD with a vector base raised to a scalar power. Before you can call the vector routine from VAX MACRO, perform the following steps.

1. Find POWDD (V^S) in the column of scalar names in Appendix B to determine:
   - The full vector routine name: OTS$VPOWDD_R1_V8
   - How the routine is invoked (CALL or JSB): CALL
   - The scalar registers that must be saved: R0 through R1 (as specified by R1 in OTS$VPOWDD_R1_V8)
   - The vector registers that must be saved: V0 through V8 (as specified by V8 in OTS$VPOWDD_R1_V8)
   - The registers used to hold the input arguments: V0, R0
   - The vector registers used to hold the output arguments: V0
   - Whether there is a vector version that signals underflow (not needed in this example)
2. Save the scalar registers R0 and R1.
3. Save the vector registers V0, V1, V2, V3, V4, V5, V6, V7, and V8.
4. Save the vector mask register VMR.
5. Save the vector count register VCR.
6. Load the vector length register VLR.
7. Load the vector register V0 and the scalar register R0 with the arguments for OTS$POWDD.
8. Call OTS$VPOWDD_R1_V8.
9. Store the result in memory.
10. Restore all scalar and vector registers except for V0. (The results of the call to OTS$VPOWDD_R1_V8 are stored in V0.)

The following MACRO program fragment shows how to call OTS$VPOWDD_R1_V8 to compute the result of raising 60 values to the power P. Assume that:

V0 through V8 and R0 and R1 have been saved.
R4 points to the vector of 60 input base values.
R0 and R1 contain the D-floating value P.
R6 points to the location where the results will be stored.
R5 contains the stride.

Note that OTS$VPOWDD_R1_V8 raises a D-floating base to a D-floating power, which you determine from the DD in the routine name. (For further explanation, refer to Section 2.3.3.) The stride (the distance in bytes between successive elements) must be a multiple of 8 because each D-floating value requires 8 bytes.

                              ; R0/R1 already contain the power
MTVLR   #60                   ; Load VLR
MOVL    #8, R5                ; Stride
VLDQ    (R4), R5, V0          ; Load V0 with the actual arguments
CALLS   #0,G^OTS$VPOWDD_R1_V8 ; CALL OTS$VPOWDD
VSTQ    V0, (R6), R5          ; Store the results
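
The vector-base, scalar-power form used in this example can be modeled in a few lines of Python. This is a sketch of the semantics only: `vpow_vector_scalar` is a hypothetical name, the base values are invented for illustration, and Python floating-point arithmetic will not reproduce D-floating results bit for bit.

```python
def vpow_vector_scalar(bases, power):
    """Raise every element of the base vector to the same scalar power."""
    return [base ** power for base in bases]


bases = [float(i) for i in range(1, 61)]  # 60 illustrative base values
results = vpow_vector_scalar(bases, 0.5)  # the single power P applies to all
print(len(results))  # 60
print(results[3])    # 2.0
```

The point of the model is that one scalar power (held in R0 and R1 in the MACRO fragment) is applied to each of the 60 elements loaded into V0.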

2.4 Fast-Vector Math Routines

This section describes the fast-vector math routines that offer significantly higher performance at the cost of slightly reduced accuracy when compared with corresponding standard vector math routines. Also note that some fast-vector math routines have restricted argument domains.

When you specify the compile command qualifiers /VECTOR and /MATH_LIBRARY=FAST, the Compaq Fortran compiler selects the appropriate fast-vector math routine, if one exists. The default is /MATH_LIBRARY=ACCURATE. You must specify the /G_FLOATING compile qualifier in conjunction with the /MATH_LIBRARY=FAST and /VECTOR qualifiers to access the G_floating routines.

You can call these routines from VAX MACRO using the standard calling method. The math function names, together with corresponding entry points of the fast-vector math routines, are listed in Table 2-5.

**Table 2-5 Fast-Vector Math Routines**
Function Name	Data Type	Call or JSB	Vector Input Registers	Vector Output Registers	Vector Name (Underflows Not Signaled)
ATAN	F_floating	JSB	V0	V0	MTH$VYATAN_R0_V3
DATAN	D_floating	JSB	V0	V0	MTH$VYDATAN_R0_V5
GATAN	G_floating	JSB	V0	V0	MTH$VYGATAN_R0_V5
ATAN2	F_floating	JSB	V0, V1	V0	MTH$VVYATAN2_R0_V5
DATAN2	D_floating	JSB	V0, V1	V0	MTH$VVYDATAN2_R0_V5
GATAN2	G_floating	JSB	V0, V1	V0	MTH$VVYGATAN2_R0_V5
COS	F_floating	JSB	V0	V0	MTH$VYCOS_R0_V3
DCOS	D_floating	JSB	V0	V0	MTH$VYDCOS_R0_V3
GCOS	G_floating	JSB	V0	V0	MTH$VYGCOS_R0_V3
EXP	F_floating	JSB	V0	V0	MTH$VYEXP_R0_V4
DEXP	D_floating	JSB	V0	V0	MTH$VYDEXP_R0_V6
GEXP	G_floating	JSB	V0	V0	MTH$VYGEXP_R0_V6
LOG	F_floating	JSB	V0	V0	MTH$VYALOG_R0_V5
DLOG	D_floating	JSB	V0	V0	MTH$VYDLOG_R0_V5
GLOG	G_floating	JSB	V0	V0	MTH$VYGLOG_R0_V5
LOG10	F_floating	JSB	V0	V0	MTH$VYALOG10_R0_V5
DLOG10	D_floating	JSB	V0	V0	MTH$VYDLOG10_R0_V5
GLOG10	G_floating	JSB	V0	V0	MTH$VYGLOG10_R0_V5
SIN	F_floating	JSB	V0	V0	MTH$VYSIN_R0_V3
DSIN	D_floating	JSB	V0	V0	MTH$VYDSIN_R0_V3
GSIN	G_floating	JSB	V0	V0	MTH$VYGSIN_R0_V3
SQRT	F_floating	JSB	V0	V0	MTH$VYSQRT_R0_V4
DSQRT	D_floating	JSB	V0	V0	MTH$VYDSQRT_R0_V4
GSQRT	G_floating	JSB	V0	V0	MTH$VYGSQRT_R0_V4
TAN	F_floating	JSB	V0	V0	MTH$VYTAN_R0_V3
DTAN	D_floating	JSB	V0	V0	MTH$VYDTAN_R0_V3
GTAN	G_floating	JSB	V0	V0	MTH$VYGTAN_R0_V3
POWRR(X**Y)	F_floating	CALL	V0, R0	V0	OTS$VYPOWRR_R1_V4
POWDD(X**Y)	D_floating	CALL	V0, R0	V0	OTS$VYPOWDD_R1_V8
POWGG(X**Y)	G_floating	CALL	V0, R0	V0	OTS$VYPOWGG_R1_V9

2.4.1 Exception Handling

The fast-vector math routines signal all errors except floating underflow. No intermediate calculations result in exceptions. To optimize performance, the following message signals all errors:

%SYSTEM-F-VARITH, vector arithmetic fault

2.4.2 Special Restrictions On Input Arguments

The special restrictions listed in Table 2-6 apply only to fast-vector routines SIN, COS, and TAN. The standard vector routines handle the full range of VAX floating-point numbers.

**Table 2-6 Input Argument Restrictions**
Function Name	Input Argument Domain (in Radians)
SIN	~( -6746518783.0, 6746518783.0)
COS	~( -6746518783.0, 6746518783.0)
TAN	~( -3373259391.5, 3373259391.5)

If the application program uses arguments outside the listed domain, the routine signals the following error:

%SYSTEM-F-VARITH, vector arithmetic fault

If the application requires argument values beyond the listed limits, use the corresponding standard vector math routine.
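
An application that cannot guarantee its argument range might screen the data before choosing a routine. The Python sketch below shows one way to express that check. The limits are copied from Table 2-6 and are approximate, and `fits_fast_domain` is a hypothetical helper rather than anything supplied by the library.

```python
# Approximate open-interval limits in radians, copied from Table 2-6.
FAST_DOMAIN_LIMIT = {
    "SIN": 6746518783.0,
    "COS": 6746518783.0,
    "TAN": 3373259391.5,
}


def fits_fast_domain(function_name, arguments):
    """True if every argument lies inside the fast-vector domain."""
    limit = FAST_DOMAIN_LIMIT.get(function_name)
    if limit is None:
        # Table 2-6 lists no restriction for other functions.
        return True
    return all(-limit < x < limit for x in arguments)


print(fits_fast_domain("SIN", [1.0, 4.0e9]))  # True
print(fits_fast_domain("TAN", [1.0, 4.0e9]))  # False
```

When the check fails, the text above recommends the corresponding standard vector math routine.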

2.4.3 Accuracy

The fast-vector math routines do not guarantee the same results as those obtained with the corresponding standard vector math routines. Calls to the fast-vector routines generally yield results that differ from those of the scalar and original vector MTH$ library routines. The typical maximum error is a 2-LSB (Least Significant Bit) error for the F_floating routines and a 4-LSB error for the D_floating and G_floating routines. This generally corresponds to a difference in the 6th significant decimal digit for the F_floating routines, the 15th digit for D_floating, and the 14th digit for G_floating.

2.4.4 Performance

The fast-vector math routines generally provide performance improvements over the standard vector routines ranging from 15 to 300 percent, depending on the routines called and the input arguments to the routines. The overall performance of a typical user application will also improve, but not to the same degree as the routines themselves. You should do performance and correctness testing of your application using both the fast-vector and the standard vector math routines before deciding which to use for your application.
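
A rough way to estimate the application-level effect is Amdahl's law, which is brought in here as an outside rule of thumb and is not taken from this manual. The Python sketch below assumes that an improvement of N percent means the routine runs 1 + N/100 times faster, and that `math_fraction` is the share of run time spent in the math routines. Both are assumptions of the example, and `overall_speedup` is a hypothetical name.

```python
def overall_speedup(math_fraction, routine_improvement_percent):
    """Estimate whole-application speedup from a routine-level improvement."""
    if not 0.0 <= math_fraction <= 1.0:
        raise ValueError("math_fraction must be between 0 and 1")
    routine_speedup = 1.0 + routine_improvement_percent / 100.0
    return 1.0 / ((1.0 - math_fraction) + math_fraction / routine_speedup)


# 30 percent of run time in math routines that become 100 percent faster.
print(round(overall_speedup(0.3, 100), 3))  # 1.176
```

Under these assumptions, a routine that runs twice as fast speeds up the whole application by only about 18 percent, which is why measuring the application itself is the more reliable guide.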
