Compaq Fortran
User Manual for
Tru64 UNIX and
Linux Alpha Systems

Appendix D
Parallel Library Routines

Note

This appendix applies only to Compaq Fortran on Tru64 UNIX systems.

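This appendix contains the following sections:

Section D.1, OpenMP Fortran API Run-Time Library Routines
Section D.2, Other Parallel Threads Routines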

This appendix summarizes the library routines available for use with directed parallel decomposition requested by the -mp and -omp compiler options.

Where applicable, new applications should call run-time parallel library routines using the OpenMP Fortran API format. (See Section D.1, OpenMP Fortran API Run-Time Library Routines.) For compatibility with existing programs, the Compaq Fortran compiler recognizes equivalent routines of the formats described in Section D.2. Thus, for example, if your program calls _OtsGetNumThreads, the Compaq Fortran compiler interprets that as a call to omp_get_num_threads.

D.1 OpenMP Fortran API Run-Time Library Routines

This section describes:

Library routines that control and query the parallel execution environment
General-purpose lock routines supported by Compaq Fortran.

Table D-1 lists the supported OpenMP Fortran API run-time library routines. These routines are all external procedures.

Table D-1 OpenMP Fortran API Run-Time Library Routines
Routine Name Usage

Library Routines That Control and Query the Parallel Execution Environment

omp_get_dynamic Inform if dynamic thread adjustment is enabled. See Section D.1.1.1, omp_get_dynamic.

omp_get_max_threads Get the maximum value that can be returned by calls to the omp_get_num_threads() function. See Section D.1.1.2, omp_get_max_threads.

omp_get_nested Inform if nested parallelism is enabled. See Section D.1.1.3, omp_get_nested.

omp_get_num_procs Get the number of processors that are available to the program. See Section D.1.1.4, omp_get_num_procs.

omp_get_num_threads Get the number of threads currently in the team executing the parallel region from which the routine is called. See Section D.1.1.5, omp_get_num_threads.

omp_get_thread_num Get the thread number, within the team, in the range from zero to omp_get_num_threads() - 1. See Section D.1.1.6, omp_get_thread_num.

omp_in_parallel Inform whether or not a region is executing in parallel. See Section D.1.1.7, omp_in_parallel.

omp_set_dynamic Enable or disable dynamic adjustment of the number of threads available for execution of parallel regions. See Section D.1.1.8, omp_set_dynamic.

omp_set_nested Enable or disable nested parallelism. See Section D.1.1.9, omp_set_nested.

omp_set_num_threads Set the number of threads to use for the next parallel region. See Section D.1.1.10, omp_set_num_threads.

General-Purpose Lock Routines

omp_destroy_lock Disassociate a lock variable from any locks. See Section D.1.2.1.

omp_init_lock Initialize a lock to be used in subsequent calls. See Section D.1.2.2.

omp_set_lock Make the executing thread wait until the specified lock is available. See Section D.1.2.3.

omp_test_lock Try to set the lock associated with a lock variable. See Section D.1.2.4.

omp_unset_lock Release the executing thread from ownership of a lock. See Section D.1.2.5.

D.1.1 Library Routines That Control and Query the Parallel Execution Environment

These routines are described in detail in the following sections.

D.1.1.1 omp_get_dynamic

Determines the status of dynamic thread adjustment.

Syntax:

    INTERFACE
       LOGICAL FUNCTION omp_get_dynamic ()
       END FUNCTION omp_get_dynamic
    END INTERFACE
    LOGICAL result
    result = omp_get_dynamic ()

Return Values:

This function returns TRUE if dynamic thread adjustment is enabled; otherwise it returns FALSE. The function always returns FALSE if dynamic adjustment of the number of threads is not implemented.

See Also:

Section D.1.1.8, omp_set_dynamic

D.1.1.2 omp_get_max_threads

Returns the maximum value that can be returned by calls to the omp_get_num_threads() function.

Syntax:

    INTERFACE
       INTEGER FUNCTION omp_get_max_threads ()
       END FUNCTION omp_get_max_threads
    END INTERFACE
    INTEGER result
    result = omp_get_max_threads ()

Description:

If your program uses omp_set_num_threads() to change the number of threads, subsequent calls to omp_get_max_threads() will return the new value. When dynamic thread adjustment has been enabled by calling omp_set_dynamic() with TRUE, you can use omp_get_max_threads() to allocate data structures that are maximally sized for each thread.

This function has global scope.
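
As an illustration of sizing a per-thread data structure, the following minimal sketch allocates one partial-sum slot per possible thread. It assumes a compiler that provides the standard omp_lib module and is invoked with OpenMP enabled (with Compaq Fortran, the forompdef.f include file plays the equivalent role). The module name per_thread_sums and the function sum_with_slots are hypothetical names chosen for this example.

```fortran
module per_thread_sums
  use omp_lib   ! assumption: standard OpenMP module is available
  implicit none
contains
  ! Sum the integers 1..n using one partial-sum slot per possible thread.
  function sum_with_slots(n) result(total)
    integer, intent(in) :: n
    integer :: total
    integer, allocatable :: partial(:)
    integer :: i, me

    ! Size the array for the largest team that can be formed, indexed
    ! 0 .. max-1 so that it matches the values of omp_get_thread_num().
    allocate(partial(0:omp_get_max_threads() - 1))
    partial = 0

    !$omp parallel private(i, me) shared(partial, n)
    me = omp_get_thread_num()
    !$omp do
    do i = 1, n
      partial(me) = partial(me) + i   ! each thread touches only its own slot
    end do
    !$omp end do
    !$omp end parallel

    total = sum(partial)
  end function sum_with_slots
end module per_thread_sums
```

Because the array is sized from omp_get_max_threads() rather than from the team size seen inside the region, the allocation can be done in serial code before the region starts, whatever team size the run-time system eventually chooses.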

Return Values:

This function returns the maximum value whether executing from a serial region or from a parallel region.

If your program used omp_set_num_threads to change the number of threads, subsequent calls to omp_get_max_threads will return the new value.

See Also:

Section D.1.1.10, omp_set_num_threads
Section D.1.1.8, omp_set_dynamic

D.1.1.3 omp_get_nested

Determines the status of nested parallelism.

Syntax:

    INTERFACE
       LOGICAL FUNCTION omp_get_nested ()
       END FUNCTION omp_get_nested
    END INTERFACE
    LOGICAL result
    result = omp_get_nested ()

Description:

This function returns TRUE if nested parallelism is enabled. If nested parallelism is disabled it returns FALSE. The function always returns FALSE if nested parallelism is not implemented.

See Also:

Section D.1.1.9, omp_set_nested

D.1.1.4 omp_get_num_procs

Returns the number of processors that are available to the program.

Syntax:

    INTERFACE
       INTEGER FUNCTION omp_get_num_procs ()
       END FUNCTION omp_get_num_procs
    END INTERFACE
    INTEGER result
    result = omp_get_num_procs ()

Return Values:

This function returns an integer value indicating the number of processors your program has available.

D.1.1.5 omp_get_num_threads

Returns the number of threads currently in the team executing the parallel region from which it is called.

Syntax:

    INTERFACE
       INTEGER FUNCTION omp_get_num_threads ()
       END FUNCTION omp_get_num_threads
    END INTERFACE
    INTEGER result
    result = omp_get_num_threads ()

Description:

This function interacts with the omp_set_num_threads call and the OMP_NUM_THREADS environment variable that control the number of threads in a team. If the number of threads has not been explicitly set by the user, the default is implementation dependent.

The omp_get_num_threads function binds to the closest enclosing PARALLEL directive (see Chapter 6, Parallel Compiler Directives and Their Programming Environment). It returns 1 if the call is made from the serial portion of a program, or from a nested parallel region that is serialized.

See Also:

Section D.1.1.10, omp_set_num_threads
OMP_NUM_THREADS environment variable in Table 6-4, OpenMP Fortran API Environment Variables

D.1.1.6 omp_get_thread_num

Returns the thread number, within the team.

Syntax:

    INTERFACE
       INTEGER FUNCTION omp_get_thread_num ()
       END FUNCTION omp_get_thread_num
    END INTERFACE
    INTEGER result
    result = omp_get_thread_num ()

Description:

This function binds to the closest enclosing PARALLEL directive (see Chapter 6, Parallel Compiler Directives and Their Programming Environment). The master thread of the team is thread zero.

Return Values:

The value returned ranges from zero to omp_get_num_threads() - 1. The function returns zero when called from a serial region or from within a nested parallel region that is serialized.
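
The range described above can be checked with a small sketch such as the following, in which every thread of a team records its own thread number and the result is inspected afterward. It assumes the standard omp_lib module and a compiler run with OpenMP enabled; the module name thread_ids and the function ids_are_dense are hypothetical.

```fortran
module thread_ids
  use omp_lib   ! assumption: standard OpenMP module is available
  implicit none
contains
  ! Return .true. if the threads of one team reported exactly the
  ! numbers 0 .. team_size-1, each of them once.
  function ids_are_dense() result(ok)
    logical :: ok
    integer, allocatable :: seen(:)
    integer :: me, nt

    allocate(seen(0:omp_get_max_threads() - 1))
    seen = 0
    nt = 1

    !$omp parallel private(me) shared(seen, nt)
    me = omp_get_thread_num()
    seen(me) = seen(me) + 1        ! each thread writes only its own slot
    !$omp single
    nt = omp_get_num_threads()     ! team size, recorded by one thread
    !$omp end single
    !$omp end parallel

    ok = all(seen(0:nt - 1) == 1) .and. all(seen(nt:) == 0)
  end function ids_are_dense
end module thread_ids
```

Called from the serial part of a program, omp_get_thread_num() itself returns zero, so the master thread can be identified with the same test inside and outside a parallel region.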

See Also:

Section D.1.1.5, omp_get_num_threads
Section D.1.1.10, omp_set_num_threads

D.1.1.7 omp_in_parallel

Returns whether or not a region is executing in parallel.

Syntax:

    INTERFACE
       LOGICAL FUNCTION omp_in_parallel ()
       END FUNCTION omp_in_parallel
    END INTERFACE
    LOGICAL result
    result = omp_in_parallel ()

Description:

This function has global scope.

Return Values:

This function returns TRUE if it is called from the dynamic extent of a region executing in parallel, even if nested regions exist that may be serialized; otherwise it returns FALSE. A parallel region that is serialized is not considered to be a region executing in parallel.
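
Because the function has global scope, a library routine can use it to find out how it was called. The following minimal sketch shows one way to do that, assuming the standard omp_lib module and a compiler run with OpenMP enabled; the module name region_probe and the function context_name are hypothetical.

```fortran
module region_probe
  use omp_lib   ! assumption: standard OpenMP module is available
  implicit none
contains
  ! Describe the calling context. Safe to call from serial code or
  ! from anywhere in the dynamic extent of a parallel region.
  function context_name() result(name)
    character(len=8) :: name
    if (omp_in_parallel()) then
      name = 'parallel'
    else
      name = 'serial'
    end if
  end function context_name
end module region_probe
```

A call from the serial part of a program yields 'serial'. A call from a parallel region that has been serialized is also expected to yield 'serial', in line with the return-value rule above.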

D.1.1.8 omp_set_dynamic

Enables or disables dynamic adjustment of the number of threads available for execution in a parallel region.

Syntax:

    INTERFACE
       SUBROUTINE omp_set_dynamic (enable)
       LOGICAL enable
       END SUBROUTINE omp_set_dynamic
    END INTERFACE
    LOGICAL scalar_logical_expression
    CALL omp_set_dynamic (scalar_logical_expression)

Description:

To obtain the best use of system resources, certain run-time environments automatically adjust the number of threads that are used for executing subsequent parallel regions. This adjustment is enabled only if the value of the scalar logical expression passed to omp_set_dynamic is TRUE. Dynamic adjustment is disabled if the value of the scalar logical expression is FALSE.

When dynamic adjustment is enabled, the number of threads specified by the user becomes the maximum thread count. The number of threads remains fixed throughout each parallel region and is reported by the omp_get_num_threads() function.

A call to omp_set_dynamic overrides the OMP_DYNAMIC environment variable.

The default for dynamic thread adjustment is implementation dependent. User code that depends on a specific number of threads for correct execution should explicitly disable dynamic threads. Implementations are not required to provide the ability to dynamically adjust the number of threads, but they are required to provide the interface in order to support portability across platforms.
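
For code that depends on a specific number of threads, the advice above can be sketched as follows: disable dynamic adjustment, then request the team size with omp_set_num_threads (see Section D.1.1.10). The sketch assumes the standard omp_lib module and a compiler run with OpenMP enabled; the module name fixed_team and the routines request_fixed_team and observed_team_size are hypothetical.

```fortran
module fixed_team
  use omp_lib   ! assumption: standard OpenMP module is available
  implicit none
contains
  ! Request n threads for later parallel regions, with dynamic
  ! adjustment turned off. Call only from a serial part of the program.
  subroutine request_fixed_team(n)
    integer, intent(in) :: n
    call omp_set_dynamic(.false.)
    call omp_set_num_threads(n)
  end subroutine request_fixed_team

  ! Size of the team that actually executes a parallel region.
  function observed_team_size() result(nt)
    integer :: nt
    nt = 0
    !$omp parallel shared(nt)
    !$omp single
    nt = omp_get_num_threads()
    !$omp end single
    !$omp end parallel
  end function observed_team_size
end module fixed_team
```

After request_fixed_team(2), omp_get_dynamic() should return FALSE and omp_get_max_threads() should return 2. The observed team size is normally 2 as well, although a system that cannot supply the requested threads may still run the region with fewer.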

See Also:

Section D.1.1.1, omp_get_dynamic
Section D.1.1.5, omp_get_num_threads
OMP_DYNAMIC environment variable in Table 6-4, OpenMP Fortran API Environment Variables

D.1.1.9 omp_set_nested

Enables or disables nested parallelism.

Syntax:

    INTERFACE
       SUBROUTINE omp_set_nested (enable)
       LOGICAL enable
       END SUBROUTINE omp_set_nested
    END INTERFACE
    LOGICAL scalar_logical_expression
    CALL omp_set_nested (scalar_logical_expression)

Description:

If the value of the scalar logical expression is FALSE, nested parallelism is disabled, and nested parallel regions are serialized and executed by the current thread. This is the default. If the value of the scalar logical expression is set to TRUE, nested parallelism is enabled, and parallel regions that are nested can deploy additional threads to form the team.

A call to omp_set_nested overrides the OMP_NESTED environment variable.

When nested parallelism is enabled, the number of threads used to execute the nested parallel regions is implementation dependent. This allows implementations that comply with the OpenMP standard to serialize nested parallel regions, even when nested parallelism is enabled.

See Also:

Section D.1.1.3, omp_get_nested
OMP_NESTED environment variable in Table 6-4, OpenMP Fortran API Environment Variables

D.1.1.10 omp_set_num_threads

Sets the number of threads to use for the next parallel region.

Syntax:

    INTERFACE
       SUBROUTINE omp_set_num_threads (number_of_threads)
       INTEGER number_of_threads
       END SUBROUTINE omp_set_num_threads
    END INTERFACE
    INTEGER scalar_integer_expression
    CALL omp_set_num_threads (scalar_integer_expression)

Description:

The compiler evaluates the scalar integer expression and interprets its value as the number of threads to use. This routine takes effect only when called from serial portions of the program. The behavior of the routine is undefined if it is called from a portion of the program where the omp_in_parallel function returns TRUE.

A call to omp_set_num_threads sets the maximum number of threads to use for the next parallel region when dynamic adjustment of the number of threads is enabled. A call to omp_set_num_threads overrides the OMP_NUM_THREADS environment variable.

See Also:

Section D.1.1.5, omp_get_num_threads
Section D.1.1.7, omp_in_parallel
OMP_NUM_THREADS environment variable in Table 6-4, OpenMP Fortran API Environment Variables

D.1.2 General-Purpose Lock Routines

The OpenMP run-time library includes a set of general-purpose locking routines. Your program must not attempt to access any lock variable, var, except through the routines described in this section. The var lock variable is an integer of a KIND large enough to hold an address. On Compaq Tru64 UNIX systems, var should be declared as INTEGER(KIND=8).

The lock control routines must be called in a specific sequence:

The lock to be associated with the lock variable must first be initialized.
The associated lock is made available to the executing thread.
The executing thread is released from lock ownership.
When finished, the lock must always be disassociated from the lock variable.

A simple omp_set_lock and omp_unset_lock combination satisfies this requirement. If you want your program to do useful work while waiting for the lock to become available, you can use the combination of omp_test_lock and omp_unset_lock instead. For example:

    PROGRAM LOCK_USAGE
    implicit none
    integer(kind=4) ID
    include 'forompdef.f'    ! It's in /usr/include after installation
    INTEGER(KIND=8) LCK      ! This variable should be of size POINTER

    CALL OMP_INIT_LOCK(LCK)
    !$OMP PARALLEL SHARED(LCK) PRIVATE(ID)
    ID = OMP_GET_THREAD_NUM()
    CALL OMP_SET_LOCK(LCK)
    PRINT *, 'MY THREAD ID IS ', ID
    CALL OMP_UNSET_LOCK(LCK)
    DO WHILE (.NOT. OMP_TEST_LOCK(LCK))
       CALL SKIP(ID)         ! Do not yet have lock, do something else
    END DO
    CALL WORK(ID)            ! Have the lock, now do work
    CALL OMP_UNSET_LOCK(LCK)
    !$OMP END PARALLEL
    CALL OMP_DESTROY_LOCK(LCK)
    END                      ! SKIP and WORK are user-supplied subroutines
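
The four-step lock sequence can also be exercised with a self-contained sketch in which the lock protects a shared counter. This is only an illustration under stated assumptions: it uses the standard omp_lib module and its omp_lock_kind constant in place of the INTEGER(KIND=8) declaration described above, and the module name locked_counter and the routine count_with_lock are hypothetical.

```fortran
module locked_counter
  use omp_lib   ! assumption: standard OpenMP module is available
  implicit none
contains
  ! Every thread adds 1 to a shared counter n times, guarding each
  ! update with a lock. Returns the final count and the team size.
  subroutine count_with_lock(n, total, nthreads)
    integer, intent(in)  :: n
    integer, intent(out) :: total, nthreads
    integer(kind=omp_lock_kind) :: lck
    integer :: i

    total = 0
    nthreads = 1
    call omp_init_lock(lck)          ! 1. initialize the lock

    !$omp parallel private(i) shared(lck, total, nthreads, n)
    do i = 1, n
      call omp_set_lock(lck)         ! 2. wait until the lock is available
      total = total + 1
      call omp_unset_lock(lck)       ! 3. release ownership
    end do
    !$omp single
    nthreads = omp_get_num_threads()
    !$omp end single
    !$omp end parallel

    call omp_destroy_lock(lck)       ! 4. disassociate the lock variable
  end subroutine count_with_lock
end module locked_counter
```

With the lock in place, the final count equals n times the team size. Removing the omp_set_lock and omp_unset_lock calls would leave the update unprotected, and the count could then come out lower.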

D.2 Other Parallel Threads Routines

Note

Compaq Fortran supports the set of parallel thread routines described in this section for existing programs. For creating new programs, use the set of routines described in Section D.1, OpenMP Fortran API Run-Time Library Routines.

Table D-2, Other Parallel Threads Routines, shows additional parallel threads routines. The _Otsxxx (Compaq spelling) and the mpc_xxx (compatibility spelling) routine names are equivalent. For example, calling _OtsGetNumThreads is the same as calling mpc_numthreads.

Table D-2 Other Parallel Threads Routines
Routine Name Description

_OtsGetMaxThreads mpc_maxnumthreads Return the number of threads that would normally be used for parallel processing in the current environment. This is affected by the environment variable MP_THREAD_COUNT, by the number of processors in the current process's processor set, and by any call to _OtsInitParallel. Invoke as an integer function. See Section D.2.1.

_OtsGetNumThreads mpc_numthreads Return the number of threads that are being used in the current parallel region (if running within one), or the number of threads that have been created so far (if not currently within a parallel region). Invoke as an integer function. See Section D.2.2.

_OtsGetThreadNum mpc_my_threadnum Return a number that identifies the current thread. The main thread is 0, and slave threads are numbered densely from 1. Invoke as an integer function. See Section D.2.3.

_OtsInitParallel Start slave threads for parallel processing if they have not yet been started implicitly (normally, the threads have been started by default at the first parallel region). Call as a subroutine with two arguments (see Section D.2.4):

The total number of threads desired (or specify zero to allow use of the environment variable MP_THREAD_COUNT or maximum number of processors).
A pointer to a pthreads attribute block, which can be used to control the attributes of the slave threads.

_OtsInParallel mpc_in_parallel_region Return 1 if you are currently within a parallel region, or 0 if not. Invoke as an integer function. See Section D.2.5.

_OtsSetNumThreads Sets the number of threads to use for the next parallel region.

_OtsStopWorkers mpc_destroy Stop any slave threads created by parallel library support. This routine cannot be called from within a parallel region. After this call, new slave threads will be implicitly created the next time a parallel region is encountered, or can be created explicitly by calling _OtsInitParallel. Call as a subroutine. See Section D.2.7.

To call the _Otsxxx or mpc_xxx routines, use the cDEC$ ALIAS directive (described in the Compaq Fortran Language Reference Manual) to handle the mixed-case naming convention and missing trailing underscore.

For example, to call the _OtsGetThreadNum routine with an alias of OtsGetThreadNum, use the following code:

    integer a(10)
    INTERFACE
       INTEGER FUNCTION OtsGetThreadNum ()
       !DEC$ ALIAS OtsGetThreadNum, '_OtsGetThreadNum'
       END FUNCTION OtsGetThreadNum
    END INTERFACE
    !$par parallel do
    do i = 1,10
       print *, "i=",i, " thread=", OtsGetThreadNum ()
    enddo
    end

Fortran INTERFACE blocks for all of the _Otsxxx routines are in a file named forompdef.f in /usr/include. Add the following line to your program and you can use the Fortran name otsxxx to call any of the _Otsxxx routines:

    INCLUDE 'forompdef.f'

Alternatively, to use the compatibility naming convention of mpc_my_threadnum :

    integer a(10)
    INTERFACE
       INTEGER FUNCTION mpc_my_threadnum ()
       !DEC$ ALIAS mpc_my_threadnum, 'mpc_my_threadnum'
       END FUNCTION mpc_my_threadnum
    END INTERFACE
    !$par parallel do
    do i = 1,10
       print *, "i=",i, " thread=", mpc_my_threadnum ()
    enddo
    end

These parallel threads routines are described in detail in the following sections.

See Also:

Section 6.1.3, Parallel Processing Thread Model

D.2.1 _OtsGetMaxThreads or mpc_maxnumthreads

Returns the maximum number of threads for the current environment.

Syntax:

    INTERFACE
       INTEGER FUNCTION otsgetmaxthreads ()
       !DEC$ ALIAS otsgetmaxthreads, '_OtsGetMaxThreads'
       END FUNCTION otsgetmaxthreads
    END INTERFACE
    INTEGER result
    result = otsgetmaxthreads ()

Description:

Returns the number of threads that would normally be used for parallel processing in the current environment. This is affected by the environment variable MP_THREAD_COUNT, by the number of processors in the current process's processor set, and by any call to _OtsInitParallel.

D.2.2 _OtsGetNumThreads or mpc_numthreads

Returns the number of threads being used (in a parallel region) or created so far (if not in a parallel region).

Syntax:

    INTERFACE
       INTEGER FUNCTION otsgetnumthreads ()
       !DEC$ ALIAS otsgetnumthreads, '_OtsGetNumThreads'
       END FUNCTION otsgetnumthreads
    END INTERFACE
    INTEGER result
    result = otsgetnumthreads ()

Description:

Returns the number of threads that are being used in the current parallel region (if running within one), or the number of threads that have been created so far (if not currently within a parallel region). You can use this call to decide how to partition a parallel loop. For example:

    nt = otsgetnumthreads ()
    c$par parallel do
    do i = 0,nt-1
       work(i) = 0
       k0 = 1+(i*n)/nt
       k1 = ((i+1)*n)/nt
       do j = 1,m
          do k = k0,k1
             ! use work(i)
          enddo
       enddo
    enddo
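
The index arithmetic in this example appears intended to split the range 1..n into nt contiguous blocks, one per thread. As a sketch of that block partition, isolated so that it can be checked without any threads, the bounds for thread i (numbered from 0) can be computed as k0 = 1+(i*n)/nt and k1 = ((i+1)*n)/nt. The module name chunking and the routine chunk_bounds are hypothetical.

```fortran
module chunking
  implicit none
contains
  ! Bounds k0..k1 of the block of 1..n assigned to thread i out of nt,
  ! where i runs from 0 to nt-1. Integer division truncates, so the
  ! blocks differ in length by at most one element.
  pure subroutine chunk_bounds(i, nt, n, k0, k1)
    integer, intent(in)  :: i, nt, n
    integer, intent(out) :: k0, k1
    k0 = 1 + (i * n) / nt
    k1 = ((i + 1) * n) / nt
  end subroutine chunk_bounds
end module chunking
```

For n = 10 and nt = 3 this gives the blocks 1..3, 4..6, and 7..10: each block starts one past the end of the previous one, and the last block ends at n.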

D.2.3 _OtsGetThreadNum or mpc_my_threadnum

Returns the number of the current thread.

Syntax:

    INTERFACE
       INTEGER FUNCTION otsgetthreadnum ()
       !DEC$ ALIAS otsgetthreadnum, '_OtsGetThreadNum'
       END FUNCTION otsgetthreadnum
    END INTERFACE
    INTEGER result
    result = otsgetthreadnum ()

Description:

Returns a number that identifies the current thread. The main thread is 0, and slave threads are numbered densely from 1.

D.2.4 _OtsInitParallel

Starts slave threads.

Syntax:

    INTERFACE
       SUBROUTINE otsinitparallel (nthreads, attr)
       !DEC$ ALIAS otsinitparallel, '_OtsInitParallel'
       INTEGER nthreads
       INTEGER (KIND=8) attr
       !DEC$ ATTRIBUTES VALUE :: nthreads, attr
       END SUBROUTINE otsinitparallel
    END INTERFACE

Description:

Starts slave threads for parallel processing if they have not yet been started implicitly. Use this routine if you want to:

Override the number of threads
Override the thread attributes
Control when thread creation occurs (by default, at the first parallel region)

The arguments are:

nthreads is the total number of threads desired, including the master. If nthreads is zero, the number of threads is controlled by the environment variable MP_THREAD_COUNT, if it is defined as a nonzero number, or by the number of processors in the current process's processor set. (See the processor_sets(3) reference page.)
attr is a pointer to a pthreads attribute block, which can be used to control the attributes of the slave threads. If it is zero, all defaults are used except that the slaves' stack size in bytes can be set by the environment variable MP_STACK_SIZE.

D.2.5 _OtsInParallel or mpc_in_parallel_region

Returns the current status of processing activity in a parallel region.

Syntax:

    INTERFACE
       INTEGER FUNCTION otsinparallel ()
       !DEC$ ALIAS otsinparallel, '_OtsInParallel'
       END FUNCTION otsinparallel
    END INTERFACE
    INTEGER result
    result = otsinparallel ()

Description:

The routine returns 1 if the program is currently running within a parallel region; otherwise it returns 0.

D.2.6 _OtsSetNumThreads

Sets the number of threads to use for the next parallel region.

D.2.7 _OtsStopWorkers or mpc_destroy

Stops slave threads.

Syntax:

    INTERFACE
       SUBROUTINE otsstopworkers ()
       !DEC$ ALIAS otsstopworkers, '_OtsStopWorkers'
       END SUBROUTINE otsstopworkers
    END INTERFACE
    CALL otsstopworkers ()

Description:

Stop any slave threads created by parallel library support. Use this routine if you need to perform some operation, such as a call to fork(), that cannot tolerate extra threads running in the process. This routine cannot be called from within a parallel region. After this call, new slave threads will be implicitly created the next time a parallel region is encountered, or can be created explicitly by calling _OtsInitParallel.
