HP OpenVMS Systems: Ask the Wizard
The Question is:

Dear Wizard,

In article wiz_2681 you wrote: "The OpenVMS Wizard will leave the question of how to reliably deal with bitlocks, and how to spin appropriately, for another topic!" We are still waiting :)

Thank you.

The Answer is:

Synchronization techniques are described in detail in the OpenVMS Programming Concepts manual, and in the hardware platform or hardware architecture documentation.

Bitlocks and interlocked instructions are the most lightweight of all synchronization techniques available on OpenVMS systems, and the interlocked capabilities are used as the basis of many other more complex (and more flexible) synchronization mechanisms. Interlocked instructions permit only one accessor to perform the specified function at a time, and are thus useful for protecting critical code and critical data from uncoordinated (shared) access.

VAX systems have hardware-implemented bitlocks and hardware bitlock (and interlocked queue) instructions, including BBCCI, BBSSI, REMQHI, REMQTI, INSQHI, and INSQTI. For compatibility with existing applications, OpenVMS Alpha and OpenVMS Itanium systems provide simulations of the VAX interlocked instructions. Further, OpenVMS provides run-time library calls (e.g., lib$bbcci), and various compilers offer "built-ins" -- language extensions -- targeting application synchronization. (These calls could potentially be implemented on OpenVMS Alpha, for instance, using the Alpha LDx_L and STx_C mechanisms -- please see the OpenVMS source listings media for details.)

Use only interlocked instructions when modifying the bitlock; do not mix interlocked and non-interlocked access to the interlock. (The interlocked instructions correctly manage the processor caches. Non-interlocked access can run afoul of the processor cache and may not see the correct data. For details on memory barriers and shared memory, please see the OpenVMS Programming Concepts Manual and topic (2681).)
The interlocked instructions can typically lock a range of memory rather larger than the target bit (or target queue entry), and thus bitlocks located within the same "interlock grain" can encounter contention. This contention will not disrupt the correct operation of the bitlocks, but it can slow access to them. (The particular span of interlocking is implementation-specific, and can range from a naturally-aligned longword (VAX) to a naturally-aligned quadword (Alpha) to all of system memory, and can potentially be discontiguous. For additional related information, please see topics (8149) and (7383).)

Except for high-IPL kernel-mode (driver) code that must necessarily block other system activity, application code should not spin on a bitlock -- spinning is a term for repeatedly checking the state of the bitlock. Spinning causes system performance overhead because of the loop and because of the interlocks used to access the bitlock, and the act of spinning can reduce the ability of other accessors to access the bitlock. Spinning is also sensitive to non-uniform memory access (NUMA) memory organization, with processes local to the bitlock receiving substantially preferential access to it. (The kernel-mode OpenVMS spinlock primitives were explicitly modified to account for this particular characteristic of NUMA.)

When spinning is required, it is normally best performed with a combination of interlocked and non-interlocked operations. The interlocked operations are used to acquire and to release the bitlock, while non-interlocked (read) operations are used to poll for the potential to access the requested bitlock -- this design avoids the contention on the interlock primitives that can arise if the application spins using the interlocked operations.
(As was mentioned earlier, the granularity of the interlock primitives can entail locking an entirely implementation-specific and potentially non-contiguous chunk of memory, ranging from a longword to all of physical memory, inclusive.)

The usual approach for an application waiting on a bitlock is to use a $resched, $hiber/$schdwk, or other similar system service call to "back off" from the bitlock, avoiding repeated sequential access to it. Backing off from the bitlock permits other accessors to access it, and reduces the general system overhead resulting from the spinning. (Application designs involving use of the distributed lock manager can also assist here.)

The usual approach for applications communicating via shared memory involves two or more queues of data structures -- a queue of structures that are free (often fixed-length), and a queue of structures that are pending work processing. A process writing data dequeues a free packet, fills it in, and then appends the packet to the pending work queue. This approach avoids contention and the potential for corruptions when multiple accessors are referencing the shared memory data structures in parallel.

Also please see the OpenVMS documentation of the distributed lock manager. This documentation is included in the OpenVMS Programming Concepts Manual and in the system service reference materials for $enq[w] and $deq[w]. There are many features available to clients of the lock manager -- distributed operations, asynchronous grant and asynchronous blocking notifications, shared and exclusive access, queued access -- that must otherwise be manually implemented on top of bitlocks or other more primitive synchronization techniques.

Also please see topics (1661), (2681), (6099), (7383) and (8149).

Attached is an example of interlocked queue operations.
#pragma module qdemo "V2.0"
#pragma builtins
/*
** Copyright 2001 Compaq Computer Corporation
**
*/
/*
**++
**  FACILITY:  Examples
**
**  MODULE DESCRIPTION:
**
**      This routine contains a demonstration of the OpenVMS self-relative
**      interlocked RTL queue routines lib$remqhi() and lib$insqti(), and
**      the equivalent Compaq C compiler builtin functions, and provides
**      a demonstration of the OpenVMS Compaq C memory management routines.
**
**  AUTHORS:
**
**      Stephen Hoffman
**
**  CREATION DATE:  21-Jan-1990
**
**  DESIGN ISSUES:
**
**      NA
**
**  MODIFICATION HISTORY:
**
**      9-Aug-2001  Hoffman
**          Compaq C updates, added builtin calls.
**
**--
*/
/*
** $! queue demo build procedure...
** $ cc/decc/debug/noopt qdemo
** $ link qdemo/debug
** $!
*/
/*
**
**  INCLUDE FILES
**
*/
#include <builtins.h>
#include <lib$routines.h>
#include <libdef.h>
#include <ssdef.h>
#include <stdio.h>
#include <stdlib.h>
#include <stsdef.h>

main()
{
    unsigned long int retstat;
    unsigned long int i;
    struct queueblock
    {
        unsigned long int *flink;
        unsigned long int *blink;
        unsigned long int dd;
    } *qb;

    /*
    ** Allocate the (zeroed) queue header now.
    **
    ** The interlocked queue forward and backward links located in
    ** the queue header (of self-relative queues) must be initialized
    ** to zero prior to usage.  calloc() performs this for us.  Blocks
    ** allocated and inserted in the queue subsequently need not have
    ** their links zeroed.
    **
    ** NB: On VMS, the calloc() and malloc() routines acquire memory
    ** that is quadword (or better) aligned.  The VAX hardware queue
    ** instructions (and thus the queue routines) require a minimum
    ** of quadword alignment.
    */
    struct queueblock *header = calloc( 1, sizeof( struct queueblock ));
    struct queueblock *qtmp = 0;

    printf( "qdemo.c -- queue demonstration\n" );
    printf( "\nRTL calls...\n\n" );

    /*
    ** dynamically allocate the memory for each block, place a value
    ** in the block and insert the block onto the tail of the queue.
    */
    for ( i = 0; i < 10; i++ )
    {
        qtmp = calloc( 1, sizeof( struct queueblock ));
        qtmp->dd = i;
        printf( "inserting item: %d\n", qtmp->dd );
        retstat = lib$insqti( qtmp, header );
    }

    /*
    ** Remove queue entries until there are no more.
    */
    retstat = SS$_NORMAL;
    while ( $VMS_STATUS_SUCCESS( retstat ) )
    {
        retstat = lib$remqhi( header, &qtmp );
        if ( $VMS_STATUS_SUCCESS( retstat ) )
        {
            printf( "removing item: %d\n", qtmp->dd );
            free( qtmp );
        }
    }
    if ( retstat != LIB$_QUEWASEMP )
        printf( "unexpected status %x received\n", retstat );
    else
        printf( "expected completion status received\n" );

    printf( "\nbuiltin calls...\n\n" );

    /*
    ** dynamically allocate the memory for each block, place a value
    ** in the block and insert the block onto the tail of the queue.
    */
    for ( i = 0; i < 10; i++ )
    {
        qtmp = calloc( 1, sizeof( struct queueblock ));
        qtmp->dd = i;
        printf( "inserting item: %d\n", qtmp->dd );
        retstat = _INSQTI( qtmp, header );
    }

    /*
    ** Remove queue entries until there are no more.
    */
    retstat = _remqi_removed_more;
    while (( retstat == _remqi_removed_more ) ||
           ( retstat == _remqi_removed_empty ))
    {
        retstat = _REMQHI( header, &qtmp );
        if (( retstat == _remqi_removed_more ) ||
            ( retstat == _remqi_removed_empty ))
        {
            printf( "removing item: %d\n", qtmp->dd );
            free( qtmp );
        }
    }
    switch ( retstat )
    {
        case _remqi_removed_empty:
            printf( "unexpected status _remqi_removed_empty received\n" );
            break;
        case _remqi_removed_more:
            printf( "unexpected status _remqi_removed_more received\n" );
            break;
        case _remqi_not_removed:
            printf( "unexpected status _remqi_not_removed received\n" );
            break;
        case _remqi_empty:
            printf( "expected status _remqi_empty received\n" );
            break;
    }

    printf( "\nDone...\n" );
    return SS$_NORMAL;
}