HP OpenVMS Systems Documentation
OpenVMS Programming Concepts Manual
6.4.3.6 Interlocked Memory Sequence Checking for the MACRO-32 Compiler

The MACRO-32 Compiler for OpenVMS Alpha Version 4.1 now performs additional code checking and displays warning messages for noncompliant code sequences. The following warning messages can display under the circumstances described:

BRNDIRLOC, branch directive ignored in locked memory sequence

Explanation: The compiler found a .BRANCH_LIKELY directive within an LDx_L/STx_C sequence.

User Action: None. The compiler ignores the .BRANCH_LIKELY directive and, unless other coding guidelines are violated, the code works as written.

BRNTRGLOC, branch target within locked memory sequence in routine 'routine_name'

Explanation: A branch instruction has a target that is within an LDx_L/STx_C sequence.

User Action: To avoid this warning, rewrite the source code to avoid branches within or into LDx_L/STx_C sequences. Branches out of interlocked sequences are valid and are not flagged.

MEMACCLOC, memory access within locked memory sequence in routine 'routine_name'

Explanation: A memory read or write occurs within an LDx_L/STx_C sequence. This can be either an explicit reference in the source code, such as "MOVL data, R0", or an implicit reference to memory. For example, fetching the address of a data label (for example, "MOVAB label, R0") is accomplished by a read from the linkage section, the data area that is used to resolve external references.

User Action: To avoid this warning, move all memory accesses outside the LDx_L/STx_C sequence.

RETFOLLOC, RET/RSB follows LDx_L instruction

Explanation: The compiler found a RET or RSB instruction after an LDx_L instruction and before finding an STx_C instruction. This indicates an ill-formed lock sequence.

User Action: Change the code so that the RET or RSB instruction does not fall between the LDx_L instruction and the STx_C instruction.
RTNCALLOC, routine call within locked memory sequence in routine 'routine_name'

Explanation: A routine call occurs within an LDx_L/STx_C sequence. This can be either an explicit CALL/JSB in the source code, such as "JSB subroutine", or an implicit call that occurs as a result of another instruction. For example, some instructions such as MOVC and EDIV generate calls to run-time libraries.

User Action: To avoid this warning, move the routine call or the instruction that generates it, as indicated by the compiler, outside the LDx_L/STx_C sequence.

STCMUSFOL, STx_C instruction must follow LDx_L instruction

Explanation: The compiler found an STx_C instruction before finding an LDx_L instruction. This indicates an ill-formed lock sequence.

User Action: Change the code so that the STx_C instruction follows the LDx_L instruction.

6.4.3.7 Recompiling Code with ALONONPAGED_INLINE or LAL_REMOVE_FIRST Macros

Any MACRO-32 code on OpenVMS Alpha that invokes either the ALONONPAGED_INLINE or the LAL_REMOVE_FIRST macro from the SYS$LIBRARY:LIB.MLB macro library must be recompiled on OpenVMS Version 7.2 to obtain a correct version of these macros. The change to these macros corrects a potential synchronization problem that is more likely to be encountered on the new Alpha 21264 (EV6) processors.
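The warnings above all enforce the same discipline: an LDx_L/STx_C sequence is a short retry loop in which the work between the load and the store must be minimal and must not touch other memory, branch inward, or call routines. As a rough analogy only (Python has no load-locked/store-conditional instructions; a lock stands in for the hardware reservation, and the class and function names are invented for illustration), the retry pattern looks like this:

```python
import threading

class Cell:
    """Emulates a memory location supporting LDx_L/STx_C-style updates.

    store_conditional fails if the value changed since the paired
    load_locked, mimicking a lost hardware reservation."""
    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load_locked(self):
        with self._lock:
            return self._value

    def store_conditional(self, expected, new):
        # Succeeds only if no other thread wrote since load_locked.
        with self._lock:
            if self._value != expected:
                return False        # reservation lost; caller must retry
            self._value = new
            return True

def atomic_add(cell, n):
    # The retry loop: the work between load and store is kept minimal,
    # with no other memory accesses, inward branches, or routine calls --
    # the same rules the compiler warnings above enforce.
    while True:
        old = cell.load_locked()
        if cell.store_conditional(old, old + n):
            return old + n
```

The loop retries until the store succeeds, which is exactly why arbitrary code between the load and the store would make the sequence fragile or incorrect.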
6.4.4 Interlocked Instructions (VAX Only)

On VAX systems, seven instructions interlock memory. A memory interlock enables a VAX CPU or I/O processor to make an atomic read-modify-write operation to a location in memory that is shared by multiple processors. The memory interlock is implemented at the level of the memory controller. On a VAX multiprocessor system, an interlocked instruction is the only way to perform an atomic read-modify-write on a shared piece of data. The seven interlock memory instructions are as follows:

ADAWI (add aligned word, interlocked)
BBCCI (branch on bit clear and clear, interlocked)
BBSSI (branch on bit set and set, interlocked)
INSQHI (insert entry into queue at head, interlocked)
INSQTI (insert entry into queue at tail, interlocked)
REMQHI (remove entry from queue at head, interlocked)
REMQTI (remove entry from queue at tail, interlocked)
The VAX architecture interlock memory instructions are described in detail in the VAX Architecture Reference Manual. The following description of the interlocked instruction mechanism assumes that the interlock is implemented by the memory controller and that the memory contents are fresh.

When a VAX CPU executes an interlocked instruction, it issues an interlock-read command to the memory controller. The memory controller sets an internal flag and responds with the requested data. While the flag is set, the memory controller stalls any subsequent interlock-read commands for the same aligned longword from other CPUs and I/O processors, even though it continues to process ordinary reads and writes. When the VAX processor that is executing the interlocked instruction issues a write-unlock command, the memory controller writes the modified data back and clears its internal flag. The memory interlock exists for the duration of only one instruction. Execution of an interlocked instruction includes paired interlock-read and write-unlock memory controller commands.

Because interlocked instructions are noninterruptible, they are atomic with respect to threads of execution on the same processor. When you synchronize data with interlocks, you must make sure that all accessors of that data use them. This means that memory references of an interlocked instruction are atomic only with respect to other interlocked memory references.
On VAX systems, the granularity of the interlock depends on the type of
VAX system. A given VAX implementation is free to implement a larger
interlock granularity than that which is required by the set of
interlocked instructions listed above. On some processors, for example,
while an interlocked access to a location is in progress, interlocked
access to any other location in memory is not allowed.
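The paired interlock-read and write-unlock protocol described above can be sketched in miniature. This is an analogy only, not VAX code: a Python condition variable stands in for the memory controller's internal flag, ordinary reads and writes proceed unimpeded, and competing interlock-reads for the same address stall until the write-unlock. The names (MemoryController, adawi) are invented for illustration.

```python
import threading

class MemoryController:
    """Sketch of the VAX memory-controller interlock protocol."""
    def __init__(self):
        self._mem = {}
        self._locked = set()            # addresses with the interlock flag set
        self._cv = threading.Condition()

    def read(self, addr):               # ordinary read: never stalls
        with self._cv:
            return self._mem.get(addr, 0)

    def write(self, addr, value):       # ordinary write: never stalls
        with self._cv:
            self._mem[addr] = value

    def interlock_read(self, addr):
        with self._cv:
            while addr in self._locked: # stall competing interlock-reads
                self._cv.wait()
            self._locked.add(addr)      # set the internal flag
            return self._mem.get(addr, 0)

    def write_unlock(self, addr, value):
        with self._cv:
            self._mem[addr] = value     # write the modified data back...
            self._locked.discard(addr)  # ...and clear the internal flag
            self._cv.notify_all()

def adawi(ctrl, addr, n):
    """ADAWI-like atomic add: a paired interlock-read / write-unlock."""
    old = ctrl.interlock_read(addr)
    ctrl.write_unlock(addr, old + n)
    return old + n
```

Because every update goes through the paired commands, concurrent adders cannot interleave their read-modify-write sequences, which is the property the hardware interlock provides.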
On Alpha systems, there are no implied memory barriers except those performed by the PALcode routines that emulate the interlocked queue instructions. Wherever necessary, you must insert explicit memory barriers into your code to impose an order on memory references. Memory barriers are required to ensure both the order in which other members of an SMP system or an I/O processor see writes to shared data and the order in which reads of shared data complete. There are two types of memory barrier:

The memory barrier (MB) instruction
The instruction memory barrier (IMB) PALcode routine
The MB instruction guarantees that, from the viewpoint of multiple threads of execution, all subsequent loads and stores do not access memory until after all previous loads and stores have accessed memory. Even in a multiprocessor system, all reads by one processor always return the data from the most recent writes by that processor, assuming no other processor has written to the location. Alpha compilers provide semantics for generating memory barriers when needed for specific operations on data items.

The instruction memory barrier (IMB) PALcode routine must be used after a modification to the instruction stream to flush prefetched instructions. In addition, it provides the same ordering effects as the MB instruction.

Code that modifies the instruction stream must be changed to synchronize the old and new instruction streams properly. Use of an REI instruction to accomplish this does not work on OpenVMS Alpha systems. If a kernel mode code sequence changes the expected instruction stream, it must issue an IMB instruction after changing the instruction stream and before the time the change is executed. For example, if a device driver stores an instruction sequence in an extension to the unit control block (UCB) and then transfers control there, it must issue an IMB instruction after storing the data in the UCB but before transferring control to the UCB data.
The MACRO-32 compiler for OpenVMS Alpha provides the EVAX_IMB built-in
to insert explicitly an IMB instruction in the instruction stream.
Privileged architecture library (PALcode) routines include Alpha
instructions that emulate VAX queue and interlocked queue instructions.
PALcode executes in a special environment with interrupts blocked. This
feature results in noninterruptible updates. A PALcode routine can
perform the multiple memory reads and memory writes that insert or
remove a queue element without interruption.
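The point of the PALcode emulation is that an insert or remove involves several reads and writes, yet must appear as one indivisible update. As a hedged analogy (not PALcode or VMS code; a Python lock stands in for PALcode's interrupts-blocked environment, and the class name is invented), the interlocked queue behavior can be sketched as:

```python
import threading
from collections import deque

class InterlockedQueue:
    """Analogy for the interlocked queue operations (INSQTI/REMQHI and
    friends) that PALcode emulates on Alpha: each insert or remove
    performs multiple reads and writes, but the lock makes the whole
    update indivisible, as blocked interrupts do for PALcode."""
    def __init__(self):
        self._q = deque()
        self._lock = threading.Lock()

    def insert_tail(self, item):        # INSQTI-like
        with self._lock:
            self._q.append(item)        # several pointer updates, as one unit

    def remove_head(self):              # REMQHI-like
        with self._lock:
            return self._q.popleft() if self._q else None
```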
The operating system uses the synchronization primitives provided by
the hardware as the basis for several different synchronization
techniques. The following sections summarize the operating system's
synchronization techniques available to application software.
On Alpha systems without kernel threads, only one thread of execution can execute within a process at a time, so synchronization of threads that execute simultaneously is not a concern. However, delivery of an AST or the occurrence of an exception can intervene in a sequence of instructions in one thread of execution. Because these conditions can occur, application design must take into account the need for synchronization with condition handlers and AST procedures.

On Alpha systems, writing bytes or words or performing a read-modify-write operation requires a sequence of Alpha instructions. If the sequence incurs an exception or is interrupted by AST delivery, another thread of the process's code can run. If that thread accesses the same data, it can read incompletely written data or cause data corruption. Aligning data on natural boundaries and unpacking word and byte data reduce this risk.

On Alpha systems, an application written in a language other than VAX MACRO must identify to the compiler data accessed by any combination of mainline code, AST procedures, and condition handlers, to ensure that the compiler generates code that is atomic with respect to other threads. Data shared with other processes must also be identified.

With process-private data accessed from both AST and non-AST threads of execution, the non-AST thread can block AST delivery by using the Set AST Enable (SYS$SETAST) system service. If the code is running in kernel mode, it can also raise IPL to block AST delivery. The Guide to Creating OpenVMS Modular Procedures describes the concept of AST reentrancy.
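The hazard described above (an asynchronously delivered routine interleaving with a mainline multi-instruction update) can be sketched portably. This is an analogy only: Python threads and a timer callback stand in for mainline code and an AST procedure, and holding a lock stands in for blocking AST delivery with SYS$SETAST; none of the names below are VMS APIs.

```python
import threading

shared = {"count": 0, "label": ""}
guard = threading.Lock()   # plays the role of blocking AST delivery

def ast_like_update():
    # Runs asynchronously (like an AST procedure) and touches shared data.
    with guard:
        shared["count"] += 1
        shared["label"] = "updated-by-ast"

def mainline_update():
    # The mainline code holds the guard across its multi-step
    # read-modify-write, so the AST-like routine cannot interleave,
    # just as disabling ASTs would hold off AST delivery.
    with guard:
        old = shared["count"]
        shared["count"] = old + 10
        shared["label"] = "updated-by-mainline"

timer = threading.Timer(0.01, ast_like_update)
timer.start()
mainline_update()
timer.join()
```

Without the guard, the asynchronous update could run between the read of `old` and the write back, and one of the two increments would be lost.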
On a uniprocessor or in a symmetric multiprocessing (SMP) system,
access to multiple locations with read or write instructions or with
a read-modify-write sequence is not atomic on VAX and Alpha systems.
Additional synchronization methods are required to control access to
the data. See Section 6.5.4 and the sections following it, which
describe the use of higher-level synchronization techniques.
On Alpha systems with kernel threads, the system allows multiple execution contexts, or threads within a process, that all share the same address space to run simultaneously. The synchronization provided by the SCHED spinlock continues to allow thread-safe access to process data structures such as the process control block (PCB). However, exclusive access to the process address space, and to any structures not explicitly synchronized with spin locks, is no longer guaranteed solely by access mode. In the multithreaded environment, a new process-level synchronization mechanism is required. Because spin locks operate on a systemwide level and do not offer the process-level granularity required for inner mode access synchronization in a multithreaded environment, a process-level semaphore is necessary to serialize inner mode (kernel and executive) access. User and supervisor mode threads are allowed to run without any required synchronization.
The process level semaphore for inner mode synchronization is the inner
mode (IM) semaphore. The IM semaphore is created in the first
floating-point registers and execution data block (FRED) page in the
balance set slot for each process. In a multithreaded
environment, a thread requiring inner mode access must acquire
ownership of the IM semaphore. That is, two threads associated with the
same process cannot execute in inner mode simultaneously. If the
semaphore is owned by another thread, then the requesting thread spins
until inner mode access becomes available, or until some specified
timeout value has expired.
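The acquire behavior described above (spin until the semaphore is free or a timeout expires) can be sketched as follows. This is a behavioral analogy only, not the OpenVMS IM semaphore implementation; the class name and timeout parameters are invented for illustration.

```python
import threading
import time

class InnerModeSemaphore:
    """Sketch of the IM-semaphore acquire behavior: one owner at a
    time, and a requester spins until ownership becomes available or
    a specified timeout expires."""
    def __init__(self):
        self._owner = None
        self._lock = threading.Lock()

    def acquire(self, timeout=1.0, poll=0.001):
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self._lock:
                if self._owner is None:
                    self._owner = threading.current_thread()
                    return True         # inner mode access granted
            time.sleep(poll)            # spin until available or timeout
        return False                    # timeout expired

    def release(self):
        with self._lock:
            self._owner = None
```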
In some applications (usually real-time applications), a number of processes perform a series of tasks. In such applications, the sequence in which a process executes can be controlled or synchronized by means of process priority. The basic method of synchronization by priority involves executing the process with the highest priority while preventing all other processes from executing.

If you use process priority for synchronization, be aware that if the higher-priority process is blocked, either explicitly or implicitly (for example, when doing I/O), the lower-priority process can run, corrupting data that the higher-priority process was working on. Because each processor in a multiprocessor system, when idle, schedules its own work load, it is impossible to prevent all other processes in the system from executing. Moreover, because the scheduler guarantees only that the highest-priority computable process is scheduled at any given time, it is impossible to prevent another process in an application from executing.
Thus, application programs that synchronize by process priority must be
modified to use a different serialization method to run correctly in a
multiprocessor system.
The operating system provides the following techniques to synchronize multiprocess applications:
The operating system provides basic event synchronization through event flags. Common event flags can be shared among cooperating processes running on a uniprocessor or in an SMP system, though the processes must be in the same user identification code (UIC) group. Thus, if you have developed an application that requires the concurrent execution of several processes, you can use event flags to establish communication among them and to synchronize their activity. A shared, or common, event flag can represent any event that is detectable and agreed upon by the cooperating processes. See Section 6.6 for information about using event flags.

The lock management system services---Enqueue Lock Request (SYS$ENQ) and Dequeue Lock Request (SYS$DEQ)---provide multiprocess synchronization tools that can be requested from all access modes. For details about using lock management system services, see Chapter 7.
Synchronization of access to shared data by a multiprocess application
should be designed to support processes that execute concurrently on
different members of an SMP system. Applications that share a global
section can use VAX MACRO interlocked instructions or the equivalent in
other languages to synchronize access to data in the global section.
These applications can also use the lock management system services for
synchronization.
Most application programs that run on an operating system in a uniprocessor system also run without modification in a multiprocessor system. However, applications that access writable global sections or that rely on process priority for synchronizing tasks should be reexamined and modified according to the information contained in this section.
In addition, some applications may execute more efficiently on a
multiprocessor if they are specifically adapted to a multiprocessing
environment. Application programmers may want to decompose an
application into several processes and coordinate their activities by
means of event flags or a shared region in memory.
A spin lock is a device used by a processor to synchronize access to data that is shared by members of a symmetric multiprocessing (SMP) system. A spin lock enables a set of processors to serialize their access to shared data. The basic form of a spin lock is a bit that indicates the state of a particular set of shared data. When the bit is set, it shows that a processor is accessing the data. The bit is either tested and set or tested and cleared with an operation that is atomic with respect to other threads of execution on the same or other processors.

A processor that needs access to some shared data tests and sets the spin lock associated with that data. To test and set the spin lock, the processor uses an interlocked bit-test-and-set instruction. If the bit is clear, the processor can have access to the data. This is called locking or acquiring the spin lock. If the bit is set, the processor must wait because another processor is already accessing the data.

Essentially, a waiting processor spins in a tight loop; it executes repeated bit test instructions to test the state of the spin lock. The term spin lock derives from this spinning. While a processor is in this loop, repeatedly testing the state of the spin lock, it is said to be in a state of busy wait. The busy wait ends when the processor accessing the data clears the bit with an interlocked operation to indicate that it is done. When the bit is cleared, the spin lock is said to be unlocked or released.
Spin locks are used by the operating system executive, along with the
interrupt priority level (IPL), to control access to system data
structures in a multiprocessor system.
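The test-and-set loop described above can be sketched as follows. This is an analogy only: Python has no interlocked bit-test-and-set instruction, so the atomic test-and-set is emulated with a mutex, and the class name is invented for illustration. On real hardware the emulated step would be a single atomic instruction.

```python
import threading

class SpinLock:
    """Sketch of the spin lock described above: a single bit, an
    atomic test-and-set to acquire, and a busy-wait loop while the
    bit is set."""
    def __init__(self):
        self._bit = 0
        self._mutex = threading.Lock()

    def _test_and_set(self):
        with self._mutex:               # emulates the atomic instruction
            old = self._bit
            self._bit = 1
            return old                  # old value: 0 means we got the lock

    def acquire(self):
        while self._test_and_set():     # busy wait: spin while bit is set
            pass

    def release(self):
        with self._mutex:
            self._bit = 0               # interlocked clear: unlocked
```

A practical note on the design: real spin-lock implementations usually test the bit with an ordinary read before retrying the interlocked operation, to reduce memory-interlock traffic; this sketch omits that refinement for clarity.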
A writable global section is an area of memory that can be accessed (read and modified) by more than one process. On uniprocessor or SMP systems, access to a single location in a global section with an appropriate read or write instruction is atomic on VAX and Alpha systems; therefore, no other synchronization is required. An appropriate read or write instruction on VAX systems is one that operates on a naturally aligned byte, word, or longword, such as a MOVx instruction, where x is B for a byte, W for a word, or L for a longword. On Alpha systems, an appropriate read or write instruction operates on a naturally aligned longword or quadword, for instance, an LDx or STx instruction where x is L for an aligned longword or Q for an aligned quadword.

On both VAX and Alpha multiprocessor systems, two or more processes can execute a read-modify-write sequence concurrently, one on each processor. As a result, it is possible that concurrently executing processes can access the same locations simultaneously in a writable global section. If this happens, only partial updates may occur, or data could be corrupted or lost, because the operation is not atomic. Unless proper interlocked instructions are used on VAX systems or load-locked/store-conditional instructions are used on Alpha systems, invalid data may result. You must use interlocked or load-locked/store-conditional instructions, or other synchronizing techniques such as locks or event flags.

On a uniprocessor or SMP system, access to multiple locations within a global section with read or write instructions or a read-modify-write sequence is not atomic on VAX and Alpha systems. On a uniprocessor system, an interrupt can occur that causes process preemption, allowing another process to run and access the data before the first process completes its work. On a multiprocessor system, two processes can access the global section simultaneously on different processors.
You must use a synchronization technique such as a spin lock or event flags to avoid these problems. Check existing programs that use writable global sections to ensure that proper synchronization techniques are in place. Review the program code itself; do not rely on testing alone, because an instance of simultaneous access by more than one process to a location in a writable global section is rare.
If an application must use queue instructions to control access to
writable global sections, ensure that it uses interlocked queue
instructions.
Event flags are maintained by the operating system for general programming use in coordinating thread execution with asynchronous events. Programs can use event flags to perform a variety of signaling functions. Event flag services clear, set, and read event flags. They also place a thread in a wait state pending the setting of an event flag or flags. Table 6-2 shows the two usage styles of event flags.
The wait form of system services is a variant of asynchronous services;
there is a service request and then a wait for the completion of the
request. For reliable operation in most applications, wait form
services must specify an I/O status block (IOSB). The IOSB prevents the
service from completing prematurely and also provides status
information.
Explicit use of event flags follows these general steps:
Implicit use of event flags may involve only step 4, or steps 1, 4, and 5. Use run-time library routines and system services to accomplish these event flag tasks. Table 6-3 summarizes the event flag routines and services.
Some system services set an event flag to indicate the completion or the occurrence of an event; the calling program can test the flag. Other system services use event flags to signal events to the calling process, such as SYS$ENQ(W), SYS$QIO(W), or SYS$SETIMR.
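As a rough analogy only (threading.Event is a Python primitive, not an OpenVMS event flag), the clear/set/wait pattern that the event flag services provide, in the spirit of SYS$CLREF, SYS$SETEF, and SYS$WAITFR, can be sketched as:

```python
import threading

# Analogy: a Python Event plays the role of an event flag shared
# between a waiting thread and the thread signaling completion.
flag = threading.Event()
results = []

def worker():
    results.append("work done")   # complete the asynchronous event...
    flag.set()                    # ...then set the flag (like SYS$SETEF)

flag.clear()                      # clear the flag first (like SYS$CLREF)
t = threading.Thread(target=worker)
t.start()
flag.wait()                       # like SYS$WAITFR: block until the flag is set
t.join()
```

The waiting code blocks until the flag is set, at which point the completed work is guaranteed to be visible, which is the same coordination an event flag provides between cooperating threads or processes.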