HP OpenVMS Systems Documentation
HP OpenVMS Programming Concepts Manual
6.6.3.3 Characteristics of Noncompliant Code

The areas of noncompliance detected by the SRM_CHECK tool can be grouped into the following four categories. Most of these can be fixed by recompiling with new compilers. In rare cases, the source code may need to be modified. See Section 6.6.3.5 for information about compiler versions.
If the SRM_CHECK tool finds a violation in an image, the image should
be modified if necessary and recompiled with the appropriate compiler
(see Section 6.6.3.5). After recompiling, the image should be analyzed
again. If violations remain after recompiling, the source code must be
examined to determine why the code scheduling violation exists.
Modifications should then be made to the source code.
The Alpha Architecture Reference Manual describes how an atomic update of data between processors must be formed. The Third Edition, in particular, has much more information on this topic. Exceptions to the following two requirements are the source of all known noncompliant code:
Therefore, the SRM_CHECK tool looks for the following:
To illustrate, the following are examples of code flagged by SRM_CHECK.
In the above example, an LDQ instruction was found after an LDQ_L before the matching STQ_C. The LDQ must be moved out of the sequence, either by recompiling or by source code changes. (See Section 6.6.3.3.)
In the above example, a branch was discovered between the LDL_L and STL_C. In this case, there is no "fall through" path between the LDx_L and STx_C, which the architecture requires.
The following MACRO-32 source code demonstrates code where there is a "fall through" path, but this case is still noncompliant because of the potential branch and a memory reference in the lock sequence.
To correct this code, the memory access to read the value of INDEX must first be moved outside the LDQ_L/STQ_C sequence. Next, the branch between the LDQ_L and STQ_C, to the label IS_CLEAR, must be eliminated. In this case, it could be done using a CMOVEQ instruction. The CMOVxx instructions are frequently useful for eliminating branches around simple value moves. The following example shows the corrected code:
6.6.3.5 Compiler Versions

This section contains information about versions of compilers that may generate noncompliant code sequences and the minimum recommended versions to use when recompiling. Table 6-1 contains information for OpenVMS compilers.
Current versions of the MACRO-64 assembler may still encounter the
loop rotation issue. However, MACRO-64 does not perform code
optimization by default, and this problem occurs only when optimization
is enabled. If SRM_CHECK indicates a noncompliant sequence in the
MACRO-64 code, it should first be recompiled without optimization. If
the sequence is still flagged when retested, the source code itself
contains a noncompliant sequence that must be corrected.
The MACRO-32 Compiler for OpenVMS Alpha Version 4.1 and later performs additional code checking and displays warning messages for noncompliant code sequences. The following warning messages can display under the circumstances described:

BRNDIRLOC, branch directive ignored in locked memory sequence
Explanation: The compiler found a .BRANCH_LIKELY directive within an LDx_L/STx_C sequence.
User Action: None. The compiler ignores the .BRANCH_LIKELY directive and, unless other coding guidelines are violated, the code works as written.

BRNTRGLOC, branch target within locked memory sequence in routine 'routine_name'
Explanation: A branch instruction has a target that is within an LDx_L/STx_C sequence.
User Action: To avoid this warning, rewrite the source code to avoid branches within or into LDx_L/STx_C sequences. Branches out of interlocked sequences are valid and are not flagged.

MEMACCLOC, memory access within locked memory sequence in routine 'routine_name'
Explanation: A memory read or write occurs within an LDx_L/STx_C sequence. This can be either an explicit reference in the source code, such as "MOVL data, R0", or an implicit reference to memory. For example, fetching the address of a data label (for example, "MOVAB label, R0") is accomplished by a read from the linkage section, the data area that is used to resolve external references.
User Action: To avoid this warning, move all memory accesses outside the LDx_L/STx_C sequence.

RETFOLLOC, RET/RSB follows LDx_L instruction
Explanation: The compiler found a RET or RSB instruction after an LDx_L instruction and before finding an STx_C instruction. This indicates an ill-formed lock sequence.
User Action: Change the code so that the RET or RSB instruction does not fall between the LDx_L instruction and the STx_C instruction.

RTNCALLOC, routine call within locked memory sequence in routine 'routine_name'
Explanation: A routine call occurs within an LDx_L/STx_C sequence. This can be either an explicit CALL/JSB in the source code, such as "JSB subroutine", or an implicit call that occurs as a result of another instruction. For example, some instructions such as MOVC and EDIV generate calls to run-time libraries.
User Action: To avoid this warning, move the routine call or the instruction that generates it, as indicated by the compiler, outside the LDx_L/STx_C sequence.

STCMUSFOL, STx_C instruction must follow LDx_L instruction
Explanation: The compiler found an STx_C instruction before finding an LDx_L instruction. This indicates an ill-formed lock sequence.
User Action: Change the code so that the STx_C instruction follows the LDx_L instruction.

6.6.3.7 Recompiling Code with ALONONPAGED_INLINE or LAL_REMOVE_FIRST Macros

Any MACRO-32 code on OpenVMS Alpha that invokes either the ALONONPAGED_INLINE or the LAL_REMOVE_FIRST macros from the SYS$LIBRARY:LIB.MLB macro library must be recompiled on OpenVMS Version 7.2 and later to obtain a correct version of these macros. The change to these macros corrects a potential synchronization problem that is more likely to be encountered on the Alpha 21264 (EV6) and subsequent processors.
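6.6.4 Interlocked Instructions (VAX Only)

On VAX systems, seven instructions interlock memory. A memory interlock enables a VAX CPU or I/O processor to make an atomic read-modify-write operation to a location in memory that is shared by multiple processors. The memory interlock is implemented at the level of the memory controller. On a VAX multiprocessor system, an interlocked instruction is the only way to perform an atomic read-modify-write on a shared piece of data. The seven interlock memory instructions are as follows:

- ADAWI: Add aligned word, interlocked
- BBCCI: Branch on bit clear and clear, interlocked
- BBSSI: Branch on bit set and set, interlocked
- INSQHI: Insert entry into queue at head, interlocked
- INSQTI: Insert entry into queue at tail, interlocked
- REMQHI: Remove entry from queue at head, interlocked
- REMQTI: Remove entry from queue at tail, interlocked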
The VAX architecture interlock memory instructions are described in detail in the VAX Architecture Reference Manual. The following description of the interlocked instruction mechanism assumes that the interlock is implemented by the memory controller and that the memory contents are fresh.

When a VAX CPU executes an interlocked instruction, it issues an interlock-read command to the memory controller. The memory controller sets an internal flag and responds with the requested data. While the flag is set, the memory controller stalls any subsequent interlock-read commands for the same aligned longword from other CPUs and I/O processors, even though it continues to process ordinary reads and writes. Because interlocked instructions are noninterruptible, they are atomic with respect to threads of execution on the same processor.

When the VAX processor that is executing the interlocked instruction issues a write-unlock command, the memory controller writes the modified data back and clears its internal flag. The memory interlock exists for the duration of only one instruction. Execution of an interlocked instruction includes paired interlock-read and write-unlock memory controller commands.

When you synchronize data with interlocks, you must make sure that all accessors of that data use them, because the memory references of an interlocked instruction are atomic only with respect to other interlocked memory references.
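
The atomic read-modify-write behavior described above can be approximated in portable C11. The following is a minimal sketch, not VAX code: the helper names `interlocked_add_word` and `interlocked_test_and_set_bit` are hypothetical, and they only loosely mirror what ADAWI and BBSSI do. As with the hardware interlock, the update is atomic only if every accessor of the variable goes through atomic operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically add to a shared 16-bit word and return
 * the new value (loosely analogous to ADAWI). */
uint16_t interlocked_add_word(_Atomic uint16_t *word, uint16_t addend)
{
    /* atomic_fetch_add returns the value held before the addition. */
    uint16_t old = atomic_fetch_add(word, addend);
    return (uint16_t)(old + addend);
}

/* Hypothetical helper: atomically set one bit in a shared longword and
 * report whether it was already set (loosely analogous to BBSSI). */
bool interlocked_test_and_set_bit(_Atomic uint32_t *flags, unsigned bit)
{
    assert(bit < 32);
    uint32_t mask = UINT32_C(1) << bit;
    /* atomic_fetch_or returns the previous contents of *flags. */
    uint32_t old = atomic_fetch_or(flags, mask);
    return (old & mask) != 0;
}
```

A caller that finds the bit already set knows that another thread of execution holds the flag. An ordinary, non-atomic store to the same variable elsewhere in the program would defeat this guarantee.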
On VAX systems, the granularity of the interlock depends on the type of
VAX system. A given VAX implementation is free to implement a larger
interlock granularity than that which is required by the set of
interlocked instructions listed above. On some processors, for example,
while an interlocked access to a location is in progress, interlocked
access to any other location in memory is not allowed.
On Alpha systems, there are no implied memory barriers except those performed by the PALcode routines that emulate the interlocked queue instructions. Wherever necessary, you must insert explicit memory barriers into your code to impose an order on references to data shared with threads of execution that could be running on other members of an SMP system. Memory barriers are required to ensure both the order in which other members of an SMP system or an I/O processor see writes to shared data, and the order in which reads of shared data complete. There are two types of memory barrier:

- The MB instruction
- The instruction memory barrier (IMB) PALcode routine
The MB instruction guarantees that all subsequent loads and stores do not access memory until after all previous loads and stores have accessed memory from the viewpoint of multiple threads of execution. Alpha compilers provide semantics for generating memory barriers when needed for specific operations on data items.

Code that modifies the instruction stream must be changed to synchronize the old and new instruction streams properly. Use of an REI instruction to accomplish this does not work on OpenVMS Alpha systems. The instruction memory barrier (IMB) PALcode routine must be used after a modification to the instruction stream to flush prefetched instructions. It also provides the same ordering effects as the MB instruction.

If a kernel mode code sequence changes the expected instruction stream, it must issue an IMB instruction after changing the instruction stream and before the time the change is executed. For example, if a device driver stores an instruction sequence in an extension to the unit control block (UCB) and then transfers control there, it must issue an IMB instruction after storing the data in the UCB but before transferring control to the UCB data.
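
The ordering role of a memory barrier can be sketched in portable C11, where `atomic_thread_fence(memory_order_seq_cst)` stands in for the MB instruction. This is an illustrative assumption rather than the code an Alpha compiler generates, and `struct mailbox`, `mailbox_publish`, and `mailbox_try_consume` are hypothetical names. The writer orders its data store before the flag store, and the reader orders its flag load before the data load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared mailbox: `payload` is ordinary data and `ready`
 * is the flag that tells a reader the payload is valid. */
struct mailbox {
    int payload;
    atomic_bool ready;
};

/* Writer: store the data, issue a full barrier, then raise the flag.
 * The fence stands in for the role MB plays on Alpha. */
void mailbox_publish(struct mailbox *mb, int value)
{
    mb->payload = value;
    atomic_thread_fence(memory_order_seq_cst);
    atomic_store_explicit(&mb->ready, true, memory_order_relaxed);
}

/* Reader: check the flag, issue a barrier, then read the data.
 * Returns true and fills *out only if a value has been published. */
bool mailbox_try_consume(struct mailbox *mb, int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&mb->ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_seq_cst);
    *out = mb->payload;
    return true;
}
```

Without the two fences, nothing would stop another processor from observing the flag before the payload. Both sides need a barrier because write ordering and read ordering are separate requirements.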
The MACRO-32 compiler for OpenVMS Alpha provides the EVAX_IMB built-in
to explicitly insert an IMB instruction in the instruction stream.
The I64 memory fence (mf) instruction causes all memory operations
before the mf instruction to complete before any memory operations
after the mf instruction are allowed to begin. Fence instructions
combine the release and acquire semantics into a bidirectional fence;
that is, they guarantee that all previous orderable instructions are
made visible prior to any subsequent orderable instruction being made
visible.
Privileged architecture library (PALcode) routines include Alpha
instructions that emulate VAX queue and interlocked queue instructions.
PALcode executes in a special environment with interrupts blocked. This
feature results in noninterruptible updates. A PALcode routine can
perform the multiple memory reads and memory writes that insert or
remove a queue element without interruption.
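
To show why a queue update needs protection from interruption, the following C sketch spells out the individual reads and writes behind an insert and a remove on a doubly linked queue. It is a hypothetical illustration only: `struct qentry` and the function names are not the PALcode routines or an OpenVMS interface, and the code applies no interlock of its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue entry with forward and backward links. An empty
 * queue header points to itself in both directions. */
struct qentry {
    struct qentry *flink;
    struct qentry *blink;
};

void queue_init(struct qentry *head)
{
    head->flink = head;
    head->blink = head;
}

/* Insert `entry` immediately after `pred`. One link is read and four are
 * written; an interruption partway through leaves the queue inconsistent. */
void queue_insert(struct qentry *entry, struct qentry *pred)
{
    assert(entry != NULL && pred != NULL);
    struct qentry *succ = pred->flink;  /* read the successor   */
    entry->flink = succ;                /* write 1              */
    entry->blink = pred;                /* write 2              */
    succ->blink = entry;                /* write 3              */
    pred->flink = entry;                /* write 4              */
}

/* Remove and return the entry at the head of the queue, or NULL if the
 * queue is empty. */
struct qentry *queue_remove_head(struct qentry *head)
{
    struct qentry *entry = head->flink;
    if (entry == head)
        return NULL;                    /* queue is empty */
    head->flink = entry->flink;
    entry->flink->blink = head;
    entry->flink = NULL;
    entry->blink = NULL;
    return entry;
}
```

Inserting after `head.blink` adds an entry at the tail, and inserting after the header itself adds it at the head. Running such a sequence with interrupts blocked, as PALcode does, is what makes the four writes appear as a single update.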
The VAX interlocked queue instructions work unchanged on OpenVMS I64 systems. They result in calls to the SYS$PAL_xxxx run-time routines, which are the equivalents of the PALcode routines and incorporate the necessary interlocks and memory barriers. Whenever possible, the OpenVMS I64 BLISS, C, and MACRO compilers convert CALL_PAL macros to the equivalent OpenVMS-provided SYS$PAL_xxxx operating system calls for backward compatibility. The BLISS compiler compiles each of the queue manipulation PALcode builtins into SYS$PAL_xxxx system service requests.
Refer to Porting Applications from HP OpenVMS Alpha to HP OpenVMS Industry Standard 64 for Integrity Servers for complete information on the BLISS
implementation.
The operating system uses the synchronization primitives provided by
the hardware as the basis for several different synchronization
techniques. The following sections summarize the operating system's
synchronization techniques available to application software.
On Alpha and I64 systems without kernel threads, only one thread of execution can execute within a process at a time, so synchronization of threads that execute simultaneously is not a concern. However, a delivery of an AST or the occurrence of an exception can intervene in a sequence of instructions in one thread of execution. Because these conditions can occur, application design must take into account the need for synchronization with condition handlers and AST procedures.

On Alpha systems without the byte-word extension, writing bytes or words or performing a read-modify-write operation requires a sequence of Alpha instructions. If the sequence incurs an exception or is interrupted by AST delivery, another code thread in the process can run. If that thread accesses the same data, it can read incompletely written data or cause data corruption. Aligning data on natural boundaries and unpacking word and byte data reduce this risk.

On Alpha and I64 systems, an application written in a language other than MACRO-32 must identify to the compiler data accessed by any combination of mainline code, AST procedures, and condition handlers to ensure that the compiler generates code that is atomic with respect to other threads. Also, data shared with other processes must be identified.

With process-private data accessed from both AST and non-AST threads of execution, the non-AST thread can block AST delivery by using the Set AST Enable (SYS$SETAST) system service. If the code is running in kernel mode, it can also raise IPL to block AST delivery. The Guide to Creating OpenVMS Modular Procedures describes the concept of AST reentrancy.
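
The byte-write hazard described above for Alpha systems without the byte-word extension can be made concrete with a C sketch that emulates a one-byte store as a load, mask, insert, and store on the containing quadword. The function name `store_byte_rmw` is hypothetical, and the steps are a simplified model of the instruction sequence rather than actual compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* Emulate a one-byte store into a 64-bit quadword the way a machine
 * without byte store instructions must do it. Byte 0 is the least
 * significant byte (little-endian lane numbering). */
void store_byte_rmw(uint64_t *quad, unsigned byte_index, uint8_t value)
{
    assert(byte_index < 8);
    unsigned shift = byte_index * 8;

    uint64_t old = *quad;                                   /* 1. load   */
    uint64_t cleared = old & ~(UINT64_C(0xFF) << shift);    /* 2. mask   */
    uint64_t merged = cleared | ((uint64_t)value << shift); /* 3. insert */

    /* If an AST were delivered here and updated a different byte of
     * *quad, the store below would overwrite that update with stale data. */
    *quad = merged;                                         /* 4. store  */
}
```

Placing each shared byte or word in its own naturally aligned longword or quadword avoids the problem, because neighboring data is then no longer rewritten as a side effect.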
On a uniprocessor or in a symmetric multiprocessing (SMP) system,
access to multiple locations with a read or write instruction or with a
read-modify-write sequence is not atomic on OpenVMS systems. Additional
synchronization methods are required to control access to the data. See
Section 6.7.4 and the sections following it, which describe the use of
higher-level synchronization techniques.
On Alpha and I64 systems with kernel threads, the system allows multiple execution contexts, or threads within a process, that all share the same address space to run simultaneously. The synchronization provided by spin locks continues to allow thread-safe access to process data structures such as the process control block (PCB). However, exclusive access to the process address space and to any structures not currently synchronized explicitly with spin locks is no longer guaranteed solely by access mode. In the multithreaded environment, a new process-level synchronization mechanism is required.

Because spin locks operate on a systemwide level and do not offer the process-level granularity required for inner-mode access synchronization in a multithreaded environment, a process-level semaphore is necessary to serialize inner-mode (kernel and executive) access. User and supervisor mode threads are allowed to run without any required synchronization.

The process-level semaphore for inner-mode synchronization is the inner mode (IM) semaphore. The IM semaphore is created in the first floating-point registers and execution data block (FRED) page in the balance set slot for each process. In a multithreaded environment, a thread requiring inner-mode access acquires ownership of the IM semaphore. That is, in general, two threads associated with the same process cannot execute in inner mode simultaneously. If the semaphore is owned by another thread, the requesting thread spins until inner-mode access becomes available or until a specified timeout value has expired.
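
The acquire-or-spin behavior described for the IM semaphore can be sketched in C11 as an ownership word claimed with a compare-and-exchange. This is a hypothetical model, not the OpenVMS implementation: the structure, the function names, and the use of a spin count in place of a real timeout are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SEM_FREE 0u  /* hypothetical "no owner" value */

/* Hypothetical per-process semaphore holding the ID of the owning thread. */
struct inner_mode_sem {
    _Atomic uint32_t owner;
};

/* Try to take ownership for `thread_id`, retrying up to `max_spins`
 * additional times. Returns true on success, false if the budget ran out. */
bool im_sem_acquire(struct inner_mode_sem *sem, uint32_t thread_id,
                    unsigned long max_spins)
{
    assert(thread_id != SEM_FREE);
    for (unsigned long i = 0; i <= max_spins; i++) {
        uint32_t expected = SEM_FREE;
        /* Succeeds only if no thread currently owns the semaphore. */
        if (atomic_compare_exchange_strong(&sem->owner, &expected, thread_id))
            return true;
    }
    return false;  /* still owned by another thread */
}

/* Give up ownership. Only the current owner may release. */
void im_sem_release(struct inner_mode_sem *sem, uint32_t thread_id)
{
    assert(atomic_load(&sem->owner) == thread_id);
    atomic_store(&sem->owner, SEM_FREE);
}
```

Because the semaphore belongs to one process, a thread spinning on it delays only other threads of that process, which is the process-level granularity that a systemwide spin lock cannot provide.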