HP OpenVMS Systems Documentation

HP OpenVMS Calling Standard

4.5.2 Stack Overflow Detection

This section defines the conventions to support the execution of multiple threads in a multilanguage OpenVMS environment. Specifically defined is how compiled code must perform stack limit checking. While this standard is compatible with a multithreaded execution environment, the detailed mechanisms, data structures, and procedures that support this capability are not specified in this manual.

For a multithreaded environment, the following characteristics are assumed:

There can be one or more threads executing within a single process.
The state of a thread is represented in a thread environment block (TEB).
The TEB of a thread contains information that determines a stack limit below which the stack pointer must not be decremented by the executing code (except for code that implements the multithreaded mechanism itself).
Exception handling is fully reentrant and multithreaded.

4.5.2.1 Stack Limit Checking

A program that is otherwise correct can fail because of stack overflow. Stack overflow occurs when extension of the stack (by decrementing the stack pointer, SP) allocates addresses not currently reserved for the current thread's stack. This section defines the conventions for stack limit checking in a multithreaded environment.

In the following sections, the term new stack region refers to the region of the stack from one less than the old value of SP to the new value of SP.

Stack Guard Region

In a multithreaded environment, the address space beyond each thread's stack is protected by contiguous guard pages, which trap on any access. These pages form the stack guard region.

Stack Reserve Region

In some cases, it is useful to maintain a stack reserve region, which is a minimum-sized region that is between the current top of stack and the stack guard region. A stack reserve region can ensure that the following conditions exist:

Exceptions or asynchronous system faults (ASTs, analogous to asynchronous signals) have stack space to execute on a thread's stack.
The exception dispatcher and any exception handler that it might call have stack space to execute after detection of an invalid attempt to extend the stack.

This calling standard does not require a stack reserve region, but it does allow a language (for example, Ada) and its run-time system to implement one.

4.5.2.2 Methods for Stack Limit Checking

Because accessible memory may be available at addresses lower than those occupied by the stack guard region, compilers must generate code that never extends the stack past the stack guard region into accessible memory that is not allocated to the thread's stack.

A general strategy to prevent extending the stack past the stack guard region is to access each page of memory down to and possibly including the page corresponding to the intended new value of the SP. If the stack is to be extended by an amount larger than the size of a memory page, then a series of accesses is required that works from higher to lower addressed pages. If any access results in a memory access violation, then the code has made an invalid attempt to extend the stack of the current thread.

This calling standard defines two methods for stack limit checking, implicit and explicit, which are explained in the following sections.

Implicit Stack Limit Checking

If a byte (not necessarily the lowest) of the new stack region is guaranteed to be accessed prior to any further stack extension, then the stack can be extended by an increment that is up to one-half the stack guard region (without any additional accesses).

This standard requires that the minimum stack guard region size is 8192 bytes.

If the stack is being extended by 4096 bytes or less and the application does not use a stack reserve region, then explicit checking is not required. However, because asynchronous interrupts and calls to other procedures may also cause stack extension without explicit checking, stack extension with implicit checking must adhere to the following rules:

Explicit stack limit checking must be performed unless the amount by which the SP is decremented is known to be less than or equal to 4096 and the application does not use a stack reserve region.
Some byte in the new stack region must be accessed before the SP can be further decremented for a subsequent stack extension.
This access can be performed either before or after the SP is decremented for this stack extension, but it must be done before the SP can be decremented again.
No standard procedure call can be made before some byte in the new stack region is accessed.
The system exception dispatcher ensures that the lowest addressed byte in the new stack region is accessed if any kind of asynchronous interrupt occurs both after the SP is decremented and before the access in the new stack region occurs.

These conventions ensure that the stack pointer is not decremented so that it points to accessible storage beyond the stack limit without this error being detected (either by the guard region being accessed by the thread or by an explicit stack limit check failure).

As a matter of practice, the system can provide multiple guard pages in the stack guard region. When a stack overflow is detected as a result of access to the stack guard region, one or more guard pages can be unprotected for use by the exception-handling facility, as long as one or more guard pages remain protected to provide implicit stack limit checking during exception processing.

Explicit Stack Limit Checking

If the stack is being extended by an unknown amount or by a known amount that is greater than the maximum implicit check size 4096, then a code sequence that follows the rules for implicit stack limit checking can be executed in a loop to access the new stack region incrementally in segments that are less than or equal to the minimum stack guard region size 8192. At least one access must occur in each such segment.

The first access must occur between SP and SP-4096, because in the absence of more specific information, the previous guaranteed access relative to the current stack may be as much as 4096 bytes greater than the current stack pointer address.

The last access must be within 4096 of the intended new value of the stack pointer. These accesses must occur in order, starting with the highest addressed segment and working toward the lowest addressed segment.

A more optimal strategy is:

Perform a read access using the intended new value of the stack pointer. This is nondestructive, even if the read is beyond the stack guard region, and may facilitate OS mapping of new stack pages, if appropriate, in a single operation.
Proceed with sequential accesses as just described.

Note

A simple algorithm that is consistent with this requirement (but achieves up to twice the minimum number of accesses) is to perform a sequence of accesses in a loop starting with the previous value of SP, decrementing by the minimum no-check extension size (4096) to, but not including, the first value that is less than the new value for the stack pointer.

The stack must not be extended incrementally in procedure prologues. A procedure prologue that needs to extend the stack by an amount of unknown size or known size greater than the minimum implicit check size must test new stack segments as just described in a loop that does not modify SP, and then update the stack with one instruction that copies the new stack pointer value into the SP.

Note

An explicit stack limit check can be performed either by inline code that is part of a prologue or by a run-time support routine that is tailored to be called from a procedure prologue.

Stack Reserve Region Checking

The size of the stack reserve region must be included in the increment size used for stack limit checks, after which it is not included in the amount by which the stack is actually extended. (Depending on the size of the stack reserve region, this may partially or even completely eliminate the ability to use implicit stack limit checking.)

4.6 Register Stack

General registers R32 through R127 form a register stack that is automatically managed across procedure calls and returns. Each procedure frame on the register stack is divided into two dynamically-sized regions: one for input parameters and local variables, and one for output parameters.

On a procedure call, the registers are automatically renamed by the hardware so that the caller's output registers form the base of the register stack frame of the callee. On return, the registers are restored to the previous state, so that the input and local registers are preserved across the call.

The ALLOC instruction is used at the beginning of a procedure to allocate the input, local, and output regions; the sizes of these regions are supplied as immediate operands. A procedure is not required to issue an ALLOC instruction if it does not need to store any values in its register stack frame. It may write to the first N stacked registers, where N is the value of the argument count passed in the argument information (AI) register (see Section 4.7.5.3). It may not write to any other stack register without first issuing an ALLOC instruction.

Figure 4-2 illustrates the operation of the register stack across an example procedure call. In this example, the caller allocates eight input, twelve local, and four output registers; the callee allocates four input, six local, and five output registers with the following instruction:

  ALLOC R36=rspfs, 4, 6, 5, 0

The actual registers to which the stacking registers are physically mapped are not directly addressable by the application software.

4.6.1 Input and Local Registers

The hardware makes no distinction between input and local registers. The caller's output registers automatically become the callee's register stack frame on a procedure call, with all registers initially allocated as output registers. An ALLOC instruction may increase or decrease the total size of the register stack frame, and may adjust the boundary between the input and local region and the output region.

The software conventions specify that up to eight general registers are used for parameter passing. Any registers in the input and local region beyond those eight may be allocated for use as preserved locals. Floating-point parameters may produce holes in the parameter list that is passed in the general registers; those unused input registers may also be used for preserved locals.

The caller's output registers do not need to be preserved for the caller. Once an input parameter is no longer needed, or has been copied elsewhere, that register may be reused for any other purpose within the procedure.

Figure 4-2 Operation of the Register Stack

4.6.2 Output Registers

Up to eight output registers are used for passing parameters. If a procedure call requires fewer than eight general registers for its parameters, the calling procedure does not need to allocate more than are needed. If the called procedure expects more parameters, it will allocate extra input registers; these registers will be uninitialized.

A procedure may also allocate more than eight registers in the output region. While the extra registers may not be used for passing parameters, they can be used as extra scratch registers. On a procedure call, they will show up in the called procedure's output area as excess registers, and may be modified by that procedure. The called procedure may also allocate few enough total registers in its stack frame that the top of the called procedure's frame is lower than the caller's top-of-frame, but those registers will become available again when control returns to the caller.

4.6.3 Rotating Registers

A subset of the registers in the procedure frame may be designated as rotating registers. The rotating register region always starts with R32, and may be any multiple of eight registers in number, up to a maximum of 96 rotating registers. The renaming is under control of the Register Rename Base (RRB).

If the rotating registers include any or all of the output registers, software must be careful when using the output registers for passing parameters, because a non-zero RRB will change the virtual register numbers that are part of the output region. In general, software should ensure either that the rotating region does not overlap the output region, or that the RRB is cleared to zero before setting output parameter registers.

4.6.4 Frame Markers

The current application-visible state of the register stack is stored in an architecturally inaccessible register called the current frame marker. On a procedure call, this register is automatically saved by copying it to an application register, the previous function state (AR.PFS). The current frame marker is modified to describe a new stack frame whose input and local area is initially zero size, and whose output area is equal in size to the previous output area. On return, the previous frame state register is used to restore the current frame marker to its earlier value, and the base of the register stack is adjusted accordingly.

It is the responsibility of a procedure to save the previous function state register before issuing any procedure calls of its own, and to restore it before returning.

4.6.5 Backing Store for Register Stack

When the depth of the procedure call stack exceeds the capacity of the physical register file, the hardware frees physical registers by saving them into a memory stack. This backing store is distinct from the memory stack described in Section 4.5.

As returns unwind the procedure call stack, the hardware also restores previously-saved physical registers from the backing store.

The operation of this register stack engine (RSE) is mostly transparent to application software. While the RSE is running, application software may not examine the contents of the backing store, and may not make any assumptions about how much of the register stack is still in physical registers or in the backing store. In order to examine previous stack frames, application software must synchronize the RSE with the FLUSHRS instruction. Synchronizing the RSE forces all stack frames up to, but not including, the current frame to be saved in backing store, allowing the software to examine the contents of the backing store without asynchronous operations modifying the memory. Modifications to the backing store require setting the RSE to enforced lazy mode after synchronizing it, which prevents the RSE from doing any operations other than those required by calls and returns. The procedure for synchronizing the RSE and setting the mode is described in the Itanium® Software Conventions and Runtime Architecture Guide.

The backing store grows towards higher addresses. The top of the stack, which corresponds to the top of the previous procedure frame, is available in the Backing Store Pointer (BSP) application register. The BSP must always point to a valid backing store address, because the operating system may need to start the RSE to process an exception.

Backing store overflow is automatically detected by the OpenVMS operating system, which will either extend the backing store to allow continued operation or will raise an exception. Unlike for the memory stack (see Section 4.5), there are no specific rules or requirements that must be satisfied to facilitate detection of backing store overflow.

A NaT collection register is stored into the backing store following each group of 63 physical registers. The NaT bit of each register stored is shifted into the collection register. When the BSP reaches the quadword just before a 64-quadword boundary, the RSE stores the collection register. Software can determine the position of the NaT collection registers in the backing store by examining the memory address. This process is described in greater detail in the Intel IA-64 Architecture Software Developer Manual.

4.7 Procedure Linkage

This calling standard states that a standard call (see Section 1.4) can be accomplished in any way that presents the called routine with the required environment. However, typically, most standard-conforming external calls are implemented with a common sequence of instructions and conventions. Because a common set of call conventions is so pervasive, these conventions are included for reference as part of this standard.

4.7.1 The GP Register

Every procedure that references statically-allocated data or calls another procedure requires a pointer to an associated short data segment in the GP register, so that it can access its static data and its linkage tables. Typically, an image has one such data segment, and the GP register must be set correctly prior to calling any entry point within that image. Optionally, an image may be partitioned into subcomponents called clusters in which case each cluster may have its own associated data segment (clusters may also share a common data segment). For further information on images and clusters, see the HP OpenVMS Linker Utility Manual.

Throughout this chapter, rules regarding the use of the GP register are described in terms of images. However, these same rules apply between clusters within an image (keeping in mind that clusters within an image may share a common GP address and short data segment, while images cannot share a common GP address and short data segment).

The linkage conventions require that each image (or cluster) define exactly one GP value to refer to a location within its short data segment. This location should be chosen to maximize the usefulness of short-displacement immediate instructions for addressing scalars and linkage table entries. The image activator determines the absolute value of the GP register for each image after loading its data segment into memory.

Because the GP register remains unchanged for calls within an image, calls known to be local can be optimized accordingly. For calls between images, the GP register must be initialized with the correct GP value for the new image, and the calling function must ensure that its own GP value is saved and restored.

Note that there is a small set of compiler run-time support procedures that take a special pseudo-GP value as a kind of input parameter. See Section 4.7.7 for more information about support for bound function descriptors. See Section 5.1.2 for information about support for translated images.

4.7.2 Types of Calls

The following types of procedure calls are defined:

Direct local calls. Direct calls within the same image can be made directly to the entry point of the target procedure. In this case, the GP register does not need to be changed.
Direct non-local calls. Calls made outside the same image are routed through an import stub (which can be inlined at compile time if the call is known or suspected to be to another image). The import stub obtains the address of the main entry point and the GP register value from the linkage table. Although coded in source as a direct call, a dynamically-linked call therefore becomes indirect.
Indirect calls. A function pointer points to a descriptor that contains both the address of the function entry point and the GP register value for the target function. The compiler must generate code for an indirect call that sets the new GP value before transferring control to the target procedure.
Special calls. Other special calling conventions are allowed to the extent that the compiler and the run-time library agree on the conventions, and provided that the stack can be unwound through such a call. Such calls are outside the scope of this document. See Section A.3.1 for a discussion of stack unwind requirements.

4.7.3 Calling Sequence

Direct and indirect procedure calls are described in the following sections. Because the compiler is not required to know whether any given call is local or to a dynamically linked image, the two types of direct calls are described together in Section 4.7.3.1.

4.7.3.1 Direct Calls

Direct procedure calls follow the sequence of steps shown in Figure 4-3. The following paragraphs describe these steps in detail.

Figure 4-3 Direct Procedure Calls

Caller: Prepare call. Values in scratch registers that must be kept live across the call must be saved. They can be saved by copying them into local stacked registers, or by saving them on the memory stack. If the NaT bits associated with any live scratch registers must be saved, the compiler should use ST8.SPILL or STF.SPILL instructions. The User NaT collection register itself is preserved by the call, so the NaT bits need no further treatment at this point.
If the call is not known (at compile time) to be within the same image, the GP register must be saved.
The parameters must be set up in registers and memory as described in Section 4.7.4
Caller: Call. All direct calls are made with a BR.CALL instruction, specifying B0 for the return link.
For direct local calls, the PC-relative displacement is computed at link time. Compilers may assume that the standard displacement field in the BR.CALL instruction is sufficiently wide to reach the target of the call. If the displacement is too large, the linker must supply a branch stub at some convenient point in the code; compilers must guarantee the existence of such a point by ensuring that code sections in the relocatable object files are no larger than the maximum reach of the BR.CALL instruction. With a 25-bit displacement, the maximum reach is 16 megabytes in either direction from the point of call.
Because direct calls to other images cannot be statically bound at link time, the linker must supply an import stub for the target procedure; the import stub obtains the address of the target procedure from the linkage table. The BR.CALL instruction can then be statically bound to the import stub using the PC-relative displacement.
The BR.CALL instruction performs the following actions:
- Saves the return link in the return branch register
- Saves the current frame marker in the AR.PFS register
- Sets the base of the new register stack frame to the beginning of the output region of the old frame
Caller: Import stub (direct non-local calls only). The import stub is allocated in the image of the caller, so that the BR.CALL instruction can be statically bound to the address of the import stub. It must access the linkage table via the current GP (which means that GP must be valid at the point of call), and obtain the address of the target procedure's entry point and its GP value. The import stub then establishes the new GP value and branches to the target entry point.
If the compiler knows or suspects that the target of a call is in a separate image, it can generate calling code that performs the functions of the import stub, which saves an extra branch.
When the target of a call is in the same image, an import stub is not used (which also means that GP must be valid at the point of call).
Callee: Entry. The prologue code in the target procedure is responsible for allocating the register stack frame. It is also responsible for allocating a frame on the memory stack when necessary. It may use the 16 bytes at the top of its caller's stack frame as a scratch area.
A non-leaf procedure must save the return branch register and previous function state, either in the memory stack frame or in a local stacked general register.
The prologue must also save any preserved registers to be used in this procedure. The NaT bits for those registers must be preserved as well, by copying the NaT bits to local stacked general registers, or by using ST8.SPILL or STF.SPILL instructions. However, the User NaT collection register (AR.UNAT) must be saved first because it is guaranteed to be preserved by the call.
Callee: Exit. The epilogue code is responsible for restoring the return branch register and previous function state, if necessary, and any preserved registers that were saved. The NaT bits must be restored using the LD8.FILL or LDF.FILL instructions. The User NaT collection register must also be restored if it was saved.
If a memory stack frame was allocated, the epilogue code must deallocate it.
Finally, the procedure exits by branching through the return branch register with the BR.RET instruction.
Caller: After the call. Any saved values (including GP) should be restored.

Contents

Index