HP OpenVMS Systems Documentation
HP OpenVMS Calling Standard
4.5.2 Stack Overflow Detection
This section defines the conventions to support the execution of multiple threads in a multilanguage OpenVMS environment. Specifically defined is how compiled code must perform stack limit checking. While this standard is compatible with a multithreaded execution environment, the detailed mechanisms, data structures, and procedures that support this capability are not specified in this manual.
For a multithreaded environment, the following characteristics are assumed:
4.5.2.1 Stack Limit Checking
A program that is otherwise correct can fail because of stack overflow. Stack overflow occurs when extension of the stack (by decrementing the stack pointer, SP) allocates addresses not currently reserved for the current thread's stack. This section defines the conventions for stack limit checking in a multithreaded environment.
In the following sections, the term new stack region refers to the region of the stack from one less than the old value of SP to the new value of SP.
In a multithreaded environment, the address space beyond each thread's stack is protected by contiguous guard pages, which trap on any access. These pages form the stack guard region.
In some cases, it is useful to maintain a stack reserve region, which is a minimum-sized region that is between the current top of stack and the stack guard region. A stack reserve region can ensure that the following conditions exist:
This calling standard does not require a stack reserve region, but it
does allow a language (for example, Ada) and its run-time system to
implement one.
Because accessible memory may be available at addresses lower than those occupied by the stack guard region, compilers must generate code that never extends the stack past the stack guard region into accessible memory that is not allocated to the thread's stack.
A general strategy to prevent extending the stack past the stack guard region is to access each page of memory down to and possibly including the page corresponding to the intended new value of the SP. If the stack is to be extended by an amount larger than the size of a memory page, then a series of accesses is required that works from higher to lower addressed pages. If any access results in a memory access violation, then the code has made an invalid attempt to extend the stack of the current thread.
This calling standard defines two methods for stack limit checking, implicit and explicit, which are explained in the following sections.
If a byte (not necessarily the lowest) of the new stack region is guaranteed to be accessed prior to any further stack extension, then the stack can be extended by an increment that is up to one-half the stack guard region (without any additional accesses). This standard requires that the minimum stack guard region size is 8192 bytes.
If the stack is being extended by 4096 bytes or less and the application does not use a stack reserve region, then explicit checking is not required. However, because asynchronous interrupts and calls to other procedures may also cause stack extension without explicit checking, stack extension with implicit checking must adhere to the following rules:
These conventions ensure that the stack pointer is not decremented so that it points to accessible storage beyond the stack limit without this error being detected (either by the guard region being accessed by the thread or by an explicit stack limit check failure).
As a matter of practice, the system can provide multiple guard pages in the stack guard region. When a stack overflow is detected as a result of access to the stack guard region, one or more guard pages can be unprotected for use by the exception-handling facility, as long as one or more guard pages remain protected to provide implicit stack limit checking during exception processing.
If the stack is being extended by an unknown amount or by a known amount that is greater than the maximum implicit check size (4096 bytes), then a code sequence that follows the rules for implicit stack limit checking can be executed in a loop to access the new stack region incrementally in segments that are less than or equal to the minimum stack guard region size (8192 bytes). At least one access must occur in each such segment. The first access must occur between SP and SP-4096, because in the absence of more specific information, the previous guaranteed access relative to the current stack may be as much as 4096 bytes greater than the current stack pointer address. The last access must be within 4096 bytes of the intended new value of the stack pointer. These accesses must occur in order, starting with the highest addressed segment and working toward the lowest addressed segment. A more optimal strategy is:
The stack must not be extended incrementally in procedure prologues. A procedure prologue that needs to extend the stack by an amount of unknown size or known size greater than the maximum implicit check size (4096 bytes) must test new stack segments as just described in a loop that does not modify SP, and then update the stack with one instruction that copies the new stack pointer value into the SP.
The size of the stack reserve region must be included in the increment
size used for stack limit checks, after which it is not included in the
amount by which the stack is actually extended. (Depending on the size
of the stack reserve region, this may partially or even completely
eliminate the ability to use implicit stack limit checking.)
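
The probing rules for large extensions can be sketched in C as a planning function. This is a minimal sketch, not code from the standard: `plan_probes`, its parameters, and the particular probe addresses it picks are assumptions made for illustration; only the 4096-byte and 8192-byte constants come from the text. A real prologue would touch each planned address in order, without modifying SP, and only then copy the new value into SP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    MAX_IMPLICIT = 4096,  /* maximum implicit check size (from the text) */
    MIN_GUARD    = 8192   /* minimum stack guard region size (from the text) */
};

/*
 * Hypothetical helper: fills probes[] with the addresses to access, highest
 * first, and returns how many there are. The stack reserve region is added
 * to the size that is checked but not to the amount by which SP moves.
 */
static size_t plan_probes(uint64_t old_sp, uint64_t extend, uint64_t reserve,
                          uint64_t *probes, size_t cap, uint64_t *new_sp)
{
    uint64_t check_size = extend + reserve;
    uint64_t limit, addr;
    size_t n = 0;

    assert(check_size <= old_sp && cap > 0);
    limit = old_sp - check_size;          /* lowest address being checked */

    /* First access lies between SP and SP-4096. */
    addr = (check_size > MAX_IMPLICIT) ? old_sp - MAX_IMPLICIT : limit;
    probes[n++] = addr;

    /* Step down by at most 8192 until within 4096 of the checked limit. */
    while (addr - limit > MAX_IMPLICIT) {
        addr = (addr - limit > MIN_GUARD) ? addr - MIN_GUARD : limit;
        assert(n < cap);
        probes[n++] = addr;
    }

    *new_sp = old_sp - extend;            /* reserve is not allocated */
    return n;
}
```

Under these assumptions, extending by 20000 bytes with no reserve region plans three accesses, at SP-4096, SP-12288, and SP-20000.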
General registers R32 through R127 form a register stack that is automatically managed across procedure calls and returns. Each procedure frame on the register stack is divided into two dynamically-sized regions: one for input parameters and local variables, and one for output parameters. On a procedure call, the registers are automatically renamed by the hardware so that the caller's output registers form the base of the register stack frame of the callee. On return, the registers are restored to the previous state, so that the input and local registers are preserved across the call.
The ALLOC instruction is used at the beginning of a procedure to allocate the input, local, and output regions; the sizes of these regions are supplied as immediate operands. A procedure is not required to issue an ALLOC instruction if it does not need to store any values in its register stack frame. It may write to the first N stacked registers, where N is the value of the argument count passed in the argument information (AI) register (see Section 4.7.5.3). It may not write to any other stack register without first issuing an ALLOC instruction.
Figure 4-2 illustrates the operation of the register stack across an example procedure call. In this example, the caller allocates eight input, twelve local, and four output registers; the callee allocates four input, six local, and five output registers with the following instruction:
The actual registers to which the stacking registers are physically
mapped are not directly addressable by the application software.
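
The renaming in the example above can be followed with a small C model. This is a sketch under stated assumptions: `reg_frame`, `alloc_frame`, `call_frame`, and `slot_of` are hypothetical names, and `base` is a slot index in an idealized, unbounded register stack rather than a real physical register number (which, as noted, software cannot address).

```c
#include <assert.h>

typedef struct {
    unsigned base;      /* slot that this frame's R32 maps to (idealized) */
    unsigned in_local;  /* size of the input and local region */
    unsigned out;       /* size of the output region */
} reg_frame;

/* Models ALLOC: resizes the current frame without moving its base. */
static reg_frame alloc_frame(reg_frame f, unsigned in, unsigned local,
                             unsigned out)
{
    assert(in + local + out <= 96);   /* R32 through R127 */
    f.in_local = in + local;
    f.out = out;
    return f;
}

/* Models a call: the caller's output registers become the callee's frame. */
static reg_frame call_frame(reg_frame caller)
{
    reg_frame callee;
    callee.base = caller.base + caller.in_local;
    callee.in_local = 0;              /* everything starts as output */
    callee.out = caller.out;
    return callee;
}

/* Slot behind virtual register Rr in frame f, or -1 if outside the frame. */
static int slot_of(reg_frame f, unsigned r)
{
    if (r < 32 || r >= 32 + f.in_local + f.out)
        return -1;
    return (int)(f.base + (r - 32));
}
```

With the caller's 8/12/4 allocation, the caller's R52 through R55 are its output registers, and in this model they are the same slots the callee sees as R32 through R35. After the callee's 4/6/5 allocation, its frame spans R32 through R46.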
The hardware makes no distinction between input and local registers. The caller's output registers automatically become the callee's register stack frame on a procedure call, with all registers initially allocated as output registers. An ALLOC instruction may increase or decrease the total size of the register stack frame, and may adjust the boundary between the input and local region and the output region.
The software conventions specify that up to eight general registers are used for parameter passing. Any registers in the input and local region beyond those eight may be allocated for use as preserved locals. Floating-point parameters may produce holes in the parameter list that is passed in the general registers; those unused input registers may also be used for preserved locals. The caller's output registers do not need to be preserved for the caller. Once an input parameter is no longer needed, or has been copied elsewhere, that register may be reused for any other purpose within the procedure.
Figure 4-2 Operation of the Register Stack
4.6.2 Output Registers
Up to eight output registers are used for passing parameters. If a procedure call requires fewer than eight general registers for its parameters, the calling procedure does not need to allocate more than are needed. If the called procedure expects more parameters, it will allocate extra input registers; these registers will be uninitialized.
A procedure may also allocate more than eight registers in the output
region. While the extra registers may not be used for passing
parameters, they can be used as extra scratch registers. On a procedure
call, they will show up in the called procedure's output area as excess
registers, and may be modified by that procedure. The called procedure
may also allocate few enough total registers in its stack frame that
the top of the called procedure's frame is lower than the caller's
top-of-frame, but those registers will become available again when
control returns to the caller.
A subset of the registers in the procedure frame may be designated as rotating registers. The rotating register region always starts with R32, and may be any multiple of eight registers in number, up to a maximum of 96 rotating registers. The renaming is under control of the Register Rename Base (RRB).
If the rotating registers include any or all of the output registers,
software must be careful when using the output registers for passing
parameters, because a non-zero RRB will change the virtual register
numbers that are part of the output region. In general, software should
ensure either that the rotating region does not overlap the output
region, or that the RRB is cleared to zero before setting output
parameter registers.
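
The hazard described here can be illustrated with a small C model of register rotation. This is only a sketch: it assumes that a virtual register inside the rotating region is renamed by adding the RRB modulo the region size, with the RRB taken as a non-negative offset, and the function names are hypothetical.

```c
#include <assert.h>

/* The rotating region is a multiple of eight registers, at most 96. */
static int valid_rotating_size(unsigned n)
{
    return n % 8 == 0 && n <= 96;
}

/*
 * Assumed renaming model: virtual register Rv inside a rotating region of
 * `size` registers starting at R32 is renamed by adding rrb modulo size.
 * Registers outside the region are not renamed.
 */
static unsigned rename_gr(unsigned v, unsigned size, unsigned rrb)
{
    assert(valid_rotating_size(size));
    if (v < 32 || v >= 32 + size)
        return v;
    return 32 + (v - 32 + rrb) % size;
}

/*
 * Output registers start at R(32 + in_local). Setting them is safe when
 * the rotating region does not reach them or when the RRB is zero.
 */
static int outputs_safe(unsigned in_local, unsigned out, unsigned size,
                        unsigned rrb)
{
    return rrb == 0 || out == 0 || size <= in_local;
}
```

In this model, a frame with 10 input and local registers, 5 output registers, and a 16-register rotating region has its first output register (R42) inside the rotating region, so a non-zero RRB would redirect a write intended for that parameter slot.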
The current application-visible state of the register stack is stored in an architecturally inaccessible register called the current frame marker. On a procedure call, this register is automatically saved by copying it to an application register, the previous function state register (AR.PFS). The current frame marker is modified to describe a new stack frame whose input and local area is initially zero size, and whose output area is equal in size to the previous output area. On return, the previous function state register is used to restore the current frame marker to its earlier value, and the base of the register stack is adjusted accordingly.
It is the responsibility of a procedure to save the previous function
state register before issuing any procedure calls of its own, and to
restore it before returning.
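
The reason for this responsibility can be seen in a short C model of the frame marker. This is a sketch under assumptions: the `frame_marker` fields (total frame size and size of the input and local area) and the function names are illustrative, and the model tracks only frame sizes, not register contents or the frame base.

```c
#include <assert.h>

typedef struct {
    unsigned sof;   /* total size of the frame */
    unsigned sol;   /* size of the input and local area */
} frame_marker;

typedef struct {
    frame_marker cfm;   /* current frame marker */
    frame_marker pfs;   /* previous function state (AR.PFS) */
} rs_state;

/* Call: save CFM in PFS; the new frame is the old output area only. */
static void do_call(rs_state *s)
{
    unsigned out = s->cfm.sof - s->cfm.sol;
    s->pfs = s->cfm;
    s->cfm.sof = out;
    s->cfm.sol = 0;
}

/* ALLOC: resize the current frame. */
static void do_alloc(rs_state *s, unsigned in, unsigned local, unsigned out)
{
    s->cfm.sol = in + local;
    s->cfm.sof = in + local + out;
}

/* Return: restore CFM from whatever PFS holds at that moment. */
static void do_return(rs_state *s)
{
    s->cfm = s->pfs;
}
```

Because `do_call` overwrites `pfs`, a procedure B that calls C loses the description of its own caller's frame unless it copies `pfs` aside first and puts it back before its own `do_return`.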
When the depth of the procedure call stack exceeds the capacity of the physical register file, the hardware frees physical registers by saving them into a memory stack. This backing store is distinct from the memory stack described in Section 4.5. As returns unwind the procedure call stack, the hardware also restores previously-saved physical registers from the backing store. The operation of this register stack engine (RSE) is mostly transparent to application software.
While the RSE is running, application software may not examine the contents of the backing store, and may not make any assumptions about how much of the register stack is still in physical registers or in the backing store. In order to examine previous stack frames, application software must synchronize the RSE with the FLUSHRS instruction. Synchronizing the RSE forces all stack frames up to, but not including, the current frame to be saved in backing store, allowing the software to examine the contents of the backing store without asynchronous operations modifying the memory. Modifications to the backing store require setting the RSE to enforced lazy mode after synchronizing it, which prevents the RSE from doing any operations other than those required by calls and returns. The procedure for synchronizing the RSE and setting the mode is described in the Itanium® Software Conventions and Runtime Architecture Guide.
The backing store grows towards higher addresses. The top of the stack, which corresponds to the top of the previous procedure frame, is available in the Backing Store Pointer (BSP) application register. The BSP must always point to a valid backing store address, because the operating system may need to start the RSE to process an exception.
Backing store overflow is automatically detected by the OpenVMS operating system, which will either extend the backing store to allow continued operation or will raise an exception.
Unlike for the memory stack (see Section 4.5), there are no specific rules or requirements that must be satisfied to facilitate detection of backing store overflow.
A NaT collection register is stored into the backing store following
each group of 63 physical registers. The NaT bit of each register
stored is shifted into the collection register. When the BSP reaches
the quadword just before a 64-quadword boundary, the RSE stores the
collection register. Software can determine the position of the NaT
collection registers in the backing store by examining the memory
address. This process is described in greater detail in the Intel
IA-64 Architecture Software Developer's Manual.
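
The address test mentioned above can be sketched in C. This is a minimal sketch assuming 8-byte quadwords, so that a 64-quadword boundary is a 512-byte boundary and the quadword just before it sits at offset 504 (0x1F8) within each 512-byte block; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* True when a quadword-aligned backing store address holds a NaT collection. */
static int is_nat_collection_slot(uint64_t addr)
{
    assert(addr % 8 == 0);
    return (addr & 0x1F8) == 0x1F8;
}

/*
 * Address just past the last of nregs registers stored starting at `start`,
 * skipping over any NaT collection slots met along the way.
 */
static uint64_t backing_store_end(uint64_t start, unsigned nregs)
{
    uint64_t addr = start;
    while (nregs > 0) {
        if (!is_nat_collection_slot(addr))
            nregs--;            /* this quadword holds a register */
        addr += 8;
    }
    return addr;
}
```

Starting at a 512-byte-aligned address, 63 registers fill the block up to the collection slot, and the 64th register lands in the first quadword of the next block.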
This calling standard states that a standard call (see Section 1.4)
can be accomplished in any way that presents the called routine with
the required environment. However, typically, most standard-conforming
external calls are implemented with a common sequence of instructions
and conventions. Because a common set of call conventions is so
pervasive, these conventions are included for reference as part of this
standard.
Every procedure that references statically-allocated data or calls another procedure requires a pointer to an associated short data segment in the GP register, so that it can access its static data and its linkage tables. Typically, an image has one such data segment, and the GP register must be set correctly prior to calling any entry point within that image. Optionally, an image may be partitioned into subcomponents called clusters, in which case each cluster may have its own associated data segment (clusters may also share a common data segment). For further information on images and clusters, see the HP OpenVMS Linker Utility Manual.
Throughout this chapter, rules regarding the use of the GP register are described in terms of images. However, these same rules apply between clusters within an image (keeping in mind that clusters within an image may share a common GP address and short data segment, while images cannot share a common GP address and short data segment).
The linkage conventions require that each image (or cluster) define exactly one GP value to refer to a location within its short data segment. This location should be chosen to maximize the usefulness of short-displacement immediate instructions for addressing scalars and linkage table entries. The image activator determines the absolute value of the GP register for each image after loading its data segment into memory.
Because the GP register remains unchanged for calls within an image, calls known to be local can be optimized accordingly. For calls between images, the GP register must be initialized with the correct GP value for the new image, and the calling function must ensure that its own GP value is saved and restored.
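
The difference between local and cross-image calls can be modeled in C. This is a sketch, not the actual call sequence: the `gp_reg` variable stands in for the GP register, and `func_desc` (an entry point paired with the target image's GP value), `call_local`, `call_cross_image`, and `record_gp` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t gp_reg;    /* stands in for the GP register */
static uint64_t seen_gp;   /* what the sample callee observed */

/* Hypothetical descriptor: entry point plus the GP value of its image. */
typedef struct {
    void   (*entry)(void);
    uint64_t gp;
} func_desc;

/* Sample callee: records the GP value it was entered with. */
static void record_gp(void)
{
    seen_gp = gp_reg;
}

/* Call within one image: GP is already correct and is left alone. */
static void call_local(void (*entry)(void))
{
    entry();
}

/* Call into another image: the caller saves, replaces, and restores GP. */
static void call_cross_image(const func_desc *fd)
{
    uint64_t saved_gp = gp_reg;   /* caller preserves its own GP */
    gp_reg = fd->gp;              /* GP for the target image */
    fd->entry();
    gp_reg = saved_gp;            /* restore after the call returns */
}
```

In this model the callee always sees the GP value of its own image, while the caller's GP is unchanged once the call returns. A local call skips the save and restore entirely, which is the optimization the text refers to.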
Note that there is a small set of compiler run-time support procedures
that take a special pseudo-GP value as a kind of input parameter. See
Section 4.7.7 for more information about support for bound function
descriptors. See Section 5.1.2 for information about support for
translated images.
The following types of procedure calls are defined:
4.7.3 Calling Sequence
Direct and indirect procedure calls are described in the following
sections. Because the compiler is not required to know whether any
given call is local or to a dynamically linked image, the two types of
direct calls are described together in Section 4.7.3.1.
Direct procedure calls follow the sequence of steps shown in Figure 4-3. The following paragraphs describe these steps in detail.
Figure 4-3 Direct Procedure Calls