Guide to the POSIX Threads Library
3.4.3 Diagnosing Stack Overflow Errors

A process can produce a memory access violation (or segmentation fault) when it overflows its stack. As a first step in debugging this behavior, it is often necessary to run the program under the control of your system's debugger to determine which thread's stack has overflowed. However, if the debugger shares resources with the target process (as under OpenVMS), perhaps allocating its own data objects on the target process' stack, the debugger might not operate properly when the stack overflows. In this case, you might be required to analyze the target process by means other than the debugger.

If a thread receives a memory access exception either during a routine call or when accessing a local variable, increase the size of the thread's stack. However, not all memory access violations indicate a stack overflow.

For programs that you cannot run under a debugger, determining a stack overflow is more difficult. This is especially true if the program continues to run after receiving a memory access exception. For example, if a stack overflow occurs while a mutex is locked, the mutex might not be released as the thread recovers or terminates. When the program attempts to lock that mutex again, it could hang.
To set the stacksize attribute in a thread attributes object, use the pthread_attr_setstacksize() routine. (See Section 2.3.2.4 for more information.)
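For example, the following minimal sketch creates a thread with an enlarged stack; the 256 KB size and the worker routine are illustrative assumptions, not values from this guide:

    #include <pthread.h>
    #include <stdio.h>

    void *worker(void *arg)
    {
        /* Thread body that may need deep recursion or large locals. */
        return NULL;
    }

    int main(void)
    {
        pthread_attr_t attr;
        pthread_t thread;
        size_t stacksize = 256 * 1024;  /* Illustrative size only. */

        pthread_attr_init(&attr);
        if (pthread_attr_setstacksize(&attr, stacksize) != 0)
            fprintf(stderr, "requested stack size is invalid\n");

        pthread_create(&thread, &attr, worker, NULL);
        pthread_join(thread, NULL);
        pthread_attr_destroy(&attr);
        return 0;
    }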
The scheduling attributes of threads have unique programming issues. Use care when writing code that uses real-time scheduling (such as FIFO and RR policies) to control the priority of threads.
3.5.2 Priority Inversion

Priority inversion occurs when the interaction among a group of three or more threads causes that group's highest-priority thread to be blocked from executing. For example, a higher-priority thread waits for a resource locked by a low-priority thread, and the low-priority thread waits while a middle-priority thread executes. The higher-priority thread is made to wait while a thread of lower priority (the middle-priority thread) executes. One common way to address the phenomenon of priority inversion is to use mutexes with a priority inheritance protocol, so that a low-priority thread holding a mutex temporarily runs at the priority of the highest-priority thread waiting for that mutex, as sketched below.
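On platforms that support the POSIX priority inheritance option (_POSIX_THREAD_PRIO_INHERIT), such a mutex can be created with pthread_mutexattr_setprotocol(). The following is a minimal sketch; the routine and variable names are illustrative:

    #include <pthread.h>

    pthread_mutex_t resource_lock;

    /* Create a mutex that uses the priority inheritance protocol, so a
       low-priority thread holding it is temporarily boosted to the
       priority of the highest-priority waiter. */
    int create_pi_mutex(void)
    {
        pthread_mutexattr_t mattr;
        int status;

        status = pthread_mutexattr_init(&mattr);
        if (status != 0)
            return status;

        status = pthread_mutexattr_setprotocol(&mattr, PTHREAD_PRIO_INHERIT);
        if (status == 0)
            status = pthread_mutex_init(&resource_lock, &mattr);

        pthread_mutexattr_destroy(&mattr);
        return status;
    }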
3.5.3 Dependencies Among Scheduling Attributes and Contention Scope

On Tru64 UNIX systems, to use high (real-time) thread scheduling priorities, a thread with system contention scope must run in a process with root privileges. On the other hand, a thread with process contention scope has access to all levels of priority without requiring special privileges. Thus, if an unprivileged process attempts to create a high-priority thread with system contention scope, the creation fails.
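The following minimal sketch attempts such a creation and checks the status return; the policy, priority value, and routine names are illustrative assumptions:

    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>

    void *rt_worker(void *arg) { return NULL; }

    int main(void)
    {
        pthread_attr_t attr;
        struct sched_param param;
        pthread_t thread;
        int status;

        pthread_attr_init(&attr);

        /* Request system contention scope and an explicit real-time policy. */
        pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
        pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
        pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
        param.sched_priority = sched_get_priority_max(SCHED_FIFO);
        pthread_attr_setschedparam(&attr, &param);

        /* Without the needed privilege, creation fails (typically EPERM). */
        status = pthread_create(&thread, &attr, rt_worker, NULL);
        if (status != 0)
            fprintf(stderr, "pthread_create failed: %d\n", status);
        else
            pthread_join(thread, NULL);

        pthread_attr_destroy(&attr);
        return 0;
    }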
The following sections discuss how to determine when to use a mutex with or without a condition variable, and how to prevent two erroneous behaviors that are common in multithreaded programs: race conditions and deadlocks. They also discuss why you should signal a condition variable with the associated mutex locked.
Use a mutex for tasks with short-duration waits and fine-grained synchronization (memory access). Examples of "fine-grained" tasks are those that serialize access to shared memory or make simple modifications to shared memory. This typically corresponds to a critical section of a few program statements or less. Mutex waits are not interruptible; threads waiting to acquire a mutex cannot be canceled.

Use a condition variable to wait for data to assume a desired state. Condition variables should be used for tasks with longer-duration waits and coarse-grained synchronization (routine calls and system calls). Always use a condition variable with a mutex that protects the shared data being waited for. Condition variable waits are interruptible.
See Section 2.4.1 and Section 2.4.2 for more information about mutexes
and condition variables.
A race condition occurs when two or more threads perform an operation and the result of the operation depends on unpredictable timing factors, specifically, the points at which each thread executes and waits and the point when each thread completes the operation. For example, if two threads each increment the same variable (such as x = x + 1), both can read the same original value of x before either stores its result; one increment is then lost, and a thread can continue with the wrong value.
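The following minimal sketch illustrates such an unsynchronized increment; the thread routines and iteration count are illustrative:

    #include <pthread.h>
    #include <stdio.h>

    long x = 0;                 /* Shared variable, deliberately unprotected. */

    void *incrementer(void *arg)
    {
        int i;
        for (i = 0; i < 100000; i++)
            x = x + 1;          /* Read-modify-write: not atomic. */
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;

        pthread_create(&t1, NULL, incrementer, NULL);
        pthread_create(&t2, NULL, incrementer, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        /* Often prints less than 200000: increments are lost when both
           threads read the same value of x before either stores. */
        printf("x = %ld\n", x);
        return 0;
    }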
Race conditions result from missing (or ineffective) synchronization. To avoid race conditions, ensure that any variable modified by more than one thread has only one mutex associated with it, and ensure that all accesses to the variable are made after acquiring that mutex. You can also use hardware features such as Alpha load-locked/store-conditional instruction sequences.
See Section 3.6.4 for another example of a race condition.
A deadlock occurs when a thread holding a resource is waiting for a resource held by another thread, while that thread is also waiting for the first thread's resource. Any number of threads can be involved in a deadlock if there is at least one resource per thread. A thread can deadlock on itself. Other threads can also become blocked waiting for resources involved in the deadlock.

Techniques you can use to avoid deadlocks include locking mutexes in a fixed order (for example, by associating a sequence number with each mutex and always acquiring mutexes in increasing sequence), using a "try and back off" algorithm that releases already-held mutexes when an out-of-order lock attempt fails, and avoiding holding more than one mutex at a time.
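The following minimal sketch shows the back-off technique using pthread_mutex_trylock(); the mutex names and locking order are illustrative:

    #include <pthread.h>
    #include <sched.h>

    pthread_mutex_t mutex_a = PTHREAD_MUTEX_INITIALIZER;
    pthread_mutex_t mutex_b = PTHREAD_MUTEX_INITIALIZER;

    /* This thread must acquire mutex_b before mutex_a, the reverse of
       another thread's order.  Rather than blocking (and risking a
       deadlock), it backs off: if mutex_a is busy, it releases mutex_b
       and retries. */
    void lock_both_with_backoff(void)
    {
        for (;;) {
            pthread_mutex_lock(&mutex_b);
            if (pthread_mutex_trylock(&mutex_a) == 0)
                return;                      /* Both mutexes now held. */
            pthread_mutex_unlock(&mutex_b);  /* Back off and retry.    */
            sched_yield();                   /* Let other threads run. */
        }
    }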
3.6.4 Signaling a Condition Variable

Signaling the condition variable while holding the lock allows the Threads Library to perform certain optimizations that can result in more efficient behavior in the awakened thread. In addition, doing so resolves a race condition that results if that signal might cause the condition variable to be deleted. The following C code fragment is executed by a releasing thread (thread A) to wake a blocked thread.
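A minimal sketch of such a fragment, assuming a shared predicate protected by mutex m and condition variable cv (all names illustrative):

    #include <pthread.h>

    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;
    int predicate = 0;

    /* Thread A: as coded, the signal occurs AFTER the mutex is
       unlocked, which opens the race described below. */
    void release_waiter(void)
    {
        pthread_mutex_lock(&m);
        predicate = 1;              /* Allow the waiting thread to proceed. */
        pthread_mutex_unlock(&m);   /* Point A: mutex released first...     */
        pthread_cond_signal(&cv);   /* Point B: ...then cv is signaled.     */
    }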
The following C code fragment is executed by a potentially blocking thread (thread B).
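Continuing the sketch with the same illustrative names:

    /* Thread B: waits for the predicate, then deletes the condition
       variable.  Because thread A signals cv after releasing m, thread
       B can acquire m, find the predicate already set (never blocking,
       or waking spuriously), and destroy cv before thread A's
       pthread_cond_signal() call executes. */
    void wait_then_destroy(void)
    {
        pthread_mutex_lock(&m);
        while (!predicate)
            pthread_cond_wait(&cv, &m);
        pthread_mutex_unlock(&m);
        pthread_cond_destroy(&cv);  /* Can race with thread A's signal. */
    }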
These code fragments also demonstrate a race condition; that is, the routine, as coded, depends on a sequence of events among multiple threads but does not enforce the desired sequence. Signaling the condition variable while still holding the associated mutex eliminates the race condition: doing so prevents thread B from deleting the condition variable until after thread A has signaled it. This problem can occur when the releasing thread is a worker thread and the waiting thread is a boss thread, and the last worker thread tells the boss thread to delete the variables that are being shared by boss and worker. Code the signaling of a condition variable with the mutex locked, as sketched below.
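A corrected sketch of thread A's fragment:

    /* Thread A, corrected: signal while still holding the mutex.
       Thread B cannot proceed past its predicate check, and thus
       cannot destroy cv, until thread A releases m. */
    void release_waiter_corrected(void)
    {
        pthread_mutex_lock(&m);
        predicate = 1;
        pthread_cond_signal(&cv);   /* Signal BEFORE unlocking. */
        pthread_mutex_unlock(&m);
    }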
3.6.5 Static Initialization Inappropriate for Stack-Based Synchronization Objects

Although it is acceptable to the compiler, you cannot use the standard macros PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, and PTHREAD_RWLOCK_INITIALIZER (or any other equivalent mechanism) to initialize synchronization objects that are allocated on the stack.

The Threads Library detects some cases of misuse of static initialization of automatically allocated (stack-based) thread synchronization objects. For instance, if the thread on whose stack a statically initialized mutex is allocated attempts to access that mutex, the operation fails and returns [EINVAL]. If the application does not check status returns from Threads Library routines, this failure can remain unidentified. Further, if the operation was a call to pthread_mutex_lock(), the program can encounter a thread synchronization failure, which in turn can result in unexpected program behavior, including memory corruption. (For performance reasons, the Threads Library does not currently detect this error when a statically initialized mutex is accessed by a thread other than the one on whose stack the object was automatically allocated.)
If your application must allocate a thread synchronization object on the stack, the application must initialize the object before it is used by calling one of the routines pthread_mutex_init(), pthread_cond_init(), or pthread_rwlock_init(), as appropriate for the object. Your application must also destroy the thread synchronization object before it goes out of scope (for instance, due to the routine's returning control or raising an exception) by calling one of the routines pthread_mutex_destroy(), pthread_cond_destroy(), or pthread_rwlock_destroy(), as appropriate for the object.
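A minimal sketch of correct handling of a stack-based mutex (the routine name is illustrative):

    #include <pthread.h>

    void routine_with_local_mutex(void)
    {
        /* A mutex allocated on this routine's stack: it must be
           initialized dynamically, never with PTHREAD_MUTEX_INITIALIZER. */
        pthread_mutex_t local_mutex;

        pthread_mutex_init(&local_mutex, NULL);

        pthread_mutex_lock(&local_mutex);
        /* ... access the data that local_mutex protects ... */
        pthread_mutex_unlock(&local_mutex);

        /* Destroy the mutex before it goes out of scope. */
        pthread_mutex_destroy(&local_mutex);
    }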
Granularity refers to the smallest unit of storage (that is, bytes, words, longwords, or quadwords) that a host computer can load or store in one machine instruction. Granularity considerations can affect the correctness of a program in which concurrent or asynchronous access can occur to separate pieces of data stored in the same memory granule. This can occur in a multithreaded program, where different threads access the data, or in any program that has any of the following characteristics:
The subsections that follow explain the granularity concept, the way it
can affect the correctness of a multithreaded program, and techniques
the programmer can use to prevent the granularity-related race
condition known as word tearing.
A computer's processor typically makes available some set of granularities to programs, based on the processor's architecture, cache architecture, and instruction set. However, the computer's natural granularity also depends on the organization of the computer's memory and its bus architecture. For example, even if the processor "naturally" reads and writes 8-bit memory granules, a program's memory transfers may, in fact, occur in 32- or 64-bit memory granules.

On a computer that supports a set of granularities, the compiler determines a given program's actual granularity by the instructions it produces for the program to execute. For example, a given compiler on Alpha systems might generate code that causes every memory access to load or store a 64-bit word, regardless of the size of the data object specified in the application's source code. In this case, the application has a 64-bit word actual granularity. For this application, 8-bit, 16-bit, and 32-bit writes are not atomic with respect to other memory operations that overlap the same 64-bit memory granule.

To provide a consistent and coherent run-time environment for applications, an operating system's services and libraries should be built so that they provide the same actual granularity. When this is the case, the operating system can be said to provide a system granularity to the applications that it hosts. (A system's granularity is typically reflected in the default actual granularity that the system's compilers encode when producing an object file.)
When preparing to port a multithreaded application from one system to another, you should determine whether the system granularities of the source and target systems differ. If the target system has a larger system granularity than the source system, familiarize yourself with the programming techniques presented in the sections that follow.
Systems based on the Alpha processor family have a quadword (64-bit) natural granularity. Versions EV4 and EV5 of the Alpha processor family provide instructions for only longword- and quadword-length atomic memory accesses. Newer Alpha processors (EV5.6 and later) support byte- and word-length atomic memory accesses as well as longword- and quadword-length atomic memory accesses. (However, there is no way to ensure that a compiler uses byte or word memory references when generating code for your application.)
On Tru64 UNIX systems, use the /usr/sbin/psrinfo -v command to determine the version(s) of your system's Alpha processor(s).
Systems based on the VAX processor family have longword (32-bit) natural granularity, but all instructions can access unaligned data safely (though perhaps with a substantial performance penalty).
For more information about the granularity considerations of porting an application from an OpenVMS VAX system to an OpenVMS Alpha system, consult the document Migrating to an OpenVMS System.
Table 3-1 summarizes the actual granularities that the compilers provide on the respective Compaq platforms.
Of course, for compilers that support an optional granularity setting, it is possible to compile different modules in your application with different granularity settings. You might do so either to avoid the possibility of a word-tearing race condition, as described in Section 3.7.3, or to improve the application's performance.
In a multithreaded application, concurrent access by different threads to data that occupy the same memory granule can lead to a race condition known as word tearing. This situation occurs when two or more threads independently read the same granule of memory, update different portions of that granule, then independently (that is, asynchronously) store their respective copies of that granule. Because the order of the store operations is indeterminate, it is possible that only the last thread to write the granule continues with a correct "view" of the granule's contents, and earlier writes could be "undone".

In a multithreaded program, the potential for a word-tearing race condition exists only when both of the following conditions are met: two or more threads can concurrently update separate pieces of data that are stored in the same memory granule, and the program's actual granularity is larger than the size of those pieces of data, so that updating one of them stores the entire granule.
For instance, given a multithreaded program that has been compiled to
have longword actual granularity, if any two of the program's threads
can concurrently update different bytes or words in the same longword,
then that program is, in theory, at risk for encountering a
word-tearing race condition. However, in practice, language-defined
restrictions on the alignments of data may limit the actual number of
candidates for a word-tearing scenario, as described in the next
section.
The only data objects that are candidates for participating in a word-tearing race condition are members of composite data objects, that is, C language structures, unions, and arrays. In other words, the application's threads might access different data objects that are members of structures or unions, where those members occupy the same byte, word, longword, or quadword. Similarly, the application might access arrays whose elements occupy the same word, longword, or quadword. On the other hand, the C language specification allows the compiler to allocate each scalar data object on a boundary that is a multiple of the memory granule that the compiler prefers, so separately declared scalar objects do not share a granule.
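The following minimal sketch shows the hazard and one remedy, assuming a compiler that produces an actual granularity larger than one byte; the structure and routine names are illustrative:

    #include <pthread.h>

    /* Two byte-sized members that occupy the same longword or quadword. */
    struct flags {
        char ready;     /* Updated by one thread.     */
        char done;      /* Updated by another thread. */
    } shared_flags;

    /* With an actual granularity larger than one byte, each assignment
       below loads the whole granule, modifies one byte, and stores the
       whole granule back.  Concurrent, unsynchronized stores can undo
       each other's updates (word tearing). */
    void *thread_1(void *arg) { shared_flags.ready = 1; return NULL; }
    void *thread_2(void *arg) { shared_flags.done  = 1; return NULL; }

    /* One remedy: protect both members with a single mutex so that each
       read-modify-write of the granule is serialized. */
    pthread_mutex_t flags_lock = PTHREAD_MUTEX_INITIALIZER;

    void set_done_safely(void)
    {
        pthread_mutex_lock(&flags_lock);
        shared_flags.done = 1;
        pthread_mutex_unlock(&flags_lock);
    }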
For the details of the compiler's rules for aligning scalar and composite data objects, see the Compaq C and C++ compiler documentation for your application's host platforms.