HP OpenVMS Systems Documentation

Guide to the POSIX Threads Library

2.5 Process-Shared Synchronization Objects

You can create synchronization objects (that is, mutexes, condition variables, and read-write locks) that protect data that is shared among threads running in different processes. These are called process-shared synchronization objects.

Note

This version of the Threads Library supports process-shared synchronization objects for Tru64 UNIX systems only.

The following routines are not supported on OpenVMS Alpha and OpenVMS VAX systems:

pthread_condattr_getpshared()
pthread_condattr_setpshared()
pthread_mutexattr_getpshared()
pthread_mutexattr_setpshared()
pthread_rwlockattr_getpshared()
pthread_rwlockattr_setpshared()

2.5.1 Programming Considerations

On Tru64 UNIX systems, a process-shared synchronization object is a kernel object. Performing any operation on such an object requires a call into the kernel and thus is of higher cost than the same operation on a process-specific synchronization object.

When debugging a process-shared synchronization object, the debugger cannot currently display the mutex, nor its owner or waiting threads.

As is the case for process-specific synchronization objects, a process-shared synchronization object must be initialized only once; you cannot initialize it in each process that uses it. For independent processes that share a common synchronization protocol using process-shared synchronization objects, there must be some mechanism to determine which single process will initialize those objects.

For example, if multiple processes connect to a named memory section, all but one will fail, and the one successful process should have the responsibility of initializing any global process-shared synchronization objects in that memory section. (The other processes must also use some mechanism for waiting until the process-shared object is initialized before attempting to use the shared memory section.)

2.5.2 Process-Shared Mutexes

You can create a mutex that protects data that is shared among threads running in different processes. This is called a process-shared mutex.

Create a process-shared mutex by using the pthread_mutexattr_setpshared() routine to set the process-shared attribute in an initialized mutex attributes object and then use that attributes object in a call to pthread_mutex_init() .

2.5.3 Process-Shared Condition Variables

You can create a condition variable used to communicate changes to data that is shared among threads running in different processes. This is called a process-shared condition variable.

Create a process-shared condition variable by using the
pthread_condattr_setpshared() routine to set the process-shared attribute in an initialized condition variable attributes object and then use that attributes object in a call to pthread_cond_init() .

2.5.4 Process-Shared Read-Write Locks

You can create a read-write lock that protects data that is shared among threads running in different processes. This is called a process-shared read-write lock.

Create a process-shared read-write lock by using the
pthread_rwlockattr_setpshared() routine to set the process-shared attribute in an initialized read-write lock attributes object. Then use that attributes object in a call to pthread_rwlock_init() .

2.6 Thread-Specific Data

Each thread can use an area of memory private to the Threads Library where it stores thread-specific data. Use this memory to associate arbitrary data with a thread's context. This allows you to add user-specified fields to the current thread's context or define global variables that have private values in each thread.

A thread-specific data key is shared by all threads within the process---each thread has its own unique value for that shared key.

Use the following routines to create and access thread-specific data:

Use the pthread_key_create() routine to create a unique key value. One call to pthread_key_create() creates a thread-specific data key shared by all threads. In addition, your program can specify a destructor routine to destroy the context value associated with this key when any thread terminates.
The process must create each key exactly once; otherwise, subsequent creates will overwrite the first. See Section 3.8 for information about the one-time initialization in a threaded environment, or use the operating system's initialization mechanisms (the Tru64 UNIX __init_* routines or the OpenVMS LIB$INITIALIZE routine).
Use pthread_setspecific() to associate thread-specific data values with a key value. Each thread can associate its own private data with the shared key.
For example, each thread might store a pointer to a block of dynamically allocated memory that it has reserved. Although each thread has its own block of memory, your code always uses the same key to get the current thread's block.
Use pthread_getspecific() to obtain the data associated with a key. This routine obtains the current thread's thread-specific data value associated with a specified key.

Chapter 3
Programming with Threads

This chapter discusses programming disciplines that you should follow as you use Threads Library routines in your programs. Pertinent examples include programming for asynchronous execution, choosing a synchronization mechanism, avoiding priority scheduling problems, making code thread-safe, and working with code that is not thread-safe.

3.1 Designing Code for Asynchronous Execution

When programming with threads, always keep in mind that the execution of a thread is inherently asynchronous with respect to other threads running in the system (or in the process).

In short, there is no guarantee of when a thread will start. It can start immediately or not for a significant period of time, depending on the priority of the thread in relation to other threads that are currently running. When a thread will start can also depend on the behavior of other processes, as well as on other threaded subsystems within the current process.

You cannot depend upon any synchronization between two threads unless you explicitly code that synchronization into your program using one of the following:

Mutexes or read-write locks
A properly tested application predicate loop on a condition variable
A call to join with a thread you expect to terminate
An operating system synchronization mechanism, such as a file system read, an OpenVMS event flag wait, or a Tru64 UNIX semaphore wait
An equivalent hardware construct, such as VAX interlocked instructions or Alpha load locked/store conditional sequences and memory barriers

On a uniprocessor, the Threads Library, in most cases, will context-switch threads in user mode, within a single operating system process. (This is true except for system contention scope threads on Tru64 UNIX.) Context switches between such threads occur only at relatively determinate times, such as when you make a blocking call to the threads library or when a timeslice interrupt occurs. This behavior might be termed "slightly asynchronous," because such a library tolerates many classes of errors in your application.

On a multiprocessor system, the Threads Library may run more than one application thread simultaneously. Many incautious programming techniques that will not usually cause trouble on a uniprocessor will cause trouble--often in ways that are difficult to isolate and fix--on a multiprocessor.

The following subsections present examples of programming errors.

3.1.1 Avoid Passing Stack Local Data

Avoid creating a thread with an argument that points to stack local data, or to global or static data that is serially reused for a sequence of threads.

Specifically, the thread started with a pointer to stack local data may not start until the creating thread's routine has returned, and the storage may have been changed by other calls. The thread started with a pointer to global or static data may not start until the storage has been reused to create another thread.

3.1.2 Initialize Objects Before Thread Creation

Initialize objects (such as mutexes) or global data that a thread uses before creating that thread.

On "slightly asynchronous" uniprocessor systems this may seem safe, because the thread will probably not run until the creator blocks. Thus, the error can go undetected initially. On a multiprocessor, or even on a new release of the Threads Library with different timeslicing behavior, the created thread may run immediately, before the data has been initialized. This can lead to failures that are difficult to detect. Note that a thread may run to completion, before the call that created it returns to the creator. The system load may affect the timing as well.

Before your program creates a thread, it should set up all requirements that the new thread needs in order to execute. For example, if your program must set the new thread's scheduling parameters, do so with attributes objects when you create it, rather than trying to use pthread_setschedparam() or other routines afterwards. To set global data for the new thread or to create synchronization objects, do so before you create the thread, else set them in a pthread_once() initialization routine that is called from each thread.

3.1.3 Do Not Use Scheduling As Synchronization

Avoid using the scheduling policy and scheduling priority attributes of threads as a synchronization mechanism.

In a uniprocessor system, only one thread can run at a time, and since a higher-priority thread cannot be preempted by a lower-priority running thread, a thread running at higher priority might erroneously be presumed not to need a mutex to access shared data.

On a multiprocessor system, higher- and lower-priority threads are likely to run at the same time. Situations can even arise where higher-priority threads are waiting to run while the threads that are running have a lower priority.

Regardless of whether your code will run only on a uniprocessor implementation, never try to use scheduling as a synchronization mechanism. Even on a uniprocessor system, your SCHED_FIFO thread can become blocked on a mutex (perhaps in a called library routine), on an I/O operation, or even a page fault. Any of these might allow a lower priority thread to run.

3.2 Memory Synchronization Between Threads

Your multithreaded program must ensure that access to data shared between threads is synchronized with the system's memory subsystem. While any written data will, eventually, be seen by other threads, it is essential for communication that some writes appear in a particular sequence. For example, you want a thread that follows a queue link to see the data written to the next queue entry. This requires explicit memory synchronization.

The POSIX standard requires that, when calling the following routines, a thread synchronizes its memory access with respect to other threads:

`fork()`	`pthread_cond_signal()`
`pthread_create()`	`pthread_cond_broadcast()`
`pthread_join()`	`sem_post()`
`pthread_mutex_lock()`	`sem_trywait()`
`pthread_mutex_trylock()`	`sem_wait()`
`pthread_mutex_unlock()`	`semop()`
`pthread_cond_wait()`	`wait()`
`pthread_cond_timedwait()`	`waitpid()`
`pthread_rwlock_*()`
`$HIBER`
`$WAKE`
`$WAIT*`

If a call to one of these routines returns an error, synchronization is not guaranteed. For example, an unsuccessful call to pthread_mutex_trylock() does not necessarily provide actual synchronization.

Synchronization is a "protocol" among cooperating threads, not a single operation. That is, unlocking a mutex does not guarantee memory synchronization with all other threads---only with threads that later perform some synchronization operation themselves, such as locking a mutex.

3.3 Sharing Memory Between Threads

Most threads do not operate independently. They cooperate to accomplish a task, and cooperation requires communication. There are many ways that threads can communicate, and which method is most appropriate depends on the task.

Threads that cooperate only rarely (for example, a boss thread that only sends off a request for workers to do long tasks) may be satisfied with a relatively slow form of communication. Threads that must cooperate more closely (for example, a set of threads performing a parallelized matrix operation) need fast communication---maybe even to the extent of using machine-specific hardware operations.

Most mechanisms for thread communication involve the use of memory, exploiting the fact that all threads within a process share their full address space. Although all addresses are shared, there are three kinds of memory that are characteristically used for communication. The following sections describe the scope (or, the range of locations in the program where code can access the memory) and lifetime (or, the length of time use of the memory is invalid) of each of the three types of memory.

3.3.1 Using Static Memory

Static memory is allocated by the language compiler when it translates source code, so the scope is controlled by the rules of the compiler. For example, in the C language, a variable declared as extern is shared by all scopes where the name is defined anywhere, and a static variable is private to the source file or routine, depending on where it is declared.

In this discussion, static memory is not the same as the C language static storage class. Rather, static memory refers to any variable that is permanently allocated at a particular address for the life of the program.

It is appropriate to use static memory in your multithreaded program when you know that only one instance of an object exists throughout the application. For example, if you want to keep a list of active contexts or a mutex to control some shared resource, you would not want individual threads to have their own copies of that data.

The scope of static memory depends on your programming language's scoping rules. The lifetime of static memory is the life of the program.

3.3.2 Using Stack Memory

Stack memory is allocated by code generated by the language compiler at run time, generally when a routine is initially called. When the program returns from the routine, the storage ceases to be valid (although the addresses still exist and might be accessible).

Generally, the storage is valid while the routine runs, and the actual address can be calculated and passed to other threads; however, this depends on programming language rules. If you pass the address of stack memory to another thread, you must ensure that all other threads are finished processing that data before the routine returns; otherwise the stack will be cleared, and values might be altered by subsequent calls, page fault handling, or other interrupts. The other threads will not be able to determine that this has happened, and erroneous behavior will result.

The scope of stack memory is the routine or a block within the routine. The lifetime is no longer than the time during which the routine or block executes.

3.3.3 Using Dynamic Memory

Dynamic memory is allocated by the program as a result of a call to some memory management routine (for example, the C language run-time routine malloc() or the OpenVMS common run-time routine LIB$GET_VM).

Dynamic memory is referenced through pointer variables. Although the pointer variables are scoped depending on their declaration, the dynamic memory itself has no intrinsic scope or lifetime. It can be accessed from any routine or thread that is given its address and will exist until explicitly made free. In a language supporting automatic garbage collection, it will exist until the run-time system detects that there are no references to it. (If your language supports garbage collection, be sure the garbage collector is thread-safe.)

The scope of dynamic memory is anywhere a pointer containing the address can be referenced. The lifetime is from allocation to deallocation.

Typically dynamic memory is appropriate to manage persistent context. For example, in a reentrant routine that is called multiple times to return a stream of information (such as to list all active connections to a server or to return a list of users), using dynamic memory allows the program to create multiple contexts that are independent of all the program's threads. Thus, multiple threads could share a given context, or a single thread could have more than one context.

3.4 Managing a Thread's Stack

For each thread created by your program, the Threads Library sets a default stack size that is acceptable to most applications. You can also set the stacksize attribute in a thread attributes object, to specify the stack size needed by the next thread created.

This section discusses the cases in which the stack size is insufficient (resulting in stack overflow) and how to determine the optimal size of the stack.

Most compilers on Compaq VAX based systems do not probe the stack. This makes stack overflow failure modes unpredictable and difficult to analyze. Be especially careful to use as little stack memory as practical.

Most compilers on Compaq Alpha based systems generate code in the procedure prologue that probes the stack, which detects if there is not enough space for the procedure to run.

3.4.1 Sizing the Stack

To determine the required size of a thread's stack, add the sizes of the frames, including local variables, for the deepest call tree. Add to that number an extra amount of memory to accommodate interrupts and context switching. Determining this figure is difficult because stack frames vary in size and because it might not be possible to estimate the depth of library routine call frames.

Compaq's Visual Threads includes a number of tools and procedures to measure and monitor stack use. See the Visual Threads product's online help for more information.

You can also run your program using a profiling tool that measures actual stack use. This is commonly done by "poisoning" the stack before it is used by writing a distinctive pattern, and then checking for that pattern after the thread completes. Remember: Use of profiling or monitoring tools typically increases the amount of stack memory that your program uses.

3.4.2 Using Stack Overflow Warning and Stack Guard Areas

By default, at the overflow end of each thread's stack, the Threads Library allocates an overflow warning area followed by a guard area. These two areas can help a multithreaded program detect overflow of a thread's stack.

Tru64 UNIX 5.0 and OpenVMS Alpha 7.3 include overflow warning support to allow the reporting of stack overflows while a thread can still be assured of executing code. The warning area is a page (or more) that is initially protected to trap writes, but then becomes writable so that it can be used to allow reporting or recovering from the overflow. (On Tru64 UNIX, the warning area is again protected once an overflow has been handled; on OpenVMS it remains unprotected.)

A guard area is a region of no access memory. When the thread attempts to access a memory location within this region, a memory addressing violation occurs. For a thread that allocates large data structures on the stack, create that thread using a thread attributes object in which a large guardsize attribute value has been set. A large stack guard region can help to prevent one thread from overflowing into another thread's stack region.

The pages of memory that form a stack guard region are also known as guard pages or "red zone"; the overflow warning area is also known as a "yellow zone".

Contents

Index

HP OpenVMS Systems Documentation

Guide to the POSIX Threads Library

2.5 Process-Shared Synchronization Objects

2.5.1 Programming Considerations

Chapter 3Programming with Threads

Chapter 3
Programming with Threads