HP OpenVMS Systems: Ask the Wizard

The Question is:

We are having problems getting two very large C programs to talk to each other. The two processes read from and write to an area of shared memory. The processes need to synchronise before a write or read, and do this by writing a request into a separate area of the same shared page. The requesting process then sets a waiting flag to true, which is held in another area of the same shared page, and uses the sys$hiber() call to wait for the other process to synchronise. If the waiting flag for the other process is set, these steps are skipped. The other process checks for any requests in the communication area of shared memory every cycle. If a request is there, it will run until the time specified in the request is reached, then it will wake the other process using the sys$wake call.

The lagging process, which picks up the requests, is crashing in random places, and we have found that it is setting its waiting flag to true when the code to do this is not even executed. A stack overflow problem is therefore very likely. However, we have checked that the synchronisation is working properly. There are no shared event flags. We have tried increasing the stack size at link time and increasing the quotas. Since the shared memory area is in P0 space, is it possible that with such a big program the shared memory page and the program are overlapping?

The Answer is:

Writing information into shared memory requires correct application synchronization, as discussed in detail in topics 1661 and 2681. Topic 1661 also details a variety of common programming bugs.

sys$hiber is not primarily intended as a synchronization tool, and applications using it are required to manage any spurious wakeup requests that can and do arise (as referenced in topic 1661). A periodic call to sys$wake (or sys$schdwk), which deliberately creates a spurious wakeup, is often useful in an application using sys$hiber, as it can help drain pending activity if a sys$wake call is missed.
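
The handling of spurious wakeups described above can be sketched as a loop that re-checks a completion flag after every return from hibernation, rather than treating the wakeup itself as proof that the request was serviced. This is a minimal, portable illustration and not OpenVMS code: the `sync_area` layout, the `hibernate_fn` callback (a stand-in for sys$hiber), and the `fake_peer` test double are all hypothetical names introduced here, and C11 atomics are assumed to be available.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical control area living in the shared page. */
typedef struct {
    atomic_bool request_done;   /* set by the peer once the request is serviced */
} sync_area;

/* Stand-in for sys$hiber(): blocks until some wake request arrives. */
typedef void (*hibernate_fn)(void *ctx);

/* Hibernate until request_done is really set.
 * Returns the number of wakeups that turned out to be spurious. */
int wait_for_completion(sync_area *area, hibernate_fn hibernate, void *ctx)
{
    int spurious = 0;

    /* The flag, not the wakeup, is the condition being waited on. */
    while (!atomic_load_explicit(&area->request_done, memory_order_acquire)) {
        hibernate(ctx);
        if (!atomic_load_explicit(&area->request_done, memory_order_acquire))
            spurious++;         /* woken, but the work is not finished yet */
    }

    /* Re-arm the flag for the next request. */
    atomic_store_explicit(&area->request_done, false, memory_order_relaxed);
    return spurious;
}

/* Test double (assumption): wakes spuriously a fixed number of times,
 * then behaves like a peer that has completed the request. */
typedef struct {
    sync_area *area;
    int spurious_left;
} fake_peer;

static void fake_hibernate(void *ctx)
{
    fake_peer *peer = ctx;
    if (peer->spurious_left > 0) {
        peer->spurious_left--;  /* return without the flag being set */
        return;
    }
    atomic_store_explicit(&peer->area->request_done, true,
                          memory_order_release);
}

/* Runs the wait loop against the test double and reports how many
 * spurious wakeups were absorbed. */
int demo_spurious_wakeups(int n)
{
    sync_area area;
    atomic_init(&area.request_done, false);
    fake_peer peer = { &area, n };
    return wait_for_completion(&area, fake_hibernate, &peer);
}
```

In a real application the callback would be replaced by the actual hibernate call, and the peer would set the flag before issuing its wake. The loop then tolerates both extra wakeups and the deliberate periodic wakeups mentioned above.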

A simple example of using shared memory from C is posted at the OpenVMS Ask The Wizard area. Care must be taken when operating in a multiprocessing environment, as the behaviour of the processor memory caches must be considered when accessing memory. On OpenVMS Alpha, this includes the use of memory barriers and interlocked operations as required; see topic 2681 for details.

Access to shared memory should generally be consolidated into as few routines as possible, and then integrated into a shareable image or similar. This approach permits easier debugging and better control over the shared memory accesses. (Details of the creation and use of shareable images are available at the Ask The Wizard website.) It also eases the introduction of logging and debugging support into the application environment, provides a way to add a condition handler for errors related to the shared memory, and eases the application-specific upgrade path(s) available to the programmer(s).

As for your question on memory overlaps: yes, an overlap is possible, and rogue pointers are certainly one possible cause. That said, as pages of memory containing executable code and constants are protected (by default) against any write access, this is unlikely. The most common causes of these problems tend to involve memory pool (heap) corruptions, writing too much to a variable on the stack (thus corrupting the stack), and writing to a variable that is no longer in an active stack frame. See topic 1661 for a rather more lengthy discussion of potential problems.
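
As a rough illustration of consolidating shared memory access into a few routines, the sketch below funnels every read and write of a request slot through three functions that check lengths and order their accesses. It is written in portable C11 and is not taken from the Wizard's posted example: the `shared_region` layout, the `REQUEST_MAX` size, and the `shm_*` function names are hypothetical, and C11 release/acquire atomics are used as a stand-in for the memory barriers and interlocked operations that topic 2681 covers for OpenVMS Alpha. It assumes exactly one posting process and one consuming process.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define REQUEST_MAX 64          /* hypothetical maximum request size in bytes */

/* Hypothetical layout of the shared page's request slot. */
typedef struct {
    atomic_int request_ready;   /* 0 = slot empty, 1 = request published */
    size_t     length;          /* number of valid bytes in payload */
    char       payload[REQUEST_MAX];
} shared_region;

/* Put the slot into a known empty state. */
void shm_init(shared_region *r)
{
    r->length = 0;
    memset(r->payload, 0, sizeof r->payload);
    atomic_init(&r->request_ready, 0);
}

/* Publish a request. Returns 0 on success, or -1 if the request is too
 * large or the previous request has not been taken yet. */
int shm_post_request(shared_region *r, const void *data, size_t len)
{
    if (len > REQUEST_MAX)
        return -1;              /* refuse rather than overrun the payload */
    if (atomic_load_explicit(&r->request_ready, memory_order_acquire) != 0)
        return -1;              /* slot still occupied */

    memcpy(r->payload, data, len);
    r->length = len;

    /* Release store: the payload writes become visible before the flag. */
    atomic_store_explicit(&r->request_ready, 1, memory_order_release);
    return 0;
}

/* Take a pending request. Returns the number of bytes copied, or -1 if
 * no request is pending or the caller's buffer is too small. */
long shm_take_request(shared_region *r, void *out, size_t out_size)
{
    /* Acquire load: pairs with the release store in shm_post_request. */
    if (atomic_load_explicit(&r->request_ready, memory_order_acquire) != 1)
        return -1;

    size_t len = r->length;
    if (len > REQUEST_MAX || len > out_size)
        return -1;              /* never copy more than either buffer holds */

    memcpy(out, r->payload, len);

    /* Hand the slot back to the poster only after the copy is complete. */
    atomic_store_explicit(&r->request_ready, 0, memory_order_release);
    return (long)len;
}
```

With all accesses in one place, the length checks guard against the stack and buffer overwrites listed above, and logging or a condition handler can be added inside these routines without touching the rest of either program.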