OpenVMS Alpha Partitioning and Galaxy Guide
Chapter 16
| Shared Page Tables | Read Only | Read and Write |
|---|---|---|
| None created | Do not set the SEC$M_WRT flag in the map request. Private page tables are always used, even if you specify a shared page table region into which to map the section. | Set the SEC$M_WRT flag in the map request. Private page tables are always used, even if you specify a shared page table region into which to map the section. |
| Write access | Do not set the SEC$M_WRT flag in the map request. Ensure that private page tables are used; do not specify a shared page table region into which to map the section. If you do, the error status SS$_IVSECFLG is returned. | Set the SEC$M_WRT flag in the map request. The shared page table section is used for mapping if you specify a shared page table region into which to map the section. |
| Read access | Do not set the SEC$M_WRT flag in the map request. The shared page table section is used for mapping if you specify a shared page table region into which to map the section. | Set the SEC$M_WRT flag in the map request. Ensure that private page tables are used; do not specify a shared page table region into which to map the section. If you do, the error status SS$_IVSECFLG is returned. |
Shared page tables for Galaxy shared sections are themselves implemented as Galaxy shared sections. This implies that they allow either read-only access on all OpenVMS instances connected to the section, or read and write access on all instances. The setting of the SEC$M_READ_ONLY_SHPT flag requested by the first instance to create the section is used on all instances.

Using the SYS$CRMPSC_GDZRO_64 service always implies that the SEC$M_WRT flag is set and that you want to map the section for writing. If you want to use this service to create a section with shared page tables for read-only access, you must use private page tables, and you cannot specify a shared page table region into which to map the section.
The SHMEM privilege is required to create an object in Galaxy shared memory. The right to map to an existing section is controlled through normal access control mechanisms. SHMEM is not needed to map an existing section. Note that the VMS$MEM_RESIDENT_USER identifier, which is needed to create an ordinary memory resident section, is not required for Galaxywide sections.
Creating and mapping Galaxywide memory sections is accomplished through the same services used to create memory resident sections. The following services now recognize the SEC$M_SHMGS flag:
SYS$CREATE_GDZRO
SYS$CRMPSC_GDZRO_64
SYS$MGBLSC_64
SYS$DGBLSC
SYS$CREATE_GDZRO and SYS$CRMPSC_GDZRO_64 can also return new status codes.
| Status | Description |
|---|---|
| SS$_INV_SHMEM | Shared memory is not valid. |
| SS$_INSFRPGS | Insufficient free shared pages or private pages. |
| SS$_NOBREAK | A Galaxy lock is held by another node and was not broken. |
| SS$_LOCK_TIMEOUT | A Galaxy lock timed out. |
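For illustration, here is a hedged sketch that creates and maps a small Galaxywide demand-zero section through SYS$CRMPSC_GDZRO_64 and checks for the new status codes. The section name, size, and protection value are placeholders, and the prototype and constants (for example, VA$C_P0 and GENERIC_64) should be verified against your STARLET headers:

```c
/* Hedged sketch: create and map a Galaxywide demand-zero global
 * section.  Names and sizes are placeholders; verify the service
 * prototype and constants against STARLET before use. */
#include <starlet.h>
#include <descrip.h>
#include <secdef.h>
#include <vadef.h>
#include <gen64def.h>
#include <stdio.h>

int main(void)
{
    $DESCRIPTOR(gs_name, "MY_SECTION");
    GENERIC_64 region = { VA$C_P0 };     /* default P0 region id */
    void *va = 0;
    unsigned __int64 va_len = 0;
    int status;

    status = sys$crmpsc_gdzro_64(
        &gs_name,                   /* gs_name_64: section name        */
        0,                          /* ident_64: no version checking   */
        0,                          /* prot: default protection        */
        8192,                       /* length_64: one Alpha page       */
        &region,                    /* region_id_64                    */
        0,                          /* section_offset_64               */
        0,                          /* acmode: caller's access mode    */
        SEC$M_SHMGS | SEC$M_EXPREG, /* Galaxywide; system picks the VA */
        &va,                        /* return_va_64                    */
        &va_len);                   /* return_length_64                */

    if (!(status & 1))
        /* Galaxy-specific failures include SS$_INV_SHMEM,
         * SS$_INSFRPGS, SS$_NOBREAK, and SS$_LOCK_TIMEOUT. */
        fprintf(stderr, "SYS$CRMPSC_GDZRO_64 failed: %08X\n", status);
    else
        printf("Mapped %llu bytes at %p\n",
               (unsigned long long)va_len, va);

    return status;
}
```

Because this service implies SEC$M_WRT, the resulting section is mapped for write access on every instance, as described above.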
The INSTALL LIST/GLOBAL and SHOW MEMORY commands are also aware of Galaxywide sections.
Galaxywide sections use their own name space. Just as you have always been able to use the same name for a system global section and for group global sections with various owner UICs, you can now also have a Galaxywide system global section and Galaxywide group global sections, all with the same name.
Galaxywide sections also have their own security classes:
GLXSYS_GLOBAL_SECTION
GLXGRP_GLOBAL_SECTION
These security classes are used with the $GET_SECURITY and $SET_SECURITY system services and with the DCL commands SET SECURITY and SHOW SECURITY.
These new security classes are only valid in a Galaxy environment. They are not recognized on a non-Galaxy node.
You can only retrieve and affect security attributes of Galaxywide global sections if they exist on your sharing instance.
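For example, assuming a Galaxywide group global section named MY_SECTION exists on the local sharing instance, its security profile can be displayed and changed with the usual DCL commands (the section name and protection value here are placeholders):

$ SHOW SECURITY/CLASS=GLXGRP_GLOBAL_SECTION MY_SECTION
$ SET SECURITY/CLASS=GLXGRP_GLOBAL_SECTION -
_$ /PROTECTION=(S:RWED,O:RWED,G:RW,W) MY_SECTION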
Audit messages for Galaxywide sections look like this:

```
%%%%%%%%% OPCOM 20-MAR-1998 10:44:43.71 %%%%%%%%
(from node GLX1 at 20-MAR-1998 10:44:43.85)
Message from user AUDIT$SERVER on GLX1
Security alarm (SECURITY) on GLX1, system id: 19955
Auditable event:        Object creation
Event information:      global section map request
Event time:             20-MAR-1998 10:44:43.84
PID:                    2040011A
Process name:           ANDY
Username:               ANDY
Process owner:          [ANDY]
Terminal name:          RTA1:
Image name:             MILKY$DKA100:[ANDY]SHM_MAP.EXE;1
Object class name:      GLXGRP_GLOBAL_SECTION
Object name:            [47]WAY____D99DDB03_0$MY_SECTION
Secondary object name:  <Galaxywide global section>
Access requested:       READ,WRITE
Deaccess key:           8450C610
Status:                 %SYSTEM-S-CREATED, file or section did not
                        exist; has been created
```
Note the "Object name" field: the object name displayed here uniquely identifies the section in the OpenVMS Galaxy. The fields are as follows:
| Field | Meaning |
|---|---|
| [47] | Identifies the UIC group of the section creator (present only for group global sections). |
| WAY____D99DDB03_0$ | An identifier for the sharing community. |
| MY_SECTION | The name of the section as specified by the user. |
The user can only specify the section name and class for requests to set or show the security profile. The UIC is always obtained from the current process and the community identifier is obtained from the community in which the process executes.
The output for a Galaxywide system global section differs only in the fields "Object class name" and "Object name." The object name for this type of section does not include a group identification field:
```
Object class name:      GLXSYS_GLOBAL_SECTION
Object name:            WAY____D99DDB03_0$SYSTEM_SECTION
```
Security attributes for a Galaxywide memory section must appear identical to a process no matter on what instance it is executing. This can be achieved by having all instances participating in the sharing community also participate in a "homogeneous" OpenVMS Cluster, where all nodes share the security-related files SYSUAF.DAT and SYSUAFALT.DAT (the system authorization files). In particular, automatic propagation of protection changes to a Galaxywide section requires that the same physical file (VMS$OBJECTS.DAT) be used by all sharing instances. If your installation does not share these files throughout the Galaxy, the creator of a Galaxywide shared section must ensure that the section has the same security attributes on each instance. This may require manual intervention.
Chapter 17

This chapter describes OpenVMS Alpha Version 7.3 direct-mapped DMA window information for PCI drivers.

17.1 Direct-Mapped DMA Window Changes
The changes described in this chapter were made in OpenVMS Version 7.2 to support OpenVMS Galaxy and memory holes. The change involves moving the direct-mapped DMA window away from physical memory location 0. This chapter should provide enough background and information for you to update your driver if you have not yet updated it to OpenVMS Version 7.2 or later.
Note that this chapter does not cover bus-addressable pool (BAP).
17.2 How PCI Direct-Mapped DMA Works Prior to OpenVMS Version 7.2
On all PCI-based machines, the direct-mapped DMA window usually begins at 1 GB in PCI space and covers the first 1 GB of physical memory, starting at address 0, as shown in Figure 17-1.
Figure 17-1 PCI-Based DMA
Typically drivers compare their buffer addresses against the length of the window returned by calling IOC$NODE_DATA with the IOC$K_DIRECT_DMA_SIZE function code. This assumes that the window on the memory side starts at zero. Another popular method for determining whether map registers are necessary involves looking at MMG$GL_MAXPFN. This is also not likely to work correctly in OpenVMS Version 7.3.
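Reduced to code, the old test often looked something like the following fragment, where crb and buf_pa come from the surrounding driver context (a hedged reconstruction, not a quote from any shipping driver):

```c
/* Pre-V7.2 idiom -- now unsafe: assumes the direct-mapped window
 * covers physical addresses [0, window size). */
unsigned int size_mb = 0;
int status = ioc$node_data(crb, IOC$K_DIRECT_DMA_SIZE, &size_mb);
if ((status & 1) && buf_pa < ((unsigned __int64)size_mb << 20))
{
    /* Assumed directly mappable; bus address = buf_pa + 1 GB. */
}
```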
For a much better picture and explanation, see the Writing OpenVMS Alpha Device Drivers in C book.
17.3 How PCI Direct-Mapped DMA Works in Current Versions of OpenVMS
Galaxy and memory-hole considerations force OpenVMS to change the placement of the direct-mapped DMA window, as shown in Figure 17-2.
Figure 17-2 OpenVMS DMA
From the driver's perspective, it is unknown where in memory the base of the direct-mapped DMA window will be. Simply comparing a buffer address against the length of the window is no longer sufficient to determine whether a buffer is within the direct-mapped DMA window. Also, comparing against MMG$GL_MAXPFN no longer guarantees that all of pool is within the window; the correct cell to check is MMG$GL_MAX_NODE_PFN. Additionally, alignment concerns may require that a slightly different offset be incorporated into physical bus address calculations.
17.4 IOC$NODE_DATA Changes to Support Nonzero Direct-Mapped DMA Windows
To alleviate this problem, new function codes have been added to IOC$NODE_DATA. The following table lists all the codes relating to direct-mapped DMA and describes what each returns.

| Function Code | Description |
|---|---|
| IOC$K_DIRECT_DMA_BASE | The base address of the direct-mapped DMA window on the PCI side (the bus address). A synonym for this function code is IOC$K_DDMA_BASE_BA. A 32-bit result is returned. |
| IOC$K_DIRECT_DMA_SIZE | On non-Galaxy machines, returns the size of the direct-mapped DMA window (in megabytes). On a system where the window does not start at physical address zero, the data returned is zero, implying that no direct-mapped DMA window exists. A 32-bit result is returned. |
| IOC$K_DDMA_WIN_SIZE | On all systems, returns the size of the direct-mapped DMA window (in megabytes). A 32-bit result is returned. |
| IOC$K_DIRECT_DMA_BASE_PA | The base physical address in memory of the direct-mapped DMA window. A 32-bit result is returned. |
The address returned with the IOC$K_DIRECT_DMA_BASE_PA code is necessary to compute the offset, which historically was the 1 GB difference between the memory physical address and the bus address. The offset is defined as the signed difference between the base bus address and the base memory address; it is no longer necessarily 1 GB.
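Putting the new codes together, a driver can derive both the window test and the signed bus-address offset at initialization time. The following is a hedged sketch: the CRB argument and the ioc$node_data prototype are assumptions to be checked against your driver build environment.

```c
/* Hedged sketch: test a buffer against the direct-mapped window and
 * compute its bus address.  The CRB type and ioc$node_data prototype
 * are assumptions; verify them in your driver build environment. */
static int try_direct_dma(CRB *crb,
                          unsigned __int64 buf_pa,   /* buffer base PA */
                          unsigned __int64 buf_len,  /* buffer length  */
                          unsigned __int64 *bus_addr)
{
    unsigned int win_size_mb = 0, win_base_pa = 0, win_base_ba = 0;
    unsigned __int64 win_len;
    __int64 offset;
    int status;

    status = ioc$node_data(crb, IOC$K_DDMA_WIN_SIZE, &win_size_mb);
    if (status & 1)
        status = ioc$node_data(crb, IOC$K_DIRECT_DMA_BASE_PA, &win_base_pa);
    if (status & 1)
        status = ioc$node_data(crb, IOC$K_DIRECT_DMA_BASE, &win_base_ba);
    if (!(status & 1))
        return 0;

    win_len = (unsigned __int64)win_size_mb << 20;   /* MB to bytes */

    /* Signed difference between base bus address and base memory
     * address; no longer necessarily 1 GB. */
    offset = (__int64)win_base_ba - (__int64)win_base_pa;

    if (buf_pa >= win_base_pa && buf_pa + buf_len <= win_base_pa + win_len)
    {
        *bus_addr = buf_pa + offset;   /* inside the window */
        return 1;
    }
    return 0;   /* outside the window: map registers are required */
}
```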
Appendix A

This appendix contains an example program: a privileged-code application that dynamically reassigns CPU resources among instances in an OpenVMS Galaxy.
A.1 CPU Load Balancer Overview
The OpenVMS Galaxy CPU Load Balancer program is a privileged application that dynamically reassigns CPU resources among instances in an OpenVMS Galaxy.
The program must be run on each participating instance. Each image will create, or map to, a small shared-memory section and periodically post information regarding the depth of that instance's COM queues. Based upon running averages of this data, each instance will determine the most and the least busy instances. If these factors exist for a specified duration, the least busy instance having available secondary processors will reassign one of its processors to the most busy instance, thereby effectively balancing processor usage across the OpenVMS Galaxy. The program provides command-line arguments to allow tuning of the load-balancing algorithm. The program is admittedly shy on error handling.
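In outline, the decision each instance repeatedly makes looks roughly like the following hedged sketch; the structure and names are invented for illustration, and the real logic is in SYS$EXAMPLES:GCU$BALANCER.C:

```c
/* Simplified sketch of the per-instance balancing decision.
 * Field and function names are illustrative only; see
 * SYS$EXAMPLES:GCU$BALANCER.C for the real implementation. */
typedef struct {
    unsigned int com_queue_avg;  /* running average of COM queue depth */
    unsigned int spare_cpus;     /* secondary CPUs this instance could release */
} instance_load;

static void balance_pass(instance_load *loads, int n_instances, int self)
{
    int i, busiest = 0, idlest = 0;

    /* Find the most and least busy instances from shared memory. */
    for (i = 1; i < n_instances; i++) {
        if (loads[i].com_queue_avg > loads[busiest].com_queue_avg)
            busiest = i;
        if (loads[i].com_queue_avg < loads[idlest].com_queue_avg)
            idlest = i;
    }

    /* Push model: only the owner can release a CPU, so act only if
     * this instance is the least busy one and has a CPU to spare. */
    if (self == idlest && self != busiest && loads[self].spare_cpus > 0) {
        /* SYS$CPU_TRANSITION would reassign one secondary CPU
         * from this instance to the busiest instance here. */
    }
}
```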
This program uses the following OpenVMS Galaxy system services:
| Service | Purpose |
|---|---|
| SYS$CPU_TRANSITION | CPU reassignment |
| SYS$CRMPSC_GDZRO_64 | Shared memory creation |
| SYS$SET_SYSTEM_EVENT | OpenVMS Galaxy event notification |
| SYS$*_GALAXY_LOCK_* | OpenVMS Galaxy locking |
Because OpenVMS Galaxy resources are always reassigned via a push model, where only the owner instance can release its resources, one copy of this process must run on each instance in the OpenVMS Galaxy.
This program can be run only in an OpenVMS Version 7.2 or later
multiple-instance Galaxy.
A.1.1 Required Privileges
The CMKRNL privilege is required to count CPU queues. The SHMEM
privilege is required to map shared memory.
A.1.2 Build and Copy Instructions
Compile and link the example program as described below, or copy the precompiled image found in SYS$EXAMPLES:GCU$BALANCER.EXE to SYS$COMMON:[SYSEXE]GCU$BALANCER.EXE.
If your OpenVMS Galaxy instances use individual system disks, you will need to perform this action for each instance.
If you change the example program, compile and link it as follows:
$ CC GCU$BALANCER.C+SYS$LIBRARY:SYS$LIB_C/LIBRARY
$ LINK/SYSEXE GCU$BALANCER
You must establish a DCL command for this program. We have provided a sample command table file for this purpose. To install the new command, do the following:
$ SET COMMAND/TABLE=SYS$LIBRARY:DCLTABLES -
_$ /OUT=SYS$COMMON:[SYSLIB]DCLTABLES GCU$BALANCER.CLD
This command inserts the new command definition into DCLTABLES.EXE in your common system directory. The new command tables will take effect when the system is rebooted. If you would like to avoid a reboot, do the following:
$ INSTALL REPLACE SYS$COMMON:[SYSLIB]DCLTABLES.EXE
After this command, you will need to log out, then log back in to use the command from any active processes. Alternatively, if you would like to avoid logging out, do the following from each process you would like to run the balancer from:
$ SET COMMAND GCU$BALANCER.CLD
Once your command has been established, you may use the various command line parameters to control the balancer algorithm.
$ CONFIGURE BALANCER[/STATISTICS] x y time
In this command, x is the number of load samples to take, y is the number of queued processes required to trigger resource reassignment, and time is the delta time between load sampling.
The /STATISTICS qualifier causes the program to display a continuous status line. This is useful for tuning the parameters. This output is not visible if the balancer is run detached, as is the case if it is invoked via the GCU. The /STATISTICS qualifier is intended to be used only when the balancer is invoked directly from DCL in a DECterm window. For example:
$ CONFIG BAL 3 1 00:00:05.00

This command starts the balancer, which samples the system load every 5 seconds. After three samples, if the instance has one or more processes in the COM queue, a resource (CPU) reassignment occurs, giving this instance another CPU.
A.1.4 Starting the Load Balancer from the GCU
The GCU provides a menu item for launching SYS$SYSTEM:GCU$BALANCER.EXE and a dialog for altering the balancer algorithm. These features will only work if the balancer image is properly installed as described in the following paragraphs.
To use the GCU-resident balancer startup option, you must:
$ CONFIGURE GALAXY
In an OpenVMS Galaxy, no process may have shared memory mapped on an instance when it leaves the Galaxy (for example, during a shutdown). Processes in the SYSTEM UIC group are not terminated by SYSHUTDWN.COM when OpenVMS shuts down or reboots, so if the GCU$BALANCER program is run from a SYSTEM UIC, you must modify SYS$MANAGER:SYSHUTDWN.COM to stop the process. If a process still has shared memory mapped when an instance leaves the Galaxy, the instance will crash with a GLXSHUTSHMEM bugcheck.
To make this work, SYS$MANAGER:SYSHUTDWN.COM must stop the process as shown in the following example. Alternatively, the process can be run under a suitably privileged, non-SYSTEM UIC.
$! SYSHUTDWN.COM example - paste into SYS$MANAGER:SYSHUTDWN.COM
$!
$! If the GCU$BALANCER image is running, stop it to release shmem.
$!
$ procctx = f$context("process",ctx,"prcnam","GCU$BALANCER","eql")
$ procid = f$pid(ctx)
$ if procid .NES. "" then $ stop/id='procid'
Note that you could also use a $ STOP GCU$BALANCER statement.