OpenVMS Alpha Partitioning and Galaxy Guide
2.12 Security Considerations in an OpenVMS Galaxy Computing Environment
OpenVMS Galaxy instances executing in a shared-everything cluster
environment, in which all security database files are shared among all
instances, automatically provide a consistent view of all
Galaxy-related security profiles.
If you choose not to share all security database files throughout all
Galaxy instances, a consistent security profile can only be achieved
manually. Changes to an object's security profile must be followed by
similar changes on all instances where this object can be accessed.
Because of the need to propagate changes manually, it is unlikely that
such a configuration would ever be covered by a US C2 evaluation or by
similar evaluations from other authorities. Organizations that require
operating systems to have security evaluations should ensure that all
instances in a single OpenVMS Galaxy belong to the same cluster.
2.13 Configuring OpenVMS Galaxy Instances in Time Zones
OpenVMS Galaxy instances do not have to be in the same time zone unless
they are in the same cluster. For example, each instance in a
three-instance Galaxy configuration could be in a different time zone.
2.14 Developing OpenVMS Galaxy Programs
The following sections describe OpenVMS programming interfaces that
are useful in developing OpenVMS Galaxy application programs. Many of
the concepts are extensions of the traditional single-instance OpenVMS
system.
To see the C function prototypes for the services described in these
sections, enter the following command:
$ library/extract=starlet sys$library:sys$starlet_c.tlb/output=filename
Then search the output file for the service you want to see.
2.14.1 Locking Programming Interfaces
One of the major features of the Galaxy platform is the ability to
share resources across multiple instances of the operating system. As
with any shared resource, the need arises to synchronize access to that
resource. The services described in this chapter provide primitives
upon which a cooperative scheme can be created to synchronize access to
shared resources within a Galaxy.
A Galaxy lock is a combination of a spinlock and a mutex. While
attempting to acquire an owned Galaxy lock, the thread spins for a
short period. If the lock does not become available during the spin,
the thread puts itself into a wait state. This differs from SMP
spinlocks, where the system crashes if the spin times out; that
behavior is not acceptable in a Galaxy.
Given the nature of Galaxy locks, they will reside somewhere in shared
memory. That shared memory can be allocated either by the user or by
the Galaxy locking services. If the user allocates the memory, the
locking services track only the location of the locks. If the locking
services allocate the memory, it is managed on behalf of the user.
Unlike other monitoring code, which is included only in the MON version
of the execlets, the Galaxy lock monitoring code is always loaded.
Several routines are provided to manipulate Galaxy locks. These
routines provide only basic locking functions; they are somewhat richer
than the spinlocks used to support SMP, but far less capable than the
lock manager. Table 2-1 summarizes the OpenVMS system services for
lock programming, and a usage sketch follows the table.
Table 2-1 Galaxy System Services for Lock Programming

System Service               Description
$ACQUIRE_GALAXY_LOCK         Acquires ownership of an OpenVMS Galaxy lock.
$CREATE_GALAXY_LOCK          Allocates an OpenVMS Galaxy lock block from a
                             lock table created with the
                             $CREATE_GALAXY_LOCK_TABLE service.
$CREATE_GALAXY_LOCK_TABLE    Allocates an OpenVMS Galaxy lock table.
$DELETE_GALAXY_LOCK          Invalidates an OpenVMS Galaxy lock and
                             deletes it.
$DELETE_GALAXY_LOCK_TABLE    Deletes an OpenVMS Galaxy lock table.
$GET_GALAXY_LOCK_INFO        Returns "interesting" fields from the
                             specified lock.
$GET_GALAXY_LOCK_SIZE        Returns the minimum and maximum size of an
                             OpenVMS Galaxy lock.
$RELEASE_GALAXY_LOCK         Releases ownership of an OpenVMS Galaxy lock.
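The sketch below shows how these services might fit together in a C
program: the locking services allocate a lock table in shared memory, one
lock is created in that table, and a critical section is bracketed by
acquire and release calls. This is a hedged illustration rather than a
definitive recipe: the argument lists, the table and lock names, and the
literal size and type values are assumptions and must be verified against
the prototypes extracted from SYS$STARLET_C.TLB and the service
descriptions in the OpenVMS System Services Reference Manual.

/* Minimal sketch of the Galaxy lock services.  The argument lists shown
 * here are assumptions, as are the names DEMO_LCKTBL and DEMO_LOCK and
 * the literal sizes; confirm every call against the C prototypes
 * extracted from SYS$LIBRARY:SYS$STARLET_C.TLB before relying on it.    */
#include <descrip.h>                 /* $DESCRIPTOR                      */
#include <stdio.h>

/* Deliberately unprototyped; the real prototypes come from starlet.     */
extern unsigned int sys$get_galaxy_lock_size(),
                    sys$create_galaxy_lock_table(),
                    sys$create_galaxy_lock(),
                    sys$acquire_galaxy_lock(),
                    sys$release_galaxy_lock();

int main(void)
{
    unsigned int       status;
    unsigned int       min_size, max_size;      /* lock block sizes      */
    unsigned long long lcktbl_handle = 0;       /* lock table handle     */
    unsigned long long lock_handle   = 0;       /* one Galaxy lock       */
    $DESCRIPTOR(table_name, "DEMO_LCKTBL");     /* example names only    */
    $DESCRIPTOR(lock_name,  "DEMO_LOCK");

    /* How big must each lock block be?                                   */
    status = sys$get_galaxy_lock_size(&min_size, &max_size);
    if (!(status & 1)) return status;
    printf("lock block size range: %u..%u bytes\n", min_size, max_size);

    /* Let the locking services allocate and manage the shared memory.
     * Assumed arguments: name, access mode, section size, section type,
     * protection, lock size, returned table handle.                      */
    status = sys$create_galaxy_lock_table(&table_name, 0, 8192, 1, 0,
                                          min_size, &lcktbl_handle);
    if (!(status & 1)) return status;

    /* Assumed arguments: table handle, name, size, spin timeout, ipl,
     * rank, returned lock handle.                                        */
    status = sys$create_galaxy_lock(lcktbl_handle, &lock_name, min_size,
                                    0, 0, 0, &lock_handle);
    if (!(status & 1)) return status;

    /* Spin briefly, then wait, until the lock is acquired.               */
    status = sys$acquire_galaxy_lock(lock_handle, 0, 0);
    if (!(status & 1)) return status;

    /* ... update the shared-memory data guarded by this lock here ...    */

    status = sys$release_galaxy_lock(lock_handle);
    printf("released Galaxy lock, status = %u\n", status);
    return status;
}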
2.14.2 System Events Programming Interfaces
Applications can register to be notified when certain system events
occur; for example, when an instance joins the Galaxy or when a CPU
joins a configure set. Once events are registered, an application can
decide how to respond when the registered events occur.
Table 2-2 summarizes the OpenVMS system services available for
events programming.
Table 2-2 Galaxy System Services for Events Programming
System Service          Description
$CLEAR_SYSTEM_EVENT     Removes one or more notification requests
                        previously established by a call to
                        $SET_SYSTEM_EVENT.
$SET_SYSTEM_EVENT       Establishes a request for notification when an
                        OpenVMS system event occurs.
2.14.3 Using SDA in an OpenVMS Galaxy
This section describes SDA information that is specific to an OpenVMS
Galaxy computing environment.
For more information about using SDA, refer to the OpenVMS Alpha System Analysis Tools Manual.
2.14.3.1 Dumping Shared Memory
When a system crash occurs in a Galaxy instance, the default behavior
of OpenVMS is to dump the contents of private memory of the failed
instance and the contents of shared memory. In a full dump, every page
of both shared and private memory is dumped; in a selective dump, only
those pages in use at the time of the system crash are dumped.
Dumping of shared memory can be disabled by setting bit 4 of the dynamic
SYSGEN parameter DUMPSTYLE. This bit should be set only on the advice of
your Compaq support representative, because the resulting system dump may
not contain the data required to determine the cause of the system
crash.
Table 2-3 shows the definitions of all the bits in DUMPSTYLE and
their meanings in OpenVMS Alpha. The bits can be set in any
combination.
Table 2-3 Definitions of Bits in DUMPSTYLE

Bit  Value  Description
0    1      0 = Full dump. The entire contents of physical memory will be
            written to the dump file.
            1 = Selective dump. The contents of memory will be written to
            the dump file selectively to maximize the usefulness of the
            dump file while conserving disk space. (Only pages that are
            in use are written.)

1    2      0 = Minimal console output. This consists of the bugcheck
            code; the identity of the CPU, process, and image where the
            crash occurred; the system date and time; plus a series of
            dots indicating progress writing the dump.
            1 = Full console output. This includes the minimal output
            described above plus stack and register contents, system
            layout, and additional progress information such as the names
            of processes as they are dumped.

2    4      0 = Dump to system disk. The dump will be written to
            SYS$SYSDEVICE:[SYSn.SYSEXE]SYSDUMP.DMP or, in its absence,
            SYS$SYSDEVICE:[SYSn.SYSEXE]PAGEFILE.SYS.
            1 = Dump to alternate disk. The dump will be written to
            dump_dev:[SYSn.SYSEXE]SYSDUMP.DMP, where dump_dev is the
            value of the console environment variable DUMP_DEV.

3    8      0 = Uncompressed dump. Pages are written directly to the dump
            file.
            1 = Compressed dump. Each page is compressed before it is
            written, providing a saving in space and in the time taken to
            write the dump, at the expense of a slight increase in time
            taken to access the dump.

4    16     0 = Dump shared memory.
            1 = Do not dump shared memory.
The default setting for DUMPSTYLE is 0 (an uncompressed full dump,
including shared memory, written to the system disk). Unless a value
for DUMPSTYLE is specified in MODPARAMS.DAT, AUTOGEN.COM will set
DUMPSTYLE to 1 (an uncompressed selective dump, including shared
memory, written to the system disk) if there is less than 128 megabytes
of memory on the system, or to 9 (a compressed selective dump,
including shared memory, written to the system disk) otherwise.
2.14.3.2 Summary of SDA Command Interface Changes or Additions
The following list summarizes how the System Dump Analyzer (SDA) has
been enhanced to view shared memory and OpenVMS Galaxy data structures.
For more details, see the appropriate commands.
- Added SHOW SHM_CPP. Default is a brief display of all SHM_CPPs.
- Added VALIDATE SHM_CPP. Default action is to validate all SHM_CPPs
and the counts and ranges of attached PFNs, but not the contents of the
database for each PFN.
- Added SHOW SHM_REG. Default is a brief display of all SHM_REGs.
- Added /GLXSYS and /GLXGRP to SHOW GSD.
- Added SHOW GMDB to display the contents of the GMDB and NODEB
blocks. Default is detailed display of GMDB.
- SHOW GALAXY shows a brief display of GMDB and all node blocks.
- SHOW GLOCK displays Galaxy lock structures. Default is display of
base GLOCK structures.
- SHOW GCT displays Galaxy configuration tree. Default is /SUMMARY.
- SHOW PAGE_TABLE and SHOW PROCESS/PAGE_TABLE.
Chapter 3 NUMA Implications on OpenVMS Applications
NUMA is an attribute of a system in which access time to any given
physical memory location is not the same for all CPUs. Given this
architecture, consistently good location of code and data (though not
necessarily 100 percent of the time) is required for high performance. In the new
AlphaServer GS series, CPUs will access memory in their own QBB faster
than they will access memory in another QBB.
If OpenVMS is running on the resources of a single QBB, then there is
no NUMA effect and this discussion does not apply. Whenever possible
and practical, you can benefit by running in a single QBB, thereby
eliminating the complexities NUMA may present.
The most common question for overall system performance in a NUMA
environment is whether you want performance that is "uniform for all"
or "optimal for a few." In other words, do you want all processes to have roughly
equivalent performance, or do you want to focus on some specific
processes and make them as efficient as possible? Whenever a single
instance of OpenVMS runs on multiple QBBs (whether it is the entire
machine, a hard partition, or a Galaxy instance), then you must answer
this question, because the answer dictates a number of configuration
and management decisions you need to understand.
The OpenVMS default NUMA mode of operation is "uniform for all".
Resources are assigned so that over time each process on the system
has, on average, roughly the same performance potential.
If "uniform for all" is not what you want, you must understand the
interfaces available to you in order to achieve the more specialized
"optimal for a few" or "dedicated" environment. Processes and data can
be assigned to specific resources to give them the highest performance
potential possible.
To further enhance your understanding of the NUMA environment, this
chapter discusses the following:
- Base operating system NUMA actions
- Application resource considerations
- APIs
3.1 OpenVMS NUMA Awareness
OpenVMS memory management and process scheduling have been enhanced to
work more efficiently on the new AlphaServer GS Series systems hardware.
The operating system treats the hardware as a set of Resource Affinity
Domains (RADs). A RAD is the software grouping of physical resources
(CPUs, memory, and I/O) with common access characteristics. On the new
AlphaServer GS Series systems, a RAD corresponds to a Quad Building
Block (QBB). When a single instance of OpenVMS runs on multiple QBBs, a
QBB is seen as a RAD by OpenVMS.
Each of the following areas of enhancement adds a new capability to the
system. Individually each brings increased performance potential for
certain application needs. Collectively they provide the environment
necessary for a diverse application mix. The areas being addressed are:
- Assignment of process private pages
- Assignment of reserved memory pages
- Process scheduling
- Replication of read-only system space pages
- Allocation of nonpaged pool
- Tools for observing page assignment
A CPU references memory in the same RAD three times faster than it
references memory in another RAD. Therefore, it is important to keep
the code being executed and the memory being referenced in the same RAD
as much as possible. Consistently good location is the key to good
performance. In assessing performance, the following questions
illustrate the types of things a programmer needs to consider.
- Where is the code you are executing?
- Where is the data you are accessing?
- Where is the I/O device you are using?
The OpenVMS scheduler and the memory management subsystem work together
to achieve the best possible location by:
- Assigning each process a preferred or "home" RAD.
- Usually scheduling a process on a CPU in its home RAD.
- Replicating operating system read-only code and some data in each
RAD.
- Distributing global pages over RADs.
- Striping reserved memory over RADs.
3.1.1 Home RAD
The OpenVMS operating system assigns a home RAD to each process during
process creation. This has two major implications. First, with rare
exception, one of the CPUs in the process's home RAD will run the
process. Second, all process private pages required by the process will
come from memory in the home RAD. This combination aids in maximizing
local memory references.
When assigning home RADs, the default action of OpenVMS is to
distribute the processes over the RADs.
3.1.2 System Code Replication
During system startup the operating system code is replicated in the
memory of each RAD so that each process in the system will be accessing
local memory whenever it requires system functions. This replication
covers both the executive code granularity hint regions and the
installed resident image code granularity hint regions.
3.1.3 Distributing Global Pages
The default action of OpenVMS is to distribute global pages (the pages
of a global section) over the RADs. This approach is also taken with
the assignment of global pages that have been declared as reserved
memory during system startup.
3.2 Application Resource Considerations
Each application environment is different. An application's structure
may dictate which options are best for achieving the desired goals.
Some of the deciding factors include:
- Number of processes
- Amount of memory needed
- Amount of sharing between processes
- Use of certain base operating system features
- Use of locks and their location
There are few absolute rules, but the following sections present some
basic concepts and examples that will usually lead to the best outcome.
Localizing (on-QBB) memory access is always the goal, but it is not
always achievable and that is where tradeoffs are most likely to be
made.
3.2.1 Processes and Shared Data
If you have hundreds, or maybe thousands, of processes that access a
single global section, then you most likely want the default behavior
of the operating system. The pages of the global section will be
equally distributed in the memory of all RADs, and the processes' home
RAD assignments will be equally distributed over the CPUs. This is the
distributed, or "uniform", effect where over time all processes have
similar performance potential given random accesses to the global
section. None will be optimal but none will be at a severe disadvantage
compared to the others.
On the other hand, a small number of processes accessing a global
section can be "located" in a single RAD, as long as the four CPUs in
that RAD can handle the processing load and a single RAD contains sufficient memory for the
entire global section. This will localize most memory access and
therefore enhance performance of those specifically located processes.
This strategy can be employed multiple times on the same system by
locating one set of processes and their data in one RAD and a second
set of processes and their data in another RAD.
3.2.2 Memory
A single QBB can have up to 32 GB of memory; two can have up to 64 GB,
and so on. Take advantage of the large memory capacity whenever
possible. For example, consider duplicating code or data in multiple
RADs. It will take some analysis, may seem wasteful of space, and will
require coordination. However, it may be worthwhile if it ultimately
makes significantly more memory references local.
Consider the use of a RAM disk product. Even if NUMA is involved,
in-memory references will outperform real device I/O.
3.2.3 Sharing and Synchronization
Sharing data usually requires synchronization. If the coordination
mechanism is a single memory location (sometimes called a latch, a
lock, or a semaphore), then it may be the cause of many remote accesses
and therefore degrade performance if the contention is high enough.
Multiple levels of such locks distributed throughout the data may
reduce the amount of remote access.
3.2.4 Use of OpenVMS Features
Heavy use of certain base operating system features will result in much
remote access because the data to support these functions resides in
the memory of QBB0. Some of this data cannot be duplicated; some can be
duplicated but has not yet been.
3.3 RAD Application Programming Interfaces
A number of interfaces specific to RADs are available to application
programmers and system managers for controlling the location of
processes and memory if the system defaults do not meet the needs of
the operating environment. The following subsections are brief
descriptions; the details can be found in the appropriate OpenVMS
System Services Reference Manual.
3.3.1 Creating a Process
If you want a process to have a specific home RAD, then use the new
HOME_RAD argument in the SYS$CREPRC system service. This allows the
application to control the location.
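Below is a minimal C sketch of such a call. The home_rad argument and
the PRC$M_HOME_RAD flag bit come from the summary table in Section 3.4;
the position of home_rad at the end of the argument list, the image
name, the priority, and the RAD number are assumptions for
illustration, so verify the call against the SYS$CREPRC prototype
extracted from SYS$STARLET_C.TLB.

/* Sketch: create a detached process with an explicit home RAD.  The
 * position of the home_rad argument (after itmlst and node) is an
 * assumption; the image, priority, and RAD number are examples only.    */
#include <descrip.h>                 /* $DESCRIPTOR                      */
#include <prcdef.h>                  /* PRC$M_DETACH, PRC$M_HOME_RAD     */
#include <stdio.h>

extern unsigned int sys$creprc();    /* prototype lives in starlet       */

int main(void)
{
    unsigned int status;
    unsigned int pid      = 0;
    unsigned int home_rad = 1;       /* request RAD 1 (example value)    */
    $DESCRIPTOR(image,  "SYS$SYSTEM:LOGINOUT.EXE");
    $DESCRIPTOR(input,  "NLA0:");
    $DESCRIPTOR(output, "NLA0:");

    /* PRC$M_HOME_RAD in the status flags tells the service that the
     * home_rad argument is present and valid.                           */
    status = sys$creprc(&pid, &image, &input, &output,
                        0, 0, 0, 0,  /* error, prvadr, quota, prcnam     */
                        4, 0, 0,     /* baspri, uic, mbxunt              */
                        PRC$M_DETACH | PRC$M_HOME_RAD,
                        0, 0,        /* itmlst, node                     */
                        home_rad);
    printf("sys$creprc status = %u, new pid = %08X\n", status, pid);
    return status;
}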
3.3.2 Moving a Process
If a process has already been created and you want to relocate it, use
the HOME_RAD argument to the SYS$SET_PROCESS_PROPERTIES system service.
The process's working set will be purged and, as it runs on the CPUs in
its new home RAD, its private pages will be reassigned from memory in
the new home RAD.
3.3.3 Getting Information About a Process
The SYS$GETJPI system service returns the home RAD of a process.
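Below is a minimal C sketch of retrieving that value for the calling
process. JPI$_HOME_RAD is the item code listed in the summary table in
Section 3.4; the item-list handling is ordinary $GETJPI usage, and
error handling is reduced to a status check.

/* Sketch: obtain the home RAD of the calling process with $GETJPIW.     */
#include <jpidef.h>                  /* JPI$_HOME_RAD                    */
#include <stdio.h>

extern unsigned int sys$getjpiw();   /* prototype lives in starlet       */

typedef struct {                     /* classic 32-bit item-list entry   */
    unsigned short  buflen;
    unsigned short  code;
    void           *buffer;
    unsigned int   *retlen;
} item_t;

int main(void)
{
    unsigned int   status;
    unsigned int   home_rad = 0;
    unsigned int   retlen   = 0;
    unsigned short iosb[4];          /* quadword I/O status block        */

    item_t items[] = {
        { sizeof home_rad, JPI$_HOME_RAD, &home_rad, &retlen },
        { 0, 0, 0, 0 }               /* terminator                       */
    };

    /* Zero pidadr and prcnam mean "the calling process".                */
    status = sys$getjpiw(0, 0, 0, items, iosb, 0, 0);
    if (status & 1)
        status = iosb[0];            /* final status is in the IOSB      */
    if (status & 1)
        printf("home RAD = %u\n", home_rad);
    else
        printf("sys$getjpiw failed, status = %u\n", status);
    return status;
}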
3.3.4 Creating a Global Section
The SYS$CRMPSC_GDZRO_64 and SYS$CREATE_GDZRO system services accept a
RAD argument mask. This indicates in which RADs OpenVMS should attempt
to assign the pages of the global section.
3.3.5 Assigning Reserved Memory
The SYSMAN interface for assigning reserved memory has a RAD qualifier,
so a system manager can declare that the memory being reserved should
come from specific RADs.
3.3.6 Getting Information About the System
The SYS$GETSYI system service defines the following item codes for
obtaining RAD information; a usage sketch follows the list.
- RAD_MAX_RADS shows the maximum number of RADs possible on a
platform.
- RAD_CPUS shows a longword array of RAD/CPU pairs.
- RAD_MEMSIZE shows a longword array of RAD/page_count pairs.
- RAD_SHMEMSIZE shows a longword array of RAD/page_count pairs.
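The following minimal C sketch queries two of these item codes for the
local system. It assumes the C constants carry the usual SYI$_ prefix
(SYI$_RAD_MAX_RADS, SYI$_RAD_CPUS) as defined in <syidef.h>, and that
the buffer shown is large enough to hold all RAD/CPU pairs; confirm
both against your system's header and the OpenVMS System Services
Reference Manual.

/* Sketch: query RAD configuration for the running system with $GETSYIW. */
#include <syidef.h>                  /* SYI$_RAD_MAX_RADS, SYI$_RAD_CPUS */
#include <stdio.h>

extern unsigned int sys$getsyiw();   /* prototype lives in starlet       */

typedef struct {                     /* classic 32-bit item-list entry   */
    unsigned short  buflen;
    unsigned short  code;
    void           *buffer;
    unsigned int   *retlen;
} item_t;

int main(void)
{
    unsigned int   status;
    unsigned int   max_rads = 0;
    unsigned int   rad_cpus[128];    /* longword RAD/CPU pairs           */
    unsigned int   len_rads = 0, len_cpus = 0;
    unsigned short iosb[4];

    item_t items[] = {
        { sizeof max_rads, SYI$_RAD_MAX_RADS, &max_rads, &len_rads },
        { sizeof rad_cpus, SYI$_RAD_CPUS,     rad_cpus,  &len_cpus },
        { 0, 0, 0, 0 }               /* terminator                       */
    };

    /* Zero csidadr and nodename mean "the local system".                */
    status = sys$getsyiw(0, 0, 0, items, iosb, 0, 0);
    if (status & 1)
        status = iosb[0];
    if (status & 1) {
        unsigned int i, pairs = len_cpus / (2 * sizeof (unsigned int));
        printf("maximum RADs on this platform: %u\n", max_rads);
        for (i = 0; i < pairs; i++)
            printf("  RAD %u holds CPU %u\n",
                   rad_cpus[2*i], rad_cpus[2*i + 1]);
    }
    return status;
}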
3.3.7 RAD_SUPPORT System Parameter
The RAD_SUPPORT system parameter has numerous bits and fields defined
for customizing individual RAD-related actions.
3.4 RAD System Services Summary Table
The following table describes RAD system service information for
OpenVMS Version 7.3.
For additional information, refer to the OpenVMS System Services Reference Manual.
System Service              RAD Information
$CREATE_GDZRO               Argument: rad_mask
                            Flag: SEC$M_RAD_HINT
                            Error status: SS$_BADRAD
$CREPRC                     Argument: home_rad
                            Status flag bit: stsflg
                            Symbolic name: PRC$M_HOME_RAD
                            Error status: SS$_BADRAD
$CRMPSC_GDZRO_64            Argument: rad_mask
                            Flag: SEC$M_RAD_MASK
                            Error status: SS$_BADRAD
$GETJPI                     Item code: JPI$_HOME_RAD
$GETSYI                     Item codes: RAD_MAX_RADS, RAD_CPUS,
                            RAD_MEMSIZE, RAD_SHMEMSIZE, GALAXY_SHMEMSIZE
$SET_PROCESS_PROPERTIESW    Item code: PPROP$C_HOME_RAD
3.5 RAD DCL Command Summary Table
The following table summarizes OpenVMS RAD DCL commands. For additional
information, refer to the OpenVMS DCL Dictionary.
DCL Command/Lexical    RAD Information
SET PROCESS            Qualifier: /RAD=HOME=n
SHOW PROCESS           Qualifier: /RAD
F$GETJPI               Item code: HOME_RAD
F$GETSYI               Item codes: RAD_MAX_RADS, RAD_CPUS, RAD_MEMSIZE,
                       RAD_SHMEMSIZE