HP OpenVMS Systems Documentation

OpenVMS Performance Management

Chapter 10
Compensating for Resource Limitations

This chapter describes corrective procedures for each of the various categories of resource limitations described in Chapter 5.

Wherever the corrective procedure suggests changing the value of one or more system parameters, the description explains briefly whether the parameter should be increased, decreased, or given a specific value. Relationships between parameters are identified and explained, if necessary. However, to avoid duplicating information available in the OpenVMS System Management Utilities Reference Manual: M--Z, complete explanations of parameters are not included.

You should review descriptions of system parameters, as necessary, before changing the parameters.

10.1 Changing System Parameters

Before you make any changes to your system parameters, save the parameter values currently in the SYSGEN work area to a file, using a technique such as the following:

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> WRITE SYS$SYSTEM:file-spec
SYSGEN> EXIT

You may want to use a date as part of the file name you specify for file-spec to readily identify the file later.

By saving a copy of the current values, you can always return to those values later. To restore them, you generally use the following technique, specifying your saved parameter file as file-spec:

$ RUN SYS$SYSTEM:SYSGEN
SYSGEN> USE SYS$SYSTEM:file-spec
SYSGEN> WRITE ACTIVE
SYSGEN> EXIT

However, WRITE ACTIVE affects only dynamic parameters. If any of the parameters you changed are not dynamic, restore them from the saved file with the SYSGEN command WRITE CURRENT instead, and then reboot the system.
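The choice between WRITE ACTIVE and WRITE CURRENT can be expressed as a small decision. The following Python sketch only builds the command text; the `changed` mapping (parameter name to "is dynamic") is a hypothetical input you would fill in yourself, and nothing here queries the system.

```python
from typing import Dict, List


def restore_commands(file_spec: str, changed: Dict[str, bool]) -> List[str]:
    """Build the SYSGEN command sequence that restores saved values.

    changed maps each modified parameter name to True if it is dynamic.
    Hypothetical helper: it produces text only and runs nothing.
    """
    cmds = ["$ RUN SYS$SYSTEM:SYSGEN", f"SYSGEN> USE SYS$SYSTEM:{file_spec}"]
    if all(changed.values()):
        # Every changed parameter is dynamic, so WRITE ACTIVE restores them now.
        cmds += ["SYSGEN> WRITE ACTIVE", "SYSGEN> EXIT"]
    else:
        # At least one is not dynamic: write the current parameter file
        # and reboot so the saved values take effect at the next boot.
        cmds += ["SYSGEN> WRITE CURRENT", "SYSGEN> EXIT",
                 "$ ! Reboot the system to apply the nondynamic values"]
    return cmds
```

For example, `restore_commands("SAVED.PAR", {"PFCDEFAULT": True})` ends with WRITE ACTIVE, while adding any nondynamic entry switches the sequence to WRITE CURRENT followed by a reboot.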

Caution

Except for the temporary testing of dynamic parameters described in Section 10.1.3, do not modify system parameters directly with SYSGEN. AUTOGEN overrides system parameters set with SYSGEN, so such a setting can be lost months or years after it was made.

10.1.1 Guidelines

You should change only a few parameters at a time.

Whenever your changes are unsuccessful, make it a practice to restore the parameters to their previous values before you continue tuning. Otherwise, it can be difficult to determine which changes produce currently observed effects.

If you are planning to change a system parameter and you are uncertain of the ultimate target value or of the sensitivity of the specific parameter to changes, err on the conservative side in making initial changes. As a guideline, you might make a 10 percent change in the value first so that you can observe its effects on the system.

If...	Then...
You see little or no effect	Try doubling or halving the original value of the parameter, depending on whether you are increasing or decreasing it.
Doubling or halving had no effect	Restore the parameter to its original value by using the parameter file you saved before starting.
You cannot affect your system performance with changes of this magnitude	You probably have not selected the right parameter to change.
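The guideline and table above describe a short escalation sequence. This Python sketch assumes integer parameter values and treats "about 10 percent" as integer division by 10 (at least one unit); the function name is hypothetical.

```python
from typing import Optional


def next_trial_value(original: int, trial: int, increase: bool) -> Optional[int]:
    """Suggest the value to try on a given tuning trial.

    trial 0: change the original value by about 10 percent.
    trial 1: little or no effect, so double or halve the original value.
    trial 2 or later: still no effect; return None, meaning restore the
    saved value and reconsider which parameter to change.
    """
    step = max(1, original // 10)  # roughly 10 percent, never zero
    if trial == 0:
        return original + step if increase else original - step
    if trial == 1:
        return original * 2 if increase else original // 2
    return None
```

Note that both trials are computed from the original value, not from the previous trial, which matches the table's "doubling or halving the original value".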

10.1.2 Using AUTOGEN

In most cases, you will want to use AUTOGEN to change system parameters since AUTOGEN adjusts related parameters automatically. (For a discussion of AUTOGEN, see the OpenVMS System Manager's Manual, Volume 2: Tuning, Monitoring, and Complex Systems.) In the few instances where it is appropriate to change a parameter in the special parameter group, further explanation of the parameter is given in this chapter, since special parameters are otherwise undocumented.

10.1.3 When to Use SYSGEN

If your tuning changes involve system parameters that are dynamic, plan to test the changes on a temporary basis first. This is the only instance where the use of SYSGEN is warranted for making tuning changes.

Once you are satisfied that the changes are working well, record the new values in SYS$SYSTEM:MODPARAMS.DAT and then invoke AUTOGEN with the REBOOT parameter to make the changes permanent.

10.2 Monitoring the Results

After you perform the recommended corrective actions in this and the following chapters, repeat the steps in the preceding chapters to observe the effects of the changes. As you repeat the steps, watch for new problems introduced by the corrective actions or previously undetected problems. Your goal should be to complete the steps in those chapters without uncovering a serious symptom or problem.

After you change system values or parameters, you must monitor the results, as described in Section 2.7.7. You have two purposes for monitoring:

You must ensure that the changes are not introducing new problems.
You must evaluate the degree of success achieved.

You may want to return to the appropriate procedures in Chapters 5, 7, 8, and 9 as you evaluate your success after tuning and decide whether to pursue additional tuning efforts. However, always keep in mind that there is a point of diminishing returns in every tuning effort (see Section 2.7.7.1).

Chapter 11
Compensating for Memory-Limited Behavior

This chapter describes corrective procedures for memory resource limitations described in Chapters 5 and 7.

11.1 Improving Memory Responsiveness

It is always good practice to check the four methods for improving memory responsiveness to see if there are ways to free up more memory, even if no problem seems to exist currently. The easiest way to improve memory utilization significantly is to make sure that active memory reclamation is enabled.

11.1.1 Equitable Memory Sharing

When active memory reclamation is enabled, the system distributes memory among active processes in an equitable and expeditious manner. If page faulting seems excessive with this policy enabled, make sure processes have not reached their WSEXTENT values. Precise WSQUOTA values are not very important when this policy is enabled, provided that GROWLIM and BORROWLIM are set equal to FREELIM by using AUTOGEN.

If active memory reclamation is not enabled (that is, the value of MMG_CTLFLAGS is 0), then overall system page fault behavior is highly dependent on current process WSQUOTA values. The following discussion can help you to determine if inequitable memory sharing is occurring.
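The two conditions above can be combined into one check. In this Python sketch the parameter values are inputs you would read yourself; the helper name is hypothetical, and it encodes only the rules stated in this section.

```python
def wsquota_precision_matters(mmg_ctlflags: int, growlim: int,
                              borrowlim: int, freelim: int) -> bool:
    """Return False only in the case where, per the text, precise WSQUOTA
    values are not very important: active memory reclamation enabled
    (MMG_CTLFLAGS nonzero) and GROWLIM and BORROWLIM equal to FREELIM.
    """
    reclamation_enabled = mmg_ctlflags != 0
    limits_match = growlim == freelim and borrowlim == freelim
    return not (reclamation_enabled and limits_match)
```

A True result does not prove WSQUOTA values are wrong; it means the discussion of inequitable sharing that follows applies.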

Because page fault behavior is so heavily dependent on the page referencing patterns of user programs, the WSQUOTA values you assign may be satisfactory for some programs but not for others. Use the ACCOUNTING image report described in Section 4.3 to identify the programs (images) that are the heaviest faulters on your system, and then compensate by encouraging users to run such images as batch jobs on queues you have set up with large WSQUOTA values.

Inequitable Sharing

You may be able to detect inequitable sharing by looking at the Faults column of the MONITOR PROCESSES display in a standard summary report (the column does not appear in the multifile summary report). A process whose page fault accumulation is much higher than that of other processes is suspect, although how much weight to give the count depends on how long the process has been active.

A better means of detection is to use the MONITOR playback feature to view a display of the top page faulters during each collection interval:

$ MONITOR /INPUT=SYS$MONITOR:file-spec /VIEWING_TIME=1 PROCESSES /TOPFAULT

You may want to select a time interval using the /BEGINNING and /ENDING qualifiers when you suspect that a problem has occurred.

Check to see whether the top process changes periodically. If it appears that one or two processes are consistently the top faulters, you may want to obtain more information about which images they are running and consider upgrading their WSQUOTA values, using the guidelines in Section 3.5. Sometimes a small adjustment in a WSQUOTA value can make a drastic difference in the page faulting behavior, if the original value was near the knee of the working-set/page-fault curve (see Figures 3-3 and 3-4).
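The "consistently the top faulter" test can be made explicit. This Python sketch assumes you have transcribed, for each collection interval, the page faults per process from the MONITOR PROCESSES /TOPFAULT display; the 50 percent threshold and the function name are hypothetical choices, not values from this manual.

```python
from collections import Counter
from typing import Dict, List


def consistent_top_faulters(intervals: List[Dict[str, int]],
                            min_share: float = 0.5) -> List[str]:
    """Return processes that are the top faulter in at least min_share
    of the collection intervals."""
    top_counts: Counter = Counter()
    for faults in intervals:
        if faults:
            # The process with the most faults in this interval.
            top_counts[max(faults, key=faults.get)] += 1
    needed = min_share * len(intervals)
    return sorted(name for name, count in top_counts.items() if count >= needed)
```

A process returned here is a candidate for the WSQUOTA review described above, not proof of a misconfiguration.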

If you find that the MONITOR collection interval is too large to provide sufficient detail, try entering the MONITOR PROCESSES /TOPFAULT command without the /INPUT qualifier on the running system (live mode) during a representative period, using the default 3-second collection interval. If you discover an inequity, try to obtain more information about the process and the image being run by entering the SHOW PROCESS /CONTINUOUS command.

Another way to check for inequitable sharing of memory is to use the WORKSET.COM command procedure described in Section 7.1.3. Examine the various working set values and ensure that the allocation of memory, even if not evenly distributed, is appropriate.

11.1.2 Reduction of Memory Consumption by the System

The operating system uses physical memory for storage of the code and data structures it requires to support user processes. You have control over the sizes of two of the memory areas reserved for the system: the system working set and the nonpaged pool area. Both of these areas are sized by AUTOGEN. The sizes set by AUTOGEN are normally adequate but may not be optimal because AUTOGEN cannot anticipate all operational requirements.

11.1.2.1 System Working Set

The system working set is an area of physical memory reserved to satisfy page faults of virtual addresses in system space.

Such virtual addresses can be code or data (paged pool, for example). Because the same system working set is used for all processes on the system, there is very little locality associated with it.

Therefore, the system fault rate can be expected to change slowly in response to changes in the system working set size (as controlled by the system parameter SYSMWCNT). As a rule of thumb, try to keep the system fault rate below 2 faults per second.

Keep in mind, however, that pages allocated to the system working set by raising the value of SYSMWCNT are considered permanently allocated to the system and are therefore no longer available for process working sets.
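Because a single sample can mislead, it helps to compare an average against the rule of thumb. This Python sketch assumes you have collected several system fault rate samples (for example, from MONITOR PAGE); the helper name is hypothetical, and the 2-per-second target is the rule of thumb above.

```python
from statistics import mean
from typing import Sequence

SYSTEM_FAULT_TARGET = 2.0  # faults per second, the rule of thumb above


def system_faults_acceptable(samples: Sequence[float],
                             target: float = SYSTEM_FAULT_TARGET) -> bool:
    """Average sampled system fault rates and compare them with the target."""
    if not samples:
        raise ValueError("need at least one sample")
    return mean(samples) < target
```

Since the rate responds slowly to SYSMWCNT, and the pages added are lost to process working sets, a False result justifies a small increase followed by more sampling rather than a large jump.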

11.1.2.2 Nonpaged Pool

The nonpaged pool area is a portion of physical memory permanently allocated to the system for the storage of data structures and device drivers.

AUTOGEN determines the initial size of the nonpaged pool, but automatic expansion will occur if necessary. The system expands pool as required by permanently allocating a page of memory from the free-page list. Pages allocated in this manner are not available for use by process working sets until the system is rebooted.

11.1.2.3 Adaptive Pool Management

The high-performance nonpaged pool allocator reduces the probability of system outages due to exhaustion of memory allocated for system data structures (pool). Adaptive pool management virtually eliminates the need to actively manage the allocation of pool resources. The nonpaged pool area and lookaside lists are combined into one region (defined by the system parameters NPAGEDYN and NPAGEVIR), allowing memory packets to migrate from lookaside lists to general pool and back again based on demand. As a result, the system is capable of tuning itself according to the current demand for pool, optimizing its use of these resources, and reducing the risk of running out of these resources.

Caution

On OpenVMS Alpha systems, it is important to set NPAGEDYN sufficiently large for best performance. If the nonpaged area grows beyond NPAGEDYN, that area will not be included in the large data page granularity hint region (GHR). Applications will experience degraded performance when accessing the expanded nonpaged pool due to an increase in translation buffer (TB) misses.

Internal to the allocator is an array of lookaside lists that contiguously span an allocation range from 1 to 5120 bytes. These lookaside lists require no external tuning. They are automatically prepopulated during bootstrapping based on previous demand and each continuously adapts its number of packets based on the changing demand during the life of the system. The result is very high performance due to a very high hit percentage on the internal lookaside lists, typically over 99 percent.
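The hit percentage can be estimated from per-list counters such as those displayed by the SDA command SHOW POOL/STATISTICS when pool monitoring is enabled. This Python sketch assumes that a "failure" is a request the list could not satisfy, so hits equal attempts minus failures; that interpretation is an assumption.

```python
from typing import Iterable, Tuple


def lookaside_hit_percentage(lists: Iterable[Tuple[int, int]]) -> float:
    """Percentage of allocation attempts satisfied by the lookaside lists.

    Each item is (attempts, failures) for one internal lookaside list.
    """
    attempts = failures = 0
    for list_attempts, list_failures in lists:
        attempts += list_attempts
        failures += list_failures
    if attempts == 0:
        return 0.0  # no requests recorded yet
    return 100.0 * (attempts - failures) / attempts
```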

Deallocating Nonpaged Pool

When deallocating nonpaged pool, the allocator requires that you pass an accurate packet size either in R1 or in the word starting at the eighth byte of the packet itself. The size of the packet determines which internal lookaside list receives the deallocated packet.

Enabling and Disabling Pool Monitoring

The setting of the parameter POOLCHECK at boot time also controls which version of the pool allocator is loaded as follows:

If POOLCHECK equals a nonzero value, a monitoring version is loaded, which contains the corruption-detecting code and statistics maintenance.
The following System Dump Analyzer (SDA) commands are also enabled:
- SHOW POOL/STATISTICS: For each of the internal lookaside lists, displays the address of the listhead, the list packet size, and the number of attempts, failures, and deallocations made to that list since bootstrapping.
- SHOW POOL/RING_BUFFER: Displays, in reverse chronological order, information about the last 512 requests made to nonpaged pool. It is useful in analyzing potential corruption problems.
Refer to the OpenVMS Alpha System Dump Analyzer Utility Manual for more information about these commands.
If POOLCHECK equals zero, a minimal version is loaded containing no corruption-detecting or statistics maintenance code.

For more information about the POOLCHECK parameter, refer to the OpenVMS System Management Utilities Reference Manual.

Nonpaged Pool Granularity

The granularity of nonpaged pool changed with OpenVMS Version 6.0. Any code that explicitly assumes the granularity of nonpaged pool to be 16 bytes, or that uses the symbol EXE$C_ALCGRNMSK to perform structure alignment (for example), must be changed to use the symbol EXE$M_NPAGGRNMSK, which reflects the current granularity of nonpaged pool.
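The safe pattern is to derive alignment from a mask value rather than from a hard-coded 16. This Python sketch illustrates the arithmetic only; the `grn_mask` argument stands in for the value of EXE$M_NPAGGRNMSK, and the 15 and 63 values used in the example are hypothetical inputs, not statements about any particular release.

```python
def round_to_pool_granularity(size: int, grn_mask: int) -> int:
    """Round size up to the next multiple of (grn_mask + 1).

    grn_mask stands in for EXE$M_NPAGGRNMSK and must be one less than a
    power of two (for example, 15 for a 16-byte granularity).
    """
    if grn_mask < 0 or (grn_mask + 1) & grn_mask:
        raise ValueError("mask must be 2**n - 1")
    return (size + grn_mask) & ~grn_mask
```

With a mask of 15, a 100-byte request rounds to 112 bytes; with a mask of 63, the same request rounds to 128. Code that reads the mask instead of assuming it keeps working when the granularity changes.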

11.1.2.4 Additional Consistency Checks

On Alpha, the system parameter SYSTEM_CHECK is used to investigate intermittent system failures by enabling a number of run-time consistency checks on system operation and recording some trace information.

Enabling SYSTEM_CHECK causes the system to behave as if the following system parameter values are set:

Parameter¹	Value	Description
BUGCHECKFATAL	1	Crashes the system on nonfatal bugchecks
POOLCHECK²	%X616400FF	Enables all pool checking with an allocated pool pattern of %X61616161 ('aaaa') and a deallocated pool pattern of %X64646464 ('dddd')
MULTIPROCESSING	2	Enables full synchronization checking

¹Note that the values of the parameters are not actually changed.
²Setting POOLCHECK to a nonzero value overrides the settings imposed by SYSTEM_CHECK.
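The POOLCHECK value in the table packs several fields into one longword. This Python sketch decodes it using a layout inferred from the %X616400FF example (allocated fill byte in bits 24 to 31, deallocated fill byte in bits 16 to 23, check-enable flags in bits 0 to 7); treat the layout as an assumption and consult the parameter description for the authoritative definition. Bits 8 to 15 are not interpreted here.

```python
def decode_poolcheck(value: int) -> dict:
    """Split a POOLCHECK value into flags and fill patterns (inferred layout)."""
    alloc_byte = (value >> 24) & 0xFF
    dealloc_byte = (value >> 16) & 0xFF
    return {
        "flags": value & 0xFF,                               # check-enable bits
        "allocated_pattern": alloc_byte * 0x01010101,        # byte repeated 4 times
        "deallocated_pattern": dealloc_byte * 0x01010101,
        "allocated_text": chr(alloc_byte) * 4,
        "deallocated_text": chr(dealloc_byte) * 4,
    }
```

Decoding %X616400FF this way reproduces the table: flags %XFF, allocated pattern %X61616161 ('aaaa'), and deallocated pattern %X64646464 ('dddd').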

While SYSTEM_CHECK is enabled, the previous settings of the BUGCHECKFATAL and MULTIPROCESSING parameters are ignored.
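The table, its footnotes, and the paragraph above describe "behaves as if" values rather than changes to the stored parameters. This Python sketch models those effective values from a dictionary of stored values; the dictionary form and the function name are hypothetical.

```python
from typing import Dict

SYSTEM_CHECK_OVERRIDES = {"BUGCHECKFATAL": 1, "MULTIPROCESSING": 2}
SYSTEM_CHECK_POOLCHECK = 0x616400FF


def effective_values(stored: Dict[str, int]) -> Dict[str, int]:
    """Return the values the system behaves as if it had.

    The stored dictionary is not modified, mirroring footnote 1: the
    parameter values themselves are not actually changed.
    """
    effective = dict(stored)
    if stored.get("SYSTEM_CHECK", 0):
        # Previous BUGCHECKFATAL and MULTIPROCESSING settings are ignored.
        effective.update(SYSTEM_CHECK_OVERRIDES)
        # A nonzero POOLCHECK overrides the SYSTEM_CHECK pool setting.
        if stored.get("POOLCHECK", 0) == 0:
            effective["POOLCHECK"] = SYSTEM_CHECK_POOLCHECK
    return effective
```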

Setting SYSTEM_CHECK causes certain image files to be loaded that are capable of the additional system monitoring. These image files are located in SYS$LOADABLE_IMAGES and can be identified by the suffix _MON.

Note that enabling SYSTEM_CHECK, or any of the individual system checks listed, may have an impact on system performance because the system must do extra work to perform these run-time consistency checks. Also note that you should use BUGCHECKFATAL with care in a multiuser environment because it causes the entire system to crash.

These checks can be very helpful when working with applications or layered products that are causing problems, especially in the way they interact with the system. However, once the system has achieved stability they should generally be turned off.

For more information about the interaction of the SYSTEM_CHECK system parameter with the ACP_DATACHECK system parameter, see the description of ACP_DATACHECK in the OpenVMS System Management Utilities Reference Manual.

11.1.3 Memory Offloading

While the most common, and probably most cost-effective, type of offloading shifts load from the CPU and disk onto memory, you can also improve memory responsiveness by offloading memory demand onto disk. This procedure is recommended only when sufficient disk resources are available and using them is more cost effective than purchasing additional memory.

Some of the CPU offloading techniques described in Section 13.1.3 apply also to memory. Additional techniques are as follows:

Install images with the appropriate attributes. When an image is accessed concurrently by more than one process on a routine basis, it should be installed /SHARED, so that all processes use the same physical copy of the image. The LIST/FULL command of the Install utility shows the highest number of concurrent accesses to an image installed with the /SHARED qualifier. This information can help you decide whether installing an image is worth the space.
Favor process swapping over working set trimming for process-intensive applications. There are cases where an image creates several subprocesses that might not be used continuously during the run time. These idle processes take up a share of physical memory, so it may be wise to swap them out. This typically occurs when users walk away from their terminals for long periods of time.
The following two techniques, used concurrently, will make the system favor swapping out inactive processes over trimming the working sets of highly active processes:
- On a per-process basis: Increase the working set quotas of the active processes, thus reducing reclamation from first-level trimming.
- On a systemwide basis: Increase the value of the system parameter SWPOUTPGCNT, perhaps as high as a typical WSQUOTA value. As a result, fewer pages will be trimmed, so swapping becomes more likely.
After making adjustments, monitor the inswap rate closely. If it becomes excessive, lower the value of SWPOUTPGCNT.
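When deciding whether a shared installation is worth the space, a rough estimate of the memory it saves can help. This Python sketch assumes the peak concurrent count comes from the Install utility's LIST/FULL display and that you know the number of read-only pages in the image; writable sections are still private to each process and are deliberately left out. The function name is hypothetical.

```python
def shared_install_savings(peak_concurrent: int, shareable_pages: int) -> int:
    """Rough number of physical pages saved by installing an image /SHARED.

    Without sharing, each concurrent user holds its own copy of the
    read-only image pages; with sharing, one physical copy serves all.
    """
    if peak_concurrent < 1 or shareable_pages < 0:
        raise ValueError("invalid inputs")
    return (peak_concurrent - 1) * shareable_pages
```

An image with a peak of one concurrent user saves nothing, which is why the LIST/FULL count matters before you install it.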

Evaluating the Swapping File

When you increase swapping, it is important to evaluate the size of the swapping file. If the swapping file is not large enough, system performance will degrade. Use AUTOGEN feedback to size the swapping file appropriately.

11.1.4 Memory Load Balancing

You can balance the memory load by using some of the CPU load-balancing techniques for VMSclusters described in Section 13.1.5 to shift user demand.

To balance the load by reconfiguring memory hardware, perform the following steps:

Examine the multifile summary report.
Look at the Free List Size item of the PAGE class.

The Free List Size item gives the relative amounts of free memory available on each CPU. If a system seems to be deficient in memory and is experiencing memory management problems, perhaps the best solution is to reconfigure the VMScluster by moving some memory from a memory-rich system to a memory-poor one, provided the memory type is compatible with both CPU types.

Note

The Free List Size item is an average of levels, or snapshots. Because it is not a rate, its accuracy depends on the collection interval.
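Because each Free List Size value is only an average of snapshots, comparing several reports gives a steadier picture. This Python sketch assumes you have transcribed the Free List Size values per node from a series of multifile summary reports; the node names in the usage note are hypothetical. It does not check memory compatibility or symptoms, which you must still confirm yourself.

```python
from statistics import mean
from typing import Dict, List, Tuple


def memory_rich_and_poor(free_list_samples: Dict[str, List[int]]) -> Tuple[str, str]:
    """Return (memory-rich node, memory-poor node) by average Free List Size."""
    averages = {node: mean(s) for node, s in free_list_samples.items() if s}
    if len(averages) < 2:
        raise ValueError("need samples from at least two nodes")
    rich = max(averages, key=averages.get)
    poor = min(averages, key=averages.get)
    return rich, poor
```

For instance, with samples `{"NODEA": [5000, 5200], "NODEB": [300, 250], "NODEC": [1200]}`, NODEA is the candidate donor and NODEB the candidate recipient.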

The following sections describe procedures to remedy specific conditions that you might have detected as the result of the investigation described in Chapter 7.

11.2 Reduce Number of Image Activations

There are several ways to reduce the number of image activations. You and the programming staff should explore them all and apply those you deem feasible and likely to produce the greatest results.

11.2.1 Programs Versus Command Procedures

Excessive image activations can result from running large command procedures frequently, because all DCL commands (except those performed within the command interpreter) require an image activation. If command procedures are introducing the problem, consider writing programs to replace them.

11.2.2 Code Sharing

When code is actively shared, the cost of image startups decreases. Perhaps your installation has failed to design applications that share code. You should examine ways to employ code sharing wherever suitable. See Section 1.4.3 and Section 3.8.

You will not see the number of image activations drop when you begin to use code sharing, but you should see an improvement in performance. The effect of code sharing is to shift the type of faults at image activation from hard faults to soft faults, a shift that results in performance improvement.

11.2.3 Designing Applications for Native Mode

Yet another source of excessive image activations is migration of programs from other operating systems without any design changes. For example, programs that employ the chaining technique on another operating system will not use memory efficiently on an OpenVMS system if you simply recompile them and ignore design differences. When converting applications to run on an OpenVMS system, always consider the benefits of designing and coding each application for native-mode operation.
