HP OpenVMS Systems Documentation

OpenVMS Performance Management

Order Number: AA-R237C-TE

April 2001

This manual is a conceptual guide for experienced users responsible for optimizing performance on OpenVMS systems. For information about OpenVMS performance on AlphaServer GS80/160/320 systems, see the OpenVMS on AlphaServer GS-Series Systems Configuration and Performance Guidelines, available at http://www.openvms.compaq.com/gsseries/index.html.

Revision/Update Information: This manual supersedes the OpenVMS Performance Management, OpenVMS Version 7.2.

Software Version: OpenVMS Version 7.3

Compaq Computer Corporation
Houston, Texas

Compaq, VAX, VMS, and the Compaq logo Registered in U.S. Patent and Trademark Office.

OpenVMS is a trademark of Compaq Information Technologies Group, L.P. in the United States and other countries.

All other product names mentioned herein may be the trademarks of their respective companies.

Confidential computer software. Valid license from Compaq required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

Compaq shall not be liable for technical or editorial errors or omissions contained herein.

The information in this document is provided "as is" without warranty of any kind and is subject to change without notice. The warranties for Compaq products are set forth in the express limited warranty statements accompanying such products. Nothing herein should be construed as constituting an additional warranty.

The following are trademarks of Compaq Computer Corporation: Alpha, ACMS, DDIF, DECdirect, DECnet, HSC, and MicroVAX.

The following are third-party trademarks:

Motif and UNIX are trademarks of The Open Group in the United States and other countries.

ZK6491

The Compaq OpenVMS documentation set is available on CD-ROM.

This document was prepared using DECdocument, Version V3.3-1e.

Preface

This manual presents techniques for evaluating, analyzing, and optimizing performance on a system running OpenVMS. Discussions address such wide-ranging concerns as:

Understanding the relationship between work load and system capacity
Learning to use performance-analysis tools
Responding to complaints about performance degradation
Helping the site adopt programming practices that result in the best system performance
Using the system features that distribute the work load for better resource utilization
Knowing when to apply software corrections to system behavior; that is, tuning the system to allocate resources more effectively
Evaluating the effectiveness of a tuning operation; knowing how to recognize success and when to stop
Evaluating the need for hardware upgrades

The manual includes detailed procedures to help you evaluate resource utilization on your system and to diagnose and overcome performance problems resulting from memory limitations, I/O limitations, CPU limitations, human error, or combinations of these. The procedures feature sequential tests that use OpenVMS tools to generate performance data; the accompanying text explains how to evaluate it.

Whenever an investigation uncovers a situation that could benefit from adjusting system values, those adjustments are described in detail, and hints are provided to clarify the interrelationships of certain groups of values. When such adjustments are not the appropriate or available action, other options are defined and discussed.

Decision-tree diagrams summarize the step-by-step descriptions in the text. These diagrams should also serve as useful reference tools for subsequent investigations of system performance.

This manual does not describe methods for capacity planning, nor does it attempt to provide details about using OpenVMS RMS features (hereafter referred to as RMS). Refer to the Guide to OpenVMS File Applications for that information. Likewise, the manual does not discuss DECnet for OpenVMS performance issues, because the DECnet-Plus for OpenVMS Network Management manual provides that information.

Intended Audience

This manual addresses system managers and other experienced users responsible for maintaining a consistently high level of system performance, for diagnosing problems on a routine basis, and for taking appropriate remedial action.

Document Structure

This manual is divided into 13 chapters and 4 appendixes, each covering a related group of performance management topics as follows:

Chapter 1 provides a review of workload management concepts and describes guidelines for evaluating user complaints about system performance.
Chapter 2 lists postinstallation operations for enhancing performance and discusses performance investigation and tuning strategies.
Chapter 3 discusses OpenVMS memory management concepts.
Chapter 4 explains how to use utilities and tools to collect and analyze data on your system's hardware and software resources. Included are suggestions for reallocating certain resources should analysis indicate such a need.
Chapter 5 outlines procedures for investigating performance problems.
Chapter 6 describes how to evaluate system resource responsiveness.
Chapter 7 describes how to evaluate the performance of the memory resource and how to isolate specific memory resource limitations.
Chapter 8 describes how to evaluate the performance of the disk I/O resource and how to isolate specific disk I/O resource limitations.
Chapter 9 describes how to evaluate the performance of the CPU resource and how to isolate specific CPU resource limitations.
Chapter 10 provides general recommendations for improving performance with available resources.
Chapter 11 provides specific recommendations for improving the performance of the memory resource.
Chapter 12 provides specific recommendations for improving the performance of the disk I/O resource.
Chapter 13 provides specific recommendations for improving the performance of the CPU resource.
Appendix A lists the decision trees used in the various performance evaluations described in this manual.
Appendix B summarizes the MONITOR data items you will find useful in evaluating your system.
Appendix C provides an example of a MONITOR multifile summary report.
Appendix D provides ODS-1 performance information.

Reader's Comments

Compaq welcomes your comments on this manual. Please send comments to either of the following addresses:

Internet	openvmsdoc@compaq.com
Mail	Compaq Computer Corporation OSSG Documentation Group, ZKO3-4/U08 110 Spit Brook Rd. Nashua, NH 03062-2698

How To Order Additional Documentation

Use the following World Wide Web address to order additional documentation:

http://www.openvms.compaq.com/

If you need help deciding which documentation best meets your needs, call 800-282-6672.

Conventions

In this manual, every use of DECwindows and DECwindows Motif refers to Compaq DECwindows Motif for OpenVMS software.

The following conventions are also used in this manual:

. . .	A vertical ellipsis indicates the omission of items from a code example or command format; the items are omitted because they are not important to the topic being discussed.
( )	In command format descriptions, parentheses indicate that you must enclose choices in parentheses if you specify more than one.
[ ]	In command format descriptions, brackets indicate optional choices. You can choose one or more items or no items. Do not type the brackets on the command line. However, you must include the brackets in the syntax for OpenVMS directory specifications and for a substring specification in an assignment statement.
|	In command format descriptions, vertical bars separate choices within brackets or braces. Within brackets, the choices are optional; within braces, at least one choice is required. Do not type the vertical bars on the command line.
bold text	This typeface represents the introduction of a new term. It also represents the name of an argument, an attribute, or a reason.
italic text	Italic text indicates important information, complete titles of manuals, or variables. Variables include information that varies in system output (Internal error number), in command lines (/PRODUCER= name), and in command parameters in text (where dd represents the predefined code for the device type).
UPPERCASE TEXT	Uppercase text indicates a command, the name of a routine, the name of a file, or the abbreviation for a system privilege.
`Monospace text`	Monospace type indicates code examples and interactive screen displays. In the C programming language, monospace type in text identifies the following elements: keywords, the names of independently compiled external functions and files, syntax summaries, and references to variables or identifiers introduced in an example.
-	A hyphen at the end of a command format description, command line, or code line indicates that the command or statement continues on the following line.
numbers	All numbers in text are assumed to be decimal unless otherwise noted. Nondecimal radixes (binary, octal, or hexadecimal) are explicitly indicated.

Chapter 1
Performance Management

Managing system performance involves being able to evaluate and coordinate system resources and workload demands.

A system resource is a hardware or software component or subsystem under the direct control of the operating system, which is responsible for data computation or storage. The following subsystems are system resources:

CPU
Memory
Disk I/O
Network I/O
LAN I/O
Internet I/O
Cluster communication other than LAN (CI, FDDI, MC)

In addition to this manual, specific cluster information can be found in Guidelines for OpenVMS Cluster Configurations and OpenVMS Cluster Systems.

Performance management means optimizing your hardware and software resources for the current work load. This involves performing the following tasks:

Acquiring a thorough knowledge of your work load and an understanding of how that work load exercises the system's resources
Monitoring system behavior on a routine basis in order to determine when and why a given resource is nearing capacity
Investigating reports of degraded performance from users
Planning for changes in the system work load or hardware configuration and being prepared to make any necessary adjustments to system values
Performing certain optional system management operations after installation

To help you understand the scope and interrelationship of these issues, this chapter covers the following topics:

A review of workload management concepts
Guidelines for developing a performance management strategy

Because many different networking options are available, network I/O is not formally covered in this manual. General performance concepts discussed here apply to networking, and networking should be considered within the scope of analyzing any system performance problem. You should consult the documentation available for the specific products that you have installed for specific guidelines concerning configuration, monitoring, and diagnosis of a networking product.

Similarly, database products are extremely complex and perform much of their own internal management. The settings of parameters external to OpenVMS may have a profound effect upon how efficiently OpenVMS is used. Thus, reviewing server application-specific material is a must if you are to understand and resolve a related performance issue efficiently.

1.1 System Performance Management Guidelines

Even if you are familiar with basic concepts discussed in this section, there are some details discussed that are specific to this process, so please read the entire section.

1.1.1 The Performance Management Process

Long-term measurement and observation of your system is key to understanding how well it is working, and is invaluable in identifying potential performance problems before they become so serious that the system grinds to a halt and your business suffers. Thus, performance management should be a routine process of monitoring and measuring your systems to ensure good operation through deliberate planning and resource management.

Waiting until a problem cripples a system before addressing system performance is not performance management; it is crisis management. Performance management involves:

Systematically measuring the system
Gathering and analyzing the data
Evaluating trends
Archiving data to maintain a performance history

You will often observe trends and thus be able to address performance issues before they become serious and adversely affect your business operations. Should an unforeseen problem occur, your historical data will likely prove invaluable for pinpointing the cause and rapidly and efficiently resolving the problem. Without past data from your formerly well-running system, you may have no basis upon which to judge the value of the metrics you can collect on your currently poorly running system. Without historical data you are guessing; resolution will take much longer and cost far more.
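
As an illustration of how archived measurements can expose a trend, the following minimal sketch fits a straight line to a series of equally spaced samples and estimates how many further periods remain before a chosen limit is reached. It is not part of any OpenVMS tool. The function names, the sample values, and the 85 percent limit are assumptions made for this example, and a linear fit is only one simple way to evaluate a trend.

```python
from statistics import mean


def linear_trend(samples):
    """Return (slope, intercept) of a least-squares line through
    samples taken at equally spaced periods 0, 1, 2, ..."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    xs = range(n)
    x_bar = mean(xs)
    y_bar = mean(samples)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, samples))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def periods_until_limit(samples, limit):
    """Estimate how many periods after the last sample the fitted line
    reaches limit. Returns None if the trend is flat or falling."""
    slope, intercept = linear_trend(samples)
    if slope <= 0:
        return None
    crossing = (limit - intercept) / slope
    return max(0.0, crossing - (len(samples) - 1))


# Hypothetical weekly peak CPU utilization (percent) from archived data.
weekly_peak_cpu = [52.0, 55.0, 58.0, 61.0, 64.0]
print(periods_until_limit(weekly_peak_cpu, 85.0))  # 7.0
```

With these hypothetical values the peak grows by three points a week, so the fitted line reaches 85 percent seven weeks after the last measurement. Real workloads are rarely this regular; treat such a projection as a prompt for closer investigation, not as a forecast.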

Upgrades and Reconfigurations

Some systems are so heavily loaded that the additional resource cost of new software functionality can push the system beyond the maximum load it was intended to handle, resulting in unacceptable response times and throughput. If your system already runs near its limit during peak workload periods, take the steps necessary to avoid pushing it beyond that limit at a time when you cannot afford it.

If your system is not a finely tuned, well-running machine, you are advised to use caution when considering changes to anything. Your system is already being pushed to, or beyond, its original designed capacity if you have observed users complaining about:

Slow response times
Erratic system behavior
Unexplained system pauses, hangs, or crashes

If this is the case, you need a performance audit to determine your current workload and the resources necessary to adequately support your current and possibly future workloads. Implementing changes not specifically designed to increase such a system's capacity or reduce its workload can degrade performance further. Thus, investing in a performance audit will pay off by delivering you a more reliable, productive, available, and lower maintenance system.

Many factors involved in upgrades and reconfigurations contribute to increased resource consumption. Future workloads your system will be asked to support may be unforeseeable due to changes in the system, workload, and business.

Blind reconfiguration without measurement, analysis, modification, and contingency plans can result in serious problems. Significant increases in CPU, disk, memory, and LAN utilization demand serious consideration, measurement, and planning for additional workload and upgrades.

1.1.2 Conducting a Performance Audit

The goals of a performance audit are to:

Evaluate whether your systems are viable candidates for proposed changes.
Identify modifications that must be made.
Ensure that planned and implemented changes deliver expected results.

A proper performance audit will:

Characterize CPU, disk, memory, and LAN utilization on the systems under consideration before reconfiguration.
Measure system activity after an installation.
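
The before-and-after comparison described above can be sketched as follows. This is a minimal illustration, not a description of any OpenVMS utility. The resource names, the utilization figures, and the 10-point review threshold are assumptions chosen for the example.

```python
def utilization_change(before, after):
    """Return the change in utilization (percentage points) for every
    resource that was measured both before and after the change."""
    return {name: after[name] - before[name]
            for name in before if name in after}


def resources_to_review(before, after, min_increase=10.0):
    """List the resources whose utilization rose by at least
    min_increase percentage points, in alphabetical order."""
    change = utilization_change(before, after)
    return sorted(name for name, delta in change.items()
                  if delta >= min_increase)


# Hypothetical average utilization (percent) during the same peak period,
# measured before and after a reconfiguration.
before = {"cpu": 55.0, "disk": 40.0, "memory": 70.0, "lan": 20.0}
after = {"cpu": 68.0, "disk": 42.0, "memory": 85.0, "lan": 21.0}

print(resources_to_review(before, after))  # ['cpu', 'memory']
```

The comparison is meaningful only when both sets of measurements cover comparable periods of the business cycle.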

Without scientific measurement before installation and modification, as well as after, you will not acquire the data necessary to understand, plan for, and resolve potential problems in the immediate as well as distant future. Keep the following in mind:

Measure, plan, understand, test, and confirm. To understand how system workloads vary, you should perform measurements for one week, if not longer, before installing your network.
Take into account that workloads follow business cycles which vary predictably throughout the day, the week, the month, and the year. These variations may be affected by financial and legal deadlines as well as seasonal factors such as holidays and other cyclic activity.
Seek to identify periods of peak load (relatively long periods of heavy load lasting approximately five or more minutes). Understanding their frequency and the factors affecting them is key to successful system planning and management.
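
One way to pick out sustained peaks from a series of utilization samples is sketched below. The sketch assumes samples collected at a fixed interval (60 seconds here) and treats any run of samples at or above a chosen threshold that lasts at least five minutes as a peak. The function name, the 90 percent threshold, and the sample data are hypothetical.

```python
def sustained_peaks(samples, threshold, interval_s=60, min_duration_s=300):
    """Return (start, end) index pairs, end exclusive, for each run of
    consecutive samples at or above threshold that lasts at least
    min_duration_s. Samples are assumed to be interval_s apart."""
    peaks = []
    run_start = None
    for i, value in enumerate(samples):
        if value >= threshold:
            if run_start is None:
                run_start = i  # a new run of heavy load begins
        else:
            if (run_start is not None
                    and (i - run_start) * interval_s >= min_duration_s):
                peaks.append((run_start, i))
            run_start = None
    # Handle a run that is still in progress at the end of the data.
    if (run_start is not None
            and (len(samples) - run_start) * interval_s >= min_duration_s):
        peaks.append((run_start, len(samples)))
    return peaks


# Hypothetical CPU busy percentages sampled once a minute.
cpu_busy = [40, 92, 95, 45, 91, 93, 96, 94, 90, 97, 50, 42]
print(sustained_peaks(cpu_busy, threshold=90))  # [(4, 10)]
```

In this data the two-minute burst at samples 1 and 2 is ignored, while the six-minute run from sample 4 through sample 9 is reported as a peak.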

Peak Workloads and the Cyclic Nature of Workloads

You must first identify periods of activity during which you cannot afford to have system performance degrade and then measure overall system activity during these periods.

These periods will vary from system to system, and from minute to minute, hour to hour, day to day, week to week, and month to month. Holidays and other such periods are often significant factors and should be considered. These periods depend upon the business cycles that the system is supporting.

If the periods you have identified as critical cannot be measured at this time, take measurements in the near term and use them as the basis for estimating activity during those periods. Combine the data you collect with other data, such as order rates from the previous calendar month or year, and with projections or forecasts. Keep in mind that factors other than the CPU may become a bottleneck and slow down the system. For example, a fixed number of assistants can process only so many orders per hour, regardless of the number of callers waiting on the phone for service.
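
The estimate described above can be worked as simple arithmetic. The following sketch assumes, purely for illustration, that resource utilization scales in proportion to the order rate, which real workloads may not do. It also caps the forecast order rate at what a non-computer bottleneck, such as the available staff, can actually process. All names and figures are hypothetical.

```python
def estimate_peak_utilization(measured_util, measured_orders_per_hour,
                              forecast_orders_per_hour,
                              max_orders_per_hour=None):
    """Scale a measured utilization (percent) to a forecast order rate,
    assuming utilization is proportional to the order rate.
    If max_orders_per_hour is given, the forecast is capped at it."""
    if measured_orders_per_hour <= 0:
        raise ValueError("measured order rate must be positive")
    effective = forecast_orders_per_hour
    if max_orders_per_hour is not None:
        effective = min(effective, max_orders_per_hour)
    return measured_util * effective / measured_orders_per_hour


# Measured now: 30 percent CPU busy at 200 orders per hour.
# Forecast for the critical period: 500 orders per hour.
print(estimate_peak_utilization(30.0, 200, 500))  # 75.0

# Staff can process at most 400 orders per hour.
print(estimate_peak_utilization(30.0, 200, 500, max_orders_per_hour=400))  # 60.0
```

An estimate of this kind is only a starting point; confirm it with measurements taken during the critical period as soon as that becomes possible.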

1.2 Strategies and Procedures

This manual describes several strategies and procedures for evaluating performance, evaluating system resources, and diagnosing resource limitations as shown in the following list:

Develop workload strategy (Chapter 1)
- Managing the work load
- Distributing the work load
- Sharing application code
Develop tuning strategy (Chapter 2)
- Automatic Working Set Adjustment (AWSA)
- AUTOGEN
- Active memory management
Perform general system resource evaluation (Chapter 4)
- CPU resource
- Memory resource
- Disk I/O resource
Conduct a preliminary investigation of specific resource limitations (Chapter 5)
- Isolating memory resource limitations
- Isolating disk I/O resource limitations
- Isolating CPU resource limitations
Review techniques for improving system resource responsiveness (Chapter 6)
- Providing equitable sharing of resources
- Reducing resource consumption
- Ensuring load balancing
- Initiating offloading
Apply specific remedy to compensate for resource limitations (Chapter 10)
- Compensating for memory-limited behavior
- Compensating for I/O-limited behavior
- Compensating for CPU-limited behavior

1.3 System Manager's Role

As a system manager, you must be able to do the following:

Assume the responsibility for understanding the system's work load sufficiently to be able to recognize normal and abnormal behavior.
Predict the effects of changes in applications, operations, or usage.
Recognize typical throughput rates.
Evaluate system performance.
Perform tuning as needed.

1.3.1 Prerequisites

Before you adjust any system parameters, you should:

Be familiar with system tools and utilities.
Know your work load.
Develop a strategy for evaluating performance.

1.3.2 System Utilities and Tools

You can observe system operation using the following tools:

Accounting utility (ACCOUNTING)
Audit Analysis utility (ANALYZE/AUDIT)
Authorize utility (AUTHORIZE)
AUTOGEN command procedure
DCL SHOW commands
DECamds (Compaq Availability Manager)
DECevent utility (Alpha only)
Error Log utility (ANALYZE/ERROR_LOG)
Monitor utility (MONITOR)

On Alpha platforms, Compaq recommends using the DECevent utility instead of the Error Log utility, ANALYZE/ERROR_LOG. (You invoke the DECevent utility with the DCL command, DIAGNOSE.) You can use ANALYZE/ERROR_LOG on Alpha systems, but the DECevent utility provides more comprehensive reports.