HP OpenVMS Systems Documentation

Compaq Availability Manager User's Guide

Order Number: AA-RNSJB-TE


June 2002

This guide explains how to use Compaq Availability Manager software to detect and correct system availability problems.

Revision/Update Information: This guide supersedes the Availability Manager User's Guide, Version 1.4 of the printed manual and Version 2.1 of the HTML manual.

Operating System:
    Data Analyzer: Windows 2000 SP 2 or higher; Windows XP; OpenVMS Alpha Version 7.1 or later
    Data Collector: OpenVMS Alpha and VAX Version 6.2 or later

Software Version: Compaq Availability Manager Version 2.2

Compaq Computer Corporation
Houston, Texas


© 2002 Compaq Information Technologies Group, L.P.

Compaq, the Compaq logo, Alpha, OpenVMS, Tru64, VAX, VMS, and the DIGITAL logo are trademarks of Compaq Information Technologies Group, L.P. in the U.S. and/or other countries.

Microsoft, MS-DOS, Visual C++, Windows, and Windows NT are trademarks of Microsoft Corporation in the U.S. and/or other countries.

Intel, Intel Inside, and Pentium are trademarks of Intel Corporation in the U.S. and/or other countries.

Motif, OSF/1, and UNIX are trademarks of The Open Group in the U.S. and/or other countries.

Java and all Java-based marks are trademarks or registered trademarks of Sun Microsystems, Inc., in the U.S. and other countries.

All other product names mentioned herein may be trademarks of their respective companies.

Confidential computer software. Valid license from Compaq required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

Compaq shall not be liable for technical or editorial errors or omissions contained herein. The information in this document is provided "as is" without warranty of any kind and is subject to change without notice. The warranties for Compaq products are set forth in the express limited warranty statements accompanying such products. Nothing herein should be construed as constituting an additional warranty.

ZK6552

The Compaq OpenVMS documentation set is available on CD-ROM.



Preface

Intended Audience

This guide is intended for system managers who install and use Compaq Availability Manager software. It is assumed that the system managers who use this product are familiar with Windows terms and functions.

Note

The term Windows as it is used in this manual refers to either Windows 2000 or Windows XP but not to any other Windows product.

Document Structure

This guide contains the following chapters and appendixes:

  • Chapter 1 provides an overview of Availability Manager software, including security features.
  • Chapter 2 tells how to start the Availability Manager, use the main Application window, select a group of nodes and individual nodes, and use online help.
  • Chapter 3 tells how to select nodes and display node data; it also explains what that data is.
  • Chapter 4 tells how to display OpenVMS Cluster summary and detailed data; it also explains what that data is.
  • Chapter 5 tells how to display and interpret events.
  • Chapter 6 tells how to take a variety of corrective actions, called fixes, to improve system availability.
  • Chapter 7 describes the tasks you can perform to filter, select, and customize the display of data and events.
  • Appendix A contains a table of CPU process states, which are referred to in Section 3.2.2.4 and in Section 3.3.1.
  • Appendix B contains a table of OpenVMS and Windows events that can be displayed in the Events pane discussed in Chapter 5.
  • Appendix C describes the events that can be signaled for each type of OpenVMS data that is collected.

Related Documents

The following manuals provide additional information:

  • OpenVMS System Manager's Manual describes tasks for managing an OpenVMS system. It also describes installing a product with the POLYCENTER Software Installation utility.
  • OpenVMS System Management Utilities Reference Manual describes utilities you can use to manage an OpenVMS system.
  • OpenVMS Programming Concepts Manual explains OpenVMS lock management concepts.

For additional information about Compaq OpenVMS products and services, access the Compaq website at the following location:


http://www.openvms.compaq.com/

Reader's Comments

Compaq welcomes your comments on this manual. Please send comments to either of the following addresses:

Internet: openvmsdoc@compaq.com
Mail:     Compaq Computer Corporation
          OSSG Documentation Group, ZKO3-4/U08
          110 Spit Brook Rd.
          Nashua, NH 03062-2698

How to Order Additional Documentation

Visit the following World Wide Web address for information about how to order additional documentation:


http://www.openvms.compaq.com/

Conventions

The following conventions are used in this guide:

Ctrl/x
    A sequence such as Ctrl/x indicates that you must hold down the key labeled Ctrl while you press another key or a pointing device button.

PF1 x
    A sequence such as PF1 x indicates that you must first press and release the key labeled PF1 and then press and release another key or a pointing device button.

[Return]
    In examples, a key name enclosed in a box indicates that you press a key on the keyboard. (In text, a key name is not enclosed in a box.)
    In the HTML version of this document, this convention appears as brackets, rather than a box.

...
    A horizontal ellipsis in examples indicates one of the following possibilities:
      • Additional optional arguments in a statement have been omitted.
      • The preceding item or items can be repeated one or more times.
      • Additional parameters, values, or other information can be entered.

.
.
.
    A vertical ellipsis indicates the omission of items from a code example or command format; the items are omitted because they are not important to the topic being discussed.

( )
    In command format descriptions, parentheses indicate that you must enclose choices in parentheses if you specify more than one.

[ ]
    In command format descriptions, brackets indicate optional choices. You can choose one or more items or no items. Do not type the brackets on the command line. However, you must include the brackets in the syntax for OpenVMS directory specifications and for a substring specification in an assignment statement.

|
    In command format descriptions, vertical bars separate choices within brackets or braces. Within brackets, the choices are optional; within braces, at least one choice is required. Do not type the vertical bars on the command line.

{ }
    In command format descriptions, braces indicate required choices; you must choose at least one of the items listed. Do not type the braces on the command line.

bold text
    This typeface represents the introduction of a new term. It also represents the name of an argument, an attribute, or a reason.

italic text
    Italic text indicates important information, complete titles of manuals, or variables. Variables include information that varies in system output (Internal error number), in command lines (/PRODUCER=name), and in command parameters in text (where dd represents the predefined code for the device type).

UPPERCASE TEXT
    Uppercase text indicates a command, the name of a routine, the name of a file, or the abbreviation for a system privilege.

Monospace text
    Monospace type indicates code examples and interactive screen displays.
    In the C programming language, monospace type in text identifies the following elements: keywords, the names of independently compiled external functions and files, syntax summaries, and references to variables or identifiers introduced in an example.

-
    A hyphen at the end of a command format description, command line, or code line indicates that the command or statement continues on the following line.

numbers
    All numbers in text are assumed to be decimal unless otherwise noted. Nondecimal radixes (binary, octal, or hexadecimal) are explicitly indicated.


Chapter 1
Overview

This chapter answers the following questions:

  • What is the Availability Manager?
  • How does the Availability Manager work?
  • How does the Availability Manager identify possible performance problems?
  • How does the Availability Manager maintain security?

1.1 What Is the Availability Manager?

The Availability Manager is a system management tool that allows you to monitor, from an OpenVMS or Windows node, one or more OpenVMS nodes on an extended local area network (LAN).

The Availability Manager helps system managers and analysts target a specific node or process for detailed analysis. This tool collects system and process data from multiple OpenVMS nodes simultaneously, analyzes the data, and displays the output using a graphical user interface (GUI).

Features and Benefits

The Availability Manager offers many features that can help system managers improve the availability, accessibility, and performance of OpenVMS nodes and clusters.

Feature
    Description

Immediate notification of problems
    Based on its analysis of data, the Availability Manager notifies you immediately if any node you are monitoring is experiencing a performance problem, especially one that affects the node's accessibility to users. At a glance, you can see whether a problem is a persistent one that warrants further investigation and correction.

Centralized management
    Provides centralized management of remote nodes within an extended local area network (LAN).

Intuitive interface
    Provides an easy-to-learn and easy-to-use graphical user interface (GUI). An earlier version of the tool, DECamds, uses a Motif GUI to display information about OpenVMS nodes. The Availability Manager uses a Java GUI to display information about OpenVMS nodes on an OpenVMS or a Windows node.

Correction capability
    Allows real-time intervention, including adjustment of node and process parameters, even when remote nodes are hung.

Uses its own protocol
    An important advantage of the Availability Manager is that it uses its own network protocol. Unlike most performance monitors, the Availability Manager does not rely on TCP/IP or any other standard protocol. Therefore, even if a standard protocol is unavailable, the Availability Manager can continue to operate.

Customization
    Using a wide range of customization options, you can customize the Availability Manager to meet the requirements of your particular site. For example, you can change the severity levels of the events that are displayed and escalate their importance.

Scalability
    Makes it easier to monitor multiple OpenVMS nodes.

Figure 1-1 is an example of the initial Application window of the Availability Manager.

Figure 1-1 Application Window


The Application window is divided into the following sections:

  • In the upper left section of the window is a list of user-defined groups of nodes. You can click either the name of a group or the icon in front of it to select a group.
  • In the upper right section is a list of the nodes in the group you selected. Double-click a node name or the icon in front of it to display more detailed data for that node. You can also double-click data items in each row to display more detailed data about a specific item.
  • In the lower section events are posted, alerting you to possible problems on your system.

1.2 How Does the Availability Manager Work?

The Availability Manager uses two types of nodes to monitor systems:

  • One or more OpenVMS Data Collector nodes, which contain the software that collects data.
  • An OpenVMS or a Windows Data Analyzer node, which contains the software that analyzes the collected data.

The Data Analyzer and Data Collector nodes communicate over an extended LAN using an IEEE 802.3 Extended Packet format protocol. Once a connection is established, the Data Analyzer instructs the Data Collector to gather specific system and process data.

Although you can run the Data Analyzer as a member of a monitored cluster, it is typically run on a system that is not a member of a monitored cluster. In this way, the Data Analyzer will not hang if the cluster hangs.

Only one Data Analyzer at a time should be running on each node; however, more than one can be running in the LAN at any given time.

Figure 1-2 shows a possible configuration of Data Analyzer and Data Collector nodes.

Figure 1-2 Availability Manager Node Configuration


In Figure 1-2, the Data Analyzer can monitor nodes A, B, and C across the network. The password on node D does not match the password of the Data Analyzer; therefore, the Data Analyzer cannot monitor node D.

For information about password security, see Section 1.4.
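
The access rule illustrated in Figure 1-2 can be pictured as a simple filter. The following is only an illustrative sketch: the node names A through D come from the figure, but the function name, the dictionary layout, the sample passwords, and the plain string comparison are all assumptions made for the example and do not describe how the product actually checks passwords.

```python
def monitorable_nodes(analyzer_password, collector_passwords):
    """Return the Data Collector nodes whose password matches the Data Analyzer's.

    Hypothetical helper: a plain equality test stands in for whatever
    check the real software performs.
    """
    return sorted(
        node
        for node, password in collector_passwords.items()
        if password == analyzer_password
    )


# Made-up passwords; only node D differs from the Data Analyzer's password.
collectors = {"A": "SECRET01", "B": "SECRET01", "C": "SECRET01", "D": "OTHER999"}
print(monitorable_nodes("SECRET01", collectors))  # prints ['A', 'B', 'C']
```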

Requesting and Receiving Information

After installing the Availability Manager software, you can begin to request information from one or more Data Collector nodes.

Requesting and receiving information requires the Availability Manager to perform a number of steps, which are shown in Figure 1-3 and explained after the figure.

Figure 1-3 Requesting and Receiving Information


The following steps correspond to the numbers in Figure 1-3.

  1. The GUI communicates users' requests for data to the driver on the Data Analyzer node.
  2. The Data Analyzer driver sends users' requests across the network to a driver on a Data Collector node.
  3. The Data Collector driver transmits the requested information over the network to the driver on the Data Analyzer node.
  4. The Data Analyzer driver passes the requested information to the GUI, which displays the data.

In step 4, the Availability Manager also checks the data for any events that should be posted. The following section explains in more detail how data analysis and event detection work.
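
The four steps above can be sketched as a round trip between three components. This is a minimal sketch for illustration, not the product's code: the class names, method names, node name, and sample values are hypothetical, and ordinary method calls stand in for the network transfer between the two drivers.

```python
class DataCollectorDriver:
    """Stands in for the driver on a Data Collector node (hypothetical class)."""

    def __init__(self, node_name, data):
        self.node_name = node_name
        self._data = data  # made-up sample data, keyed by data type

    def handle_request(self, data_type):
        # Step 3: transmit the requested information back to the Data Analyzer driver.
        return {
            "node": self.node_name,
            "type": data_type,
            "values": self._data.get(data_type, []),
        }


class DataAnalyzerDriver:
    """Stands in for the driver on the Data Analyzer node (hypothetical class)."""

    def __init__(self, collectors):
        self._collectors = {c.node_name: c for c in collectors}

    def request(self, node_name, data_type):
        # Step 2: send the user's request to the driver on the Data Collector node.
        reply = self._collectors[node_name].handle_request(data_type)
        # Step 4 (first half): pass the requested information back to the GUI.
        return reply


def gui_request(driver, node_name, data_type):
    # Step 1: the GUI communicates the user's request to the Data Analyzer driver.
    reply = driver.request(node_name, data_type)
    # Step 4 (second half): the GUI displays the data (here, as a formatted string).
    return f"{reply['node']} {reply['type']}: {reply['values']}"


driver = DataAnalyzerDriver([DataCollectorDriver("NODEA", {"CPU": [12, 47]})])
print(gui_request(driver, "NODEA", "CPU"))  # prints NODEA CPU: [12, 47]
```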

1.3 How Does the Availability Manager Identify Performance Problems?

When the Availability Manager detects problems on your system, it uses a combination of methods to bring these problems to the attention of the system manager. If no data display is open for the affected node, so that its data is being collected only in the background, the Availability Manager shortens the data collection interval (see Table 1-1) so that the data can be analyzed more closely. Performance events are also posted in the Event pane, which is in the lower portion of the Application window (Figure 1-1).

The following topics are related to detecting problems and posting events:

  • Collecting and analyzing data
  • Posting events

1.3.1 Collecting and Analyzing Data

This section explains how the Availability Manager collects and analyzes data. It also defines terms related to data collection and analysis.

1.3.1.1 Types of Data Collection

You can use the Availability Manager to collect data either as a background activity or as a foreground activity.

  • Background data collection
    When you enable background collection of a specific type of data on a specific node, the Availability Manager collects that data whether or not any windows are currently displaying data for that node.
    To enable background data collection, select the check box for a specific type of data on the Data Collection Customization page (Figure 1-4). Note that if the Customize window applies to all OpenVMS nodes, the data collection properties that you set are for all nodes. If the window applies to a specific node, the properties you set apply only to that node.
    Chapter 7 contains instructions for customizing data collection properties.

    Figure 1-4 Data Collection Customization Page



  • Foreground data collection
    Foreground data collection occurs automatically when you open any data page for a specific node. To open a node data page, double-click a node name in the Node pane of the Application window (Figure 1-1). The Node Summary page is the first page displayed (by default); Figure 1-5 is an example. At the top of the page are tabs that you can select to display other data pages for that node.

    Figure 1-5 Sample Node Summary Page



    Foreground data collection for all data types begins automatically when any node data page is displayed. Data collection ends when all node data pages have been closed.
    Chapter 3 contains instructions for selecting nodes and displaying node data.

1.3.1.2 Events and Data Collection

An event is a problem or potential problem associated with resource availability. Users can customize criteria for events. Events are associated with types of data collected. For example, collection of CPU data is associated with the PRCCUR, PRCMWT, and PRCPWT events. (Appendix B describes events, and Appendix C describes the events that each type of data can signal.)

When the GUI requests one type of data from the Data Collector (for example, CPU data for all the processes on the system), a snapshot is taken of that type of data. This snapshot is considered one data collection.
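
The relationship between data types, events, and collections can be sketched as follows. The CPU entry uses the event names given above; everything else (the function names, the dictionary layout, and the sample process names) is hypothetical and chosen only for illustration.

```python
# Events that each type of collected data can signal. Only the CPU entry
# comes from the text; Appendix C lists the associations for other data types.
EVENTS_BY_DATA_TYPE = {
    "CPU": ("PRCCUR", "PRCMWT", "PRCPWT"),
}


def candidate_events(data_type):
    """Return the events associated with a data type (empty if none are listed here)."""
    return EVENTS_BY_DATA_TYPE.get(data_type, ())


def collect(data_type, read_snapshot):
    """Perform one data collection: a single snapshot of one type of data."""
    return {
        "type": data_type,
        "snapshot": read_snapshot(data_type),
        "candidate_events": candidate_events(data_type),
    }


# Made-up snapshot source standing in for the Data Collector.
collection = collect("CPU", lambda data_type: ["PROCESS_1", "PROCESS_2"])
print(collection["candidate_events"])  # prints ('PRCCUR', 'PRCMWT', 'PRCPWT')
```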

1.3.1.3 Data Collection Intervals

Data collection intervals, which are displayed on the Data Collection Customization page (Figure 1-4), specify the frequency of data collection. Table 1-1 describes these intervals.

Table 1-1 Data Collection Intervals

Interval (in seconds): NoEvent
Type of data collection: Background
Description: How often data is collected if no events have been posted for that type of data.
    The Availability Manager starts background data collection at the NoEvent interval (for example, every 75 seconds). If no events have been posted for that type of data, the Availability Manager starts a new collection cycle every 75 seconds.

Interval (in seconds): Event
Type of data collection: Background
Description: How often data is collected if any events have been posted for that type of data.
    The Availability Manager continues background data collection at the Event interval until all events for that type of data have been removed from the Event pane. Data collection then resumes at the NoEvent interval.

Interval (in seconds): Display
Type of data collection: Foreground
Description: How often data is collected when the page for a specific node is open.
    The Availability Manager starts foreground data collection at the Display interval and continues this rate of collection until the display is closed. Data collection then resumes as a background activity.
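
The way the three intervals interact can be summarized in a few lines of logic. This is a sketch under stated assumptions, not the product's implementation: the function name and arguments are hypothetical, the 75-second NoEvent value is the example used above, and the Event and Display values are made up for illustration.

```python
def collection_interval(display_open, events_posted, intervals):
    """Pick the interval (in seconds) for one type of data on one node.

    display_open  -- True while any data page for the node is open
    events_posted -- True while events for this data type remain in the Event pane
    intervals     -- mapping with "NoEvent", "Event", and "Display" entries
    """
    if display_open:
        # Foreground collection runs at the Display interval until the display is closed.
        return intervals["Display"]
    if events_posted:
        # Background collection runs at the Event interval until the events are removed.
        return intervals["Event"]
    # Background collection with no posted events runs at the NoEvent interval.
    return intervals["NoEvent"]


# 75 is the example from the text; 15 and 5 are made-up values.
INTERVALS = {"NoEvent": 75, "Event": 15, "Display": 5}

print(collection_interval(False, False, INTERVALS))  # prints 75
print(collection_interval(False, True, INTERVALS))   # prints 15
print(collection_interval(True, True, INTERVALS))    # prints 5
```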

