Intelligent Peripheral Fault Manager For Digital UNIX
Installation and User's Guide

Order Number: AA-QN0FA-TE

March 1996

This guide provides installation information and operator instructions for the Intelligent Peripheral Fault Manager software. Use this guide in conjunction with the AlphaServer Intelligent Peripheral Platform Owner's Guide, the AlphaServer Intelligent Peripheral Platform System Manager's Guide, and Digital UNIX developer's documentation.

Revision/Update Information: This is a new manual.
Operating System: Digital UNIX, Version 3.2C
Software Version: IP Fault Manager for Digital UNIX, Version 1.0

Digital Equipment Corporation
Maynard, Massachusetts

________________________________________________________________

March 1996

Digital Equipment Corporation makes no representations that the use of its products in the manner described in this publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.

Possession, use, or copying of the software described in this publication is authorized only pursuant to a valid written license from Digital or an authorized sublicensor.

No responsibility is assumed for the use or reliability of software on equipment that is not supplied by Digital Equipment Corporation or its affiliated companies.

Restricted Rights: Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in subparagraph (c) (1) (ii) of DFARS 252.227-7013, or in FAR 52.227-19, or in FAR 52.227-14 Alt. III, as applicable.

© Digital Equipment Corporation 1996. All Rights Reserved.
The following are trademarks of Digital Equipment Corporation: AlphaServer, AlphaStation, DEC, Digital, Digital UNIX, DECbridge, DEChub, DECpacketprobe, DECsafe, DECserver, HUBwatch, LAT, LA75 Plus Companion, POLYCENTER, PROBEwatch, RX, RZ, StorageWorks, ThinWire, VAX DOCUMENT, and the DIGITAL logo.

Dialogic is a registered trademark of Dialogic Corporation.
Novell is a registered trademark of Novell, Inc.
Windows NT is a trademark of Microsoft Corporation.
Xerox is a registered trademark of Xerox Corporation.

All other trademarks and registered trademarks are the property of their respective owners.

S3081

This document was prepared using VAX DOCUMENT Version 2.1.

_________________________________________________________________
Contents

Preface

1  Preparing for Software Installation
   1.1  IP Fault Manager for Digital UNIX Overview
      1.1.1  Operating Description
      1.1.2  Application Components
   1.2  AlphaServer Intelligent Peripheral Platform
      1.2.1  Product Overview
      1.2.2  IP Fault Manager on AlphaServer Intelligent Peripheral Platform
   1.3  POLYCENTER System Watchdog
      1.3.1  Product Overview
      1.3.2  External Event File
      1.3.3  Simplex/Duplex Consolidator's Configuration Files
      1.3.4  POLYCENTER System Watchdog Reporting Screen
      1.3.5  Error Events Reported by POLYCENTER System Watchdog

2  Installing the IP Fault Manager
   2.1  Software Installation Guidelines
   2.2  Reading Release Notes
   2.3  Installation Requirements
      2.3.1  Software and Firmware Prerequisites
      2.3.2  Checking the Software Distribution Kit
   2.4  Pre-Installation Information
      2.4.1  Installation Time
      2.4.2  Privileges Required for Installation
      2.4.3  Disk Space Required
      2.4.4  Backing Up Your System Disk
   2.5  Installing the Product Authorization Keys (PAK)
   2.6  Installing the IP Fault Manager Software
      2.6.1  Starting the Installation
      2.6.2  Did You Receive Mount Messages?
      2.6.3  Ending the Installation
      2.6.4  POLYCENTER System Watchdog Setup
      2.6.5  POLYCENTER System Watchdog Configuration
      2.6.6  Running the Installation Verification Procedure (IVP)
      2.6.7  Installation Hints
      2.6.8  Deinstalling Savesets

3  Understanding IP Fault Manager
   3.1  IP Fault Manager Alarm Event
   3.2  IP Fault Manager Programming Interface
   3.3  IP Fault Manager Alarm Panel
      3.3.1  Alarm Panel Description
      3.3.2  Alarm Notification
      3.3.3  Activating Alarm Cut-Off
      3.3.4  Activating Alarm Reset
      3.3.5  Alarm Board Acknowledgments
   3.4  IP Fault Manager Maintenance Center Monitor
   3.5  IP Fault Manager Timer Reset Process
   3.6  AlphaServer 1000 Server Management Registers
   3.7  IP Fault Manager Event Log
   3.8  PSW External Event Log
   3.9  IP Fault Manager Disk Monitoring

4  IP Fault Manager Alarm Utility Operator Interface
   4.1  Overview
      4.1.1  Accessing the Menu Interface
      4.1.2  Using the Main Menu
   4.2  Set or Clear Alarm
   4.3  Set or Clear System Status
   4.4  Remote Access to Real Time Information
   4.5  Clear a System Watchdog Event
   4.6  Add or Remove Disk Monitoring
   4.7  Perform DTP Cut-Off
   4.8  Exit Alarm Manager
   4.9  Dialogic Telco Platform Troubleshooting

5  IP Fault Manager Programming Interface
   5.1  Alarm Event
   5.2  User API
      5.2.1  Request Commands
      5.2.2  User Input Commands
         5.2.2.1  SET_CRITICAL_ALARM Command Example
      5.2.3  Command Output

A  IP Fault Manager Sample Installation Script

B  IP Fault Manager Files Installed on Your System

Glossary

Index

Tables
   1-1  Reported Error Events
   3-1  Alarm Board Acknowledgments
   3-2  Alarm Board Detected Alarm Events
   5-1  User API Commands
   5-2  Messages Returned by the ipfm_alarm.h Function
   B-1  List of Files After Installation

_________________________________________________________________
Preface

About This Guide

This guide provides installation information and operator instructions for the Intelligent Peripheral Fault Manager software.

________________________ Note ________________________

Your AlphaServer Intelligent Peripheral Platform system is shipped to your site with the operating system and IP Fault Manager software installed. Keep this guide with your distribution kit. You will need it to install maintenance updates or to reinstall the product for any other reason.
______________________________________________________

About The Product

The AlphaServer Intelligent Peripheral Platform is a collection of hardware and software products, including the Intelligent Peripheral Fault Manager for Digital UNIX software:

o AlphaServer 1000 system
o AlphaStation system for use as the IP console workstation
o IP Platform ISA bus expansion chassis with IP alarm panel (also known as the Dialogic Telco Platform (DTP))
o DEChub 90 communication subsystem
o StorageWorks BA35x-Sx mass storage disk drive subsystem
o Digital UNIX operating system
o HUBwatch for Digital UNIX software
o POLYCENTER System Watchdog Consolidator/Agent products
o POLYCENTER Advanced File System Utilities software
o Dialogic Drivers for Digital UNIX software
o Digital UNIX Logical Storage Manager software
o DECsafe Available Server for Digital UNIX software (duplex systems only)
o Intelligent Peripheral Fault Manager for Digital UNIX software
o DIGI Acceleport UNIX Driver

Intended Audience

This document is intended for system managers and programmers trained in software installation, system management and programming who will be using the IP Fault Manager software.

Structure of This Document

This guide is organized in the following manner:

o Chapter 1 - Prepares you for software installation.
o Chapter 2 - Describes the IP Fault Manager software installation.
o Chapter 3 - Describes the IP Fault Manager software components.
o Chapter 4 - Describes the IP Fault Manager operator interface.
o Chapter 5 - Describes the IP Fault Manager programming interface.
o Appendix A - Provides a sample installation script.
o Appendix B - Lists IP Fault Manager files.

Reader's Comments

Digital welcomes your comments on this or any other manual.
You can send your comments to Digital in the following ways:

o Internet electronic mail: readers_comment@zk3.dec.com
o Mail:
  Digital Equipment Corporation
  Shared Engineering Services
  PKO3-2/A9
  129 Parker Street
  Maynard, MA 01754-2199

For additional information, call 1-800-DIGITAL.

Related Documentation

For additional information on the AlphaServer IP Platform subsystem components and related software, refer to the documentation in the following tables. Order numbers may change as documents are revised or updated. Check with your Digital sales representative for additional information.

___________________________________________________________
AlphaServer IP Peripheral Platform              Order Number
___________________________________________________________
AlphaServer Intelligent Peripheral System       AA-QU0JA-TE
  Manager's Guide
AlphaServer Intelligent Peripheral Platform     EK-ASIPP-OG
  Owner's Guide
___________________________________________________________
AlphaServer 1000 Processor                      Order Number
___________________________________________________________
AlphaServer 1000 Rackmount Owner's Guide        EK-RMALP-OG
Digital UNIX Installation Guide                 AA-PS2DE-TE
StorageWorks KZPSA PCI-to-SCSI Storage          EK-KZPSA-UG
  Adapter User's Guide
___________________________________________________________
BA35x-Sx Modular Storage Shelf                  Order Number
___________________________________________________________
StorageWorks Family User's Guide                EK-BA350-UG
BA350 Modular Storage Shelf Subsystem           EK-BA350-CG
  Configuration Guide
SCSI Signal Converter Service Manual/DWZZA-VA   EK-DWZAA-SV
RZ Series Disk Drive Installation Guide -       EK-DRZ01-IG
  Models RZ35, RZ26, RZ27, RZ28
___________________________________________________________
DEChub 90 Communication Subsystem               Order Number
___________________________________________________________
DEChub 90 Ethernet Backplane Owner's Manual     EK-DEHUB-OM
DECbridge 90 Owner's Manual                     EK-DEWGB-OM
DECrepeater 90C Owner's Manual                  EK-DECMR-OM
DECagent 90 User's Information                  EK-DENMA-UI
DECserver 90M Owner's Manual                    EK-DSRVH-OM
DECbrouter 90 Installation and Operating        EK-DECBR-OM
  Information
___________________________________________________________
IP ISA Bus Expansion Chassis                    Order Number
___________________________________________________________
AlphaServer Voice Platform Hardware             EK-VOICE-IN
  Installation Guide
Dialogic Voice Hardware Reference               BX-QLTUA-TE
Dialogic Network Hardware Reference             BX-QLTTA-TE
Dialogic VR/160 Hardware Reference              BX-QLTVA-TE
Dialogic FAX/120 Hardware Reference             BX-QLTSA-TE
___________________________________________________________
IP Platform Console Workstation                 Order Number
___________________________________________________________
Digital AlphaStation 200 Series User            EK-PCDTA-UI
  Information
HUBwatch Use                                    AA-PW4BE-TE
HUBwatch Installation and Configuration         AA-Q358C-TE
  Manual
POLYCENTER System Watchdog Agent and            QA-3A5AA-GZ
  Consolidator For Digital UNIX
___________________________________________________________

Conventions

The following conventions are used in this guide:

___________________________________________________________
Conventions   Description
-----------------------------------------------------------
IP            Describes an industry-standard acronym for
              Intelligent Peripheral.
#             A pound sign (#) is the default superuser
              prompt.
%             A percent sign (%) is the default user
              prompt.
              A boxed symbol indicates that you must press
              the named key on the keyboard.
Ctrl/C        This symbol indicates that you must press the
              Ctrl key while you simultaneously press
              another key (in this case, C).
% cat         In interactive examples, typed user input
              appears in a bold typeface.
monospaced    In text, this typeface indicates the exact
              name of a command, routine, partition,
              pathname, directory, or file. This typeface
              is also used in interactive examples and
              other screen displays.
UPPERCASE     The Digital UNIX operating system
lowercase     differentiates between lowercase and
              uppercase characters.
              Literal strings that appear in text, examples,
              syntax descriptions, and function definitions
              must be typed exactly as shown.
setld(8)      Cross-references to online reference pages
              include the appropriate section number in
              parentheses. For example, setld(8) indicates
              that you can find the material on the setld
              command in Section 8 of the reference pages.
italic text   Italic text emphasizes important information
              and indicates complete titles of manuals.
[y]           In a prompt, square brackets indicate that
              the enclosed item is the default response.
              For example, [y] means the default response
              is Yes.
___________________________________________________________

Unless otherwise noted, press the Return key after entering commands or responses to command prompts.

1
_________________________________________________________________
Preparing for Software Installation

This chapter discusses the following topics:

o IP Fault Manager for Digital UNIX
o AlphaServer Intelligent Peripheral Platform
o POLYCENTER System Watchdog Configuration File

1.1 IP Fault Manager for Digital UNIX Overview

The IP Fault Manager for Digital UNIX is a layered software product that provides fault management services to AlphaServer Intelligent Peripheral Platform systems.

1.1.1 Operating Description

The IP Fault Manager software runs on an AlphaServer 1000 system running the Digital UNIX operating system, and performs the following functions:

o Monitors the AlphaServer 1000 system temperature by means of the AlphaServer 1000 system management registers (does not cause an alarm event at this time).
o Monitors the IP Fault Manager alarm panel (DTP chassis) power supply input and output voltages, battery charge, and the alarm board circuitry.

________________________ Note ________________________

The Dialogic Telco Platform (DTP) alarm board is referred to as the IP Fault Manager alarm panel throughout this document.
______________________________________________________

o Monitors the SCSI disks in the AlphaServer 1000 system and in the StorageWorks storage shelf.
o Sends messages to the POLYCENTER System Watchdog to:
  - Indicate what alarms should be set or cleared.
  - Display events monitored by POLYCENTER System Watchdog on the IP Fault Manager alarm panel.
o Indicates the name of the events that trigger the alarm action.
o Monitors the fault status of the ISA bus expansion chassis.

1.1.2 Application Components

The following components work together to provide fault management for the AlphaServer Intelligent Peripheral Platform:

o IP Fault Manager alarm panel (Dialogic Telco Platform (DTP) alarm board)
o IP Fault Manager maintenance center monitor
o IP Fault Manager timer reset process
o IP Fault Manager alarm event table lookup
o IP Fault Manager Set/Clear/Request alarm information utility
o Digital UNIX driver to read the server management register on the AlphaServer 1000 system CPU
o Digital UNIX driver to read the PCI register on the AlphaServer 1000 system CPU
o POLYCENTER System Watchdog for Digital UNIX operating system
o POLYCENTER System Watchdog Consolidator configuration files
o POLYCENTER System Watchdog Actor Manager configuration files
o POLYCENTER System Watchdog Action routine
o IP alarm event programming interface
o Disk monitor for monitoring disk events

1.2 AlphaServer Intelligent Peripheral Platform

The AlphaServer Intelligent Peripheral Platform is a rackmount hardware product that provides voice processing, FAX, voice recognition and voice messaging services for telecommunications service providers.

1.2.1 Product Overview

The AlphaServer Intelligent Peripheral Platform is available in two basic variations, a simplex system and a duplex system.
The simplex system provides one of each of the following components; the duplex system provides two of each:

o AlphaServer 1000 system with EISA/ISA bus
o ISA bus expansion chassis (Dialogic Telco Platform with alarm panel)
o StorageWorks storage shelf with SCSI disks
o DEChub 90 communications hub

1.2.2 IP Fault Manager on AlphaServer Intelligent Peripheral Platform

Each AlphaServer Intelligent Peripheral Platform simplex and duplex configuration requires a console workstation running the Digital UNIX operating system, the POLYCENTER System Watchdog and the IP Fault Manager software. The console is the management operations center for the AlphaServer systems as well as the AlphaServer Intelligent Peripheral Platform. Each AlphaServer Intelligent Peripheral Platform configuration can be managed locally or remotely through the console workstation. The console displays (in a terminal window) the IP Fault Manager-detected fault and system status of the AlphaServer Intelligent Peripheral Platform.

For additional information about the AlphaServer Intelligent Peripheral Platform, refer to the AlphaServer Intelligent Peripheral Platform Owner's Guide.

1.3 POLYCENTER System Watchdog

1.3.1 Product Overview

POLYCENTER System Watchdog is a software application that enables the automatic monitoring of specified problems on selected nodes from a single terminal. When POLYCENTER System Watchdog detects a problem, either on a monitored node or with the network, it deals with the problem automatically by carrying out predefined actions before the problem becomes apparent to other users on the network.

POLYCENTER System Watchdog software focuses on problem and fault management; it recognizes a problem or fault, analyzes the cause of the problem, determines an appropriate response, implements that response, and tracks the problem or fault to its resolution.

POLYCENTER System Watchdog consists of multiple software modules.
Each module has an interface to receive and send data; modules normally run only when they receive data. Control modules that provide different interfaces to manage POLYCENTER System Watchdog can be inserted into the network by users. These control modules allow the use of a preferred style of interface to manage POLYCENTER System Watchdog.

POLYCENTER System Watchdog consists of two functional parts:

o POLYCENTER System Watchdog Agents to detect and report events as event messages.
o POLYCENTER System Watchdog Consolidators to request and handle event messages.

1.3.2 External Event File

Each problem detected by POLYCENTER System Watchdog is called an event. Each event has an associated event code. When POLYCENTER System Watchdog detects an event at a node, it reports it as an event message, using the event code to identify the problem. Event messages are also marked as either NEW, UPD (updated), or REM (removed), according to whether the event has just been discovered, continues to be found, or has disappeared.

When POLYCENTER System Watchdog receives event messages that notify it of events occurring on the monitored systems, it responds automatically according to what it has been programmed to do. It may notify the appropriate personnel, or start more action routines, or it may do both.

1.3.3 Simplex/Duplex Consolidator's Configuration Files

Software shell scripts set up the POLYCENTER System Watchdog Consolidator configuration files on the IP Platform (for both the simplex and duplex system). The only difference between a simplex and duplex configuration file is that the duplex system user is prompted to enter the name of the other node, and the simplex system user is not prompted for the information.

The Consolidator configuration file is read by the Consolidator software each time the Consolidator is either started or reconfigured.
The Consolidator configuration file specifies the following:

o Polling interval (the amount of time between each poll of the Agent by the Consolidator)
o The events to be checked at specific nodes
o The frequency of event checking
o Data for any data-specific events

Once the Consolidator knows what is needed, it notifies the Agents, and then requests checks of the node for events and the delivery of their event messages.

1.3.4 POLYCENTER System Watchdog Reporting Screen

POLYCENTER System Watchdog (PSW) is the definitive indicator of an outstanding alarm. Visual and audible alarms can be cleared from the alarm panel using the CUT-OFF function. However, events are cleared from the POLYCENTER System Watchdog screen only by a specific management command or, if the PSW Consolidator originally detected the event, when the Consolidator determines that the condition no longer exists.

Figure 1-2 is an example of the PSW reporting screen.

1.3.5 Error Events Reported by POLYCENTER System Watchdog

POLYCENTER System Watchdog contains an alarm event table lookup process that searches for a match of an error code detected by one of the POLYCENTER System Watchdog agents. Table 1-1 lists error events that can be reported by POLYCENTER System Watchdog.

Table 1-1 Reported Error Events
_________________________________________________________________________
Alarm Event         Code  Status    Description
_________________________________________________________________________
Internal error      PSW   Minor     Error originated from shell script
                                    psw_sensor_fly_custom.
Data validation     VAL   Minor     Consolidator configuration file data
error                               is validated on Agent side (whenever
                                    applicable) before being used.
Process             PRO   Minor     IP Fault Manager process is missing.
AlphaServer 1000    EXT   Major     Disk monitoring event, and power
system                              supply failure (duplex systems only).
CPU error           CPU   Major     Event message shows the number of
                                    existing CPU errors on the node.
Disk error          DSK   Major     SCSI disk errors.
Disk nearly full    DNF   Major     The amount of free space on a disk
                                    is near the specified limit.
Ethernet error      ETH   Major     An error is detected on an Ethernet
                                    interface device.
Memory error        MEM   Major     Event message shows the number of
                                    existing memory errors on the node.
Other network       OTH   Major     This reports all transport errors.
errors
User defined error  EXT   Minor,    User defined error events defined
                          Major,    through ipfm_user_api for IP Fault
                          Critical  Manager menu.
_________________________________________________________________________

2
_________________________________________________________________
Installing the IP Fault Manager

This chapter describes the following IP Fault Manager installation topics:

o Software installation guidelines
o Release notes
o Installation prerequisites and content of distribution kit
o Pre-installation information
o Product Authorization Keys (PAK)
o Software installation of IP Fault Manager and POLYCENTER System Watchdog setup and configuration
o Installation hints
o De-installing savesets

2.1 Software Installation Guidelines

The following installation guidelines may be helpful when installing the IP Fault Manager software. Additionally, an example post-installation PSW configuration file is listed in Section 2.6.5.
o Reading release notes
o Installation requirements
  - Software prerequisites
  - Checking the software distribution kit
o Pre-installation information
  - Installation time
  - Privileges required for installation
  - Disk space required
  - Backing up your system disk
o Installing the IP Fault Manager software
o Running the Installation Verification Procedure (IVP)
o Installation hints
o Post-installation configuration file

________________________ Note ________________________

Depending on the level of Factory Installation Services (FIS) you ordered with your system, some or all of the following installation procedures may already have been performed.

______________________________________________________

Use this document as well as the following documentation:

o AlphaServer Intelligent Peripheral Platform Owner's Guide
o AlphaServer Intelligent Peripheral Platform System Manager's Guide
o Digital UNIX Installation Guide

2.2 Reading Release Notes

The IP Fault Manager software provides online release notes. Digital Equipment Corporation strongly recommends that you read the release notes, because they may contain information about changes to the installation and the use of the product. The release notes are in the following file:

    /usr/opt/IPFM/release.notes

2.3 Installation Requirements

The Digital UNIX setld utility is used to install the IP Fault Manager software. The Digital UNIX operating system, Dialogic Drivers for Digital UNIX software, and other required layered software products are installed at the factory.

The IP Fault Manager is preconfigured to work with the POLYCENTER System Watchdog software. Additionally, the Dialogic software is preconfigured to work with the telephony and voice hardware options selected by the customer.

After installation, set up the IP Fault Manager software configuration file with the correct system type for your location (simplex, duplex, or workstation), type of power supply, and node names and addresses.
2.3.1 Software and Firmware Prerequisites

The following software and firmware must be installed before you install IP Fault Manager:

o Digital UNIX Version 3.2C
o AlphaServer 1000 Firmware Revision 3.10
o POLYCENTER System Watchdog Version 2.2
o Dialogic DTP Firmware Version 1.24

See the AlphaServer Intelligent Peripheral Platform System Manager's Guide for additional information.

2.3.2 Checking the Software Distribution Kit

Use the bill of materials (BOM) to check the contents of your IP Fault Manager software distribution kit. The software distribution kit includes the following items:

o A CD-ROM optical disk
o The AlphaServer Intelligent Peripheral Platform System Manager's Guide
o This guide

If your software distribution kit is damaged or incomplete, contact your Digital representative.

2.4 Pre-Installation Information

The following sections describe the information and computing environment that you need before installing the IP Fault Manager software.

2.4.1 Installation Time

Installation of IP Fault Manager software should take approximately five (5) minutes. The installation time for the operating system and the required layered products depends on each product.

2.4.2 Privileges Required for Installation

You must have superuser privileges to install the IP Fault Manager software.

2.4.3 Disk Space Required

You need 1000 kilobytes of available disk space to install the IP Fault Manager software. To check total space and free space for the directories where the IP Fault Manager software files will reside, enter the df command. A display similar to the following example appears on your screen, showing available free space. This free space must accommodate the subset requirements of the product.
    Filesystem   512-blocks     Used    Avail  Capacity  Mounted on
    /dev/rz0a        126334    99098    14602       87%  /
    /dev/rz0g       1732204   671832   887150       43%  /usr

2.4.4 Backing Up Your System Disk

Digital recommends that you back up the system disk before you install any software. Use the backup procedures established at your site. For details on performing a system disk backup, see your Digital UNIX system management documentation.

The following steps are required to install the IP Fault Manager software on your workstation:

1. Boot and configure the operating system
2. Install the layered product Product Authorization Keys
3. Install the layered product software

2.5 Installing the Product Authorization Keys (PAK)

Log on to the system using the root account. Using the lmfsetup Digital UNIX script, enter the PAK number (found on your license) for this product. See your Digital UNIX system administration documentation for information on using the lmfsetup script.

2.6 Installing the IP Fault Manager Software

See the AlphaServer Intelligent Peripheral Platform System Manager's Guide for instructions on configuring and booting the AlphaServer Intelligent Peripheral Platform.

2.6.1 Starting the Installation

To install the IP Fault Manager layered product software, perform the following steps:

1. You must start in the root directory.

       # cd /

2. If /mnt is to be the destination directory, check to see that no files are present before you load the IP Fault Manager software.

       # ls /mnt

   If the directory is empty, the system displays ./ ../. Alternatively, you can create a new directory as follows:

       # mkdir /nnn

   Where nnn is the name of the new directory.

3. Load the IP Fault Manager software CD into the CD-ROM device.

4. Mount the CD-ROM device using /mnt or the new directory name /nnn as follows:

       # mount -r /dev/rz5c /mnt

   Where /dev/rz5c is an example name for a CD-ROM device. The light on the CD-ROM player flashes. The cursor returns to the screen.

5. To list the files and directories on the CD, enter the following command:

       # ls -alF /mnt

6. To install the IP Fault Manager software, enter the following command:

       # setld -l /mnt/IPFM100/output

   Where the 100 in IPFM100 indicates version V1.0.

________________________ Note ________________________

Always use the 3 digits indicating the current major and minor version of the software kit being installed. Appendix A contains an example installation.

______________________________________________________

To complete the installation, follow the screen instructions:

    (C) Copyright Digital Equipment Corporation 1996. All Rights Reserved.
    Depending on system activity and your selection this installation
    will take approximately 5 minutes.
    Checking for IP-FAULT-MANAGER License
    If you are not satisfied with the backup of your system
    please select option 6 to exit this installation.

2.6.2 Did You Receive Mount Messages?

Message...                 Comment
___________________________________________________________
Device busy...             Mount the device to another
                           directory.
Could not find /dev/rz5c   Select an alternate device such
                           as /dev/rz5c. Type ls /dev/rz*
                           to list the available devices.
__________________________________________________________

2.6.3 Ending the Installation

At the conclusion of the IP Fault Manager software installation, the following information is displayed on your screen:

    Creating softlinks...
    The Alpha IP IPFMAlarm Software has been successfully installed.

1. Return to the root directory, and enter the following commands:

       # cd /
       # umount /mnt

2. When the cursor returns to the screen, remove the CD from the CD-ROM device.

2.6.4 POLYCENTER System Watchdog Setup

To set up POLYCENTER System Watchdog, enter the following information at the console prompt:

    # psw_config_editor -n
    # psw$edit> exit
    # cd /usr/opt/IPFM
    # ./ipfmalarm_watchdog_setup

The system displays the following lines (a simplex system is used in the example):

    # IPFMalarm Duplex System?
Please enter y/(n): n If you are configuring a duplex system, two additional lines are displayed: # Host 1 of duplex system is {hostname}: # Enter Host 2: To complete the kernel build, enter the following at the console prompt: # doconfig -c ASIPP2 # cp /sys/ASIPP2/vmunix /vmunix # shutdown -r now Where ASIPP2 is replaced with your system node name. 2.6.5 POLYCENTER System Watchdog Configuration The POLYCENTER System Watchdog (PSW) software must be configured to display alarm messages from the IP Fault Manager alarm board. For example, each node (AlphaServer 1000 system) must configured to receive alarms and to report system events. The following documentation will assist you in customizing the PSW configuration file: o POLYCENTER System Watchdog Agent and Consolidator for Digital UNIX. o AlphaServer Intelligent Peripheral Platform System Manager's Guide 2-9 o Appendix A provides an example installation script. The following is an example of a psw_consolidator.conf file: ############################################################################### # psw_consolidator.conf ############################################################################### # Copyright (c) Digital Equipment Corporation, 1991. All Rights Reserved. # # Unpublished rights reserved under the copyright laws of the United States. # # # # The software contained on this media is proprietary to and embodies the # # confidential technology of Digital Equipment Corporation. Possession, use,# # duplication or dissemination of the software and media is authorized only # # pursuant to a valid written license from Digital Equipment Corporation. # # # # RESTRICTED RIGHTS LEGEND Use, duplication, or disclosure by the U.S. # # Government is subject to restrictions as set forth in Subparagraph # # (c)(1)(ii) of DFARS 252.227-7013, or in FAR 52.227-19, as applicable. 
###############################################################################
# consolidator configuration file
#
# consolidator configuration grammar:
# -----------------------------
# polling_interval
# [class_description]*
# [node_description]*
# [node_events_description]*
#
# where
# class descriptions are:
#    class:class_name:[class_ref][(+|-)event_string]*:
# node descriptions are:
#    node:node_name:[class_ref]:[tcp|dnet|notrans]:
# node events descriptions are:
#    node_name:event_string:parameters
#
# - "node", "class" are reserved words
# - refer to the user manual or do man psw_consolidator.conf
#   for description of parameters
# comment lines (beginning with a '#' in the first column can be
# inserted anywhere in the file
# warning: no blank line.
# trailing comments can be added once mandatory fields have been mentioned
# (beginning with a ":").
#
# =============================================================================
# POLLING INTERVAL DESCRIPTION BLOCK (in seconds - default 60)
30
# ============================================================================
# CLASSES DESCRIPTION BLOCK: A CHECKED EVENTS LIST BY CLASS
# DQP, BQP are time consuming on non Ultrix agents
# PSW on Ultrix agents (psw_sensor_fly_custom)
class:IPFMalarm:+EXT+CPU+MEM+DSK+ETH+DNF+PRO+OTH+PSW
#
# =============================================================================
# NODES DESCRIPTION BLOCK: A CLASS AND TRANSPORT BY NODE
# examples:
#node:nodex:cl_ultrix:tcp
#node:nodev:cl_vms:dnet
#
# NODES ENTERED DURING POST INSTALLATION (DO NOT REMOVE - USED BY INSTALLATION)
#
node:asipp7:IPFMalarm:tcp
node:asipp1:IPFMalarm:tcp
# =============================================================================
# DATA SPECIFICATION FOR SPECIAL EVENTS - DESCRIPTION BLOCK
# example:
#nodex:DNF:/usr/users:200000
#nodex:PRO:inetd:144
#nodev:BAT:bat1:queue1:system
#nodev:ILL:120
#nodev:SHS:shadow1
#nodex:PRS:my_ansi
asipp7:DNF:/::10:0
asipp7:DNF:/usr::10:0
asipp7:PRO:ipfmalarm_timer:0:0
asipp7:PRO:ipfm_monitor_mc:0:0
asipp7:PRO:ipfmdisk_timer:0:0
asipp1:PRO:ipfmalarm_timer:0:0
*****************************************************************************

2.6.6 Running the Installation Verification Procedure (IVP)

To run the IVP for a subset, enter the following information at the
command prompt:

   # setld -v subset_name

Where subset_name is the name of the subset (Alarm, Menu or Drv) to
be verified.

The following three examples, one for each subset, illustrate the
installation verification procedure.

IVP Example 1: IP Fault Manager Alarm Software Files

   # setld -v IPFMALARM100
   IPFMALARM - IPFMAlarm Software Files (IPFMALARM100)
   0 verification errors encountered.
   0 corrections performed.
   IVP, information: IPFMALARM100 is properly installed.

IVP Example 2: IP Fault Manager Alarm Menu Files

   # setld -v IPFMMENU100
   IPFMMENU - IPFMAlarm Menu Files (IPFMMENU100)
   0 verification errors encountered.
   0 corrections performed.
   IVP, information: IPFMMENU100 is properly installed.

IVP Example 3: IP Alarm Driver Files

   # setld -v IPFMDRV100
   IPFMDRV - IPFMAlarm SRV_MGT Driver Files (IPFMDRV100)
   0 verification errors encountered.
   0 corrections performed.
   IVP, information: IPFMDRV100 is properly installed.

If an installation fails the Installation Verification Procedure
(IVP), the executable files contained in /usr/opt/IPFM will be
removed automatically.

2.6.7 Installation Hints

__________________________________________________________
If you want to ....       Comments
__________________________________________________________
Exit the installation     Because you are not satisfied with the
                          backup of your system, select item 6 in
                          the installation procedure to stop the
                          installation. Appendix A contains an
                          example installation listing.

Stop the installation     Select item 6, EXIT without installing
                          any subsets, in the installation
                          procedure to stop the installation.
Configure the IP          See the AlphaServer Intelligent
Fault Manager             Peripheral Platform System Manager's
                          Guide for information.

Verify the                See information on the IVP in
installation              Section 2.6.6 of this guide.

Rebuild the kernel        See Digital UNIX system management
                          documentation.

Install optional          See AlphaServer Intelligent Peripheral
products                  Platform System Manager's Guide.
__________________________________________________________

2.6.8 Deinstalling Savesets

If the installation fails, and the subsets are not completely
installed, follow these steps:

1. For the IPFMALARM100 subset:

   Type # setld -d IPFMALARM100 and press .

   The following information is displayed on your screen:

   Removing softlinks ...
   Deleting "IPFMALARM - IPFMAlarm Software Files" (IPFMALARM100).

2. For the IPFMDRV100 subset:

   Type # setld -d IPFMDRV100 and press .

   The following information is displayed on your screen:

   Deleting "IPFMDRV - IPFMAlarm SRV_MGT Driver Files" (IPFMDRV100).

3. For the IPFMMENU100 subset:

   Type # setld -d IPFMMENU100 and press .

   The following information is displayed on your screen:

   Deleting "IPFMMENU - IPFMAlarm Menu Files" (IPFMMENU100).

3
_________________________________________________________________

Understanding IP Fault Manager

This chapter provides a descriptive overview of the following IP
Fault Manager components:

o  Alarm event
o  User programming interface
o  Alarm panel
o  Maintenance center monitor
o  Timer reset process
o  AlphaServer 1000 server management registers
o  Event log interface
o  PSW external event log file
o  Disk monitoring

The IP Fault Manager works together with the POLYCENTER System
Watchdog (PSW) products to provide the basic services consisting of
event monitoring and management of the AlphaServer Intelligent
Peripheral Platform system.

Figure 3-1 illustrates the IP Fault Manager architecture.

3.1 IP Fault Manager Alarm Event

The IP Fault Manager alarm event function turns on or off the
individual LEDs on the IP Fault Manager front panel and the audible
alarm.
This function is used by the POLYCENTER System Watchdog action
routine.

The IP alarm event function communicates with the alarm board as
follows:

o  The IP Fault Manager software configures the communications
   serial port at a baud rate of 1200, 8 data bits, 1 stop bit, no
   parity, full duplex.

o  Messages are sent in ASCII format, terminate with a carriage
   return and line feed, and must be eighty (80) characters or less
   in length.

o  Alarm board messages begin with the letters A DTP. For additional
   information, refer to the Dialogic Telco Platform User's
   Reference.

o  Messages that do not begin with the letters A DTP are forwarded
   directly to the IP maintenance center monitor.

The IP Fault Manager maintenance center monitor (IPFM_monitor_mc) is
a Digital UNIX process that polls the RS-232 maintenance center
serial port on the IP Fault Manager alarm panel chassis, for alarm
events detected by the IP Fault Manager alarm board. See Section 3.4
for additional information.

3.2 IP Fault Manager Programming Interface

The IP alarm event programming interface contains all the software
needed to build the alarm messages for user written applications. It
communicates with the alarm board using the communications processes
specified in Section 3.1, and communicates with the POLYCENTER System
Watchdog through the psw_external process. See Chapter 5 for
additional information.

3.3 IP Fault Manager Alarm Panel

________________________ Note ________________________

The Dialogic Telco Platform (DTP) alarm board is referred to as the
IP Fault Manager alarm panel throughout this document.

______________________________________________________

3.3.1 Alarm Panel Description

The IP Fault Manager alarm panel is responsible for:

o  Monitoring the IP Fault Manager alarm panel chassis temperature,
   power supply input and output voltages, battery charge, and the
   alarm board circuitry.

o  Communicating with the alarm process and maintenance center
   monitor via messages.
o  Transferring messages between the alarm board and the IP
   maintenance center monitor.

o  Activating audible and visual alarms in the event of system
   failures.

The alarm board is connected to the power supply, control buttons,
communications serial port, maintenance center serial port, front
panel switches, battery, and speakers by means of an interface board.
The interface board is mounted on the base of the chassis directly
behind the alarm panel.

The alarm board communicates with the:

o  IP Fault Manager through the communication serial port in the rear
   of the chassis.

o  IP maintenance center monitor through the maintenance center
   serial port in the rear of the chassis.

When the alarm board detects an alarm event other than alarm board
failure, it sends a message to the IP maintenance center monitor via
the maintenance center serial port that identifies the alarm status
and describes the alarm condition.

An alarm board failure is a MAJOR event, and the major LED on the IP
Fault Manager alarm board is illuminated. POLYCENTER System Watchdog
logs the event in the alarm event log file
/usr/opt/IPFM/IPFMpanel.log, and sends the event message to the
workstation consolidator. See Chapter 5 for a list of the alarm board
commands.

3.3.2 Alarm Notification

The software that resides on the alarm board stores alarm messages
and handles alarm notification. The alarm board has nine programmable
LED indicators visible through the front panel of the chassis. Only
one LED indicator is illuminated on the alarm panel at one time. Each
indicator can be accessed through the menu interface described in
Chapter 4. The LEDs provide the following information:

o  System Status

   Indicates one of four operating modes for the IP Fault Manager
   alarm panel. The system status indicators do not represent the
   status of the IP Fault Manager platform as a whole, only the alarm
   panel.

   ==> Active
       IP Fault Manager alarm panel is active and functioning.
   ==> Out of Service
       The IP Fault Manager alarm panel is not functioning and
       triggers a critical alarm condition.

   ==> Standby
       Not used as part of the base platform, but it is available
       through the programming interface.

   ==> Unavailable
       Status of a troubled alarm panel, or a system under repair. If
       the alarm board detects an alarm condition while the alarm
       board system status is unavailable, it will not activate the
       appropriate alarm status LED or audible alarm until the system
       status returns to active.

o  Alarm Status

   Indicates one of three alarm priorities:

   ==> Critical
       A severe condition that influences the performance of the IP
       platform. A critical alarm requires immediate attention, as it
       may affect the performance of the IP Fault Manager platform
       itself.

   ==> Major
       A serious disruption of service or malfunction of important
       circuits; it requires immediate attention.

   ==> Minor
       An alarm condition that does not influence the performance of
       the IP Fault Manager platform, but should be investigated.

   Only one alarm status LED is illuminated at a time. If more than
   one alarm event exists, priority is given to critical, then major,
   then minor alarm.

o  Power Supply Status

   Indicates a possible problem (IP Fault Manager alarm panel only)
   with one of the following:

   ==> Fuse
   ==> Output

Refer to the AlphaServer Intelligent Peripheral Platform Owner's
Guide and Dialogic Telco Platform User's Reference for additional
information.
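
The alarm status priority rule described in this section (critical,
then major, then minor, with only one alarm status LED lit at a time)
can be pictured with the short Python sketch below. It is an
illustration only, not the alarm board firmware: the function name
displayed_alarm and the use of plain strings for alarm levels are
assumptions made for this example.

```python
# Alarm priorities from Section 3.3.2, highest priority first.
PRIORITY = ("CRITICAL", "MAJOR", "MINOR")


def displayed_alarm(active_alarms):
    """Return the one alarm status that would be shown, or None.

    `active_alarms` is any iterable of priority names (hypothetical
    representation); names that are not in PRIORITY are ignored.
    """
    active = {name.upper() for name in active_alarms}
    for level in PRIORITY:
        if level in active:
            return level
    return None


if __name__ == "__main__":
    # A minor and a major alarm are both active: major takes priority.
    print(displayed_alarm(["minor", "major"]))
```

When the highest-priority alarm clears, calling the function again
with the remaining alarms yields the next level down, which matches
the ordering stated above.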
3.3.3 Activating Alarm Cut-Off

If alarms are present, activating alarm cut-off either by pressing
the button on the alarm panel, or by selecting (6) on the IPFM Alarm
Manager Main Menu, results in the following:

o  Alarm cut-off LED illuminates
o  Alarm speaker turns off
o  Active alarm status LED turns off
o  Active system status LED remains illuminated
o  Enabled remote outputs turn off

When the acknowledged alarm condition clears, the following occurs:

o  Alarm cut-off LED turns off
o  System status LED returns to the original state (active or
   standby)
o  Message is sent to the maintenance center indicating that the
   alarm condition has cleared.

3.3.4 Activating Alarm Reset

Activating Reset, by pressing the button on the alarm panel, causes
the alarm board to run diagnostic checks on the alarm board
processor, serial ports, and LEDs. When the alarm board self-test
begins, the Unavailable system status indicator on the alarm panel
blinks three times, then the alarm board software illuminates each
LED in sequence. The alarm Cut-Off LED remains illuminated until the
self-test is complete.

3.3.5 Alarm Board Acknowledgments

The alarm board acknowledges all commands by sending a DTP-XXX: ACK
message. If there is no acknowledgment, the command is retried a
maximum of two times. If there is still no acknowledgment after the
retries, an alarm board failure can be assumed. This failure is a
Critical event: the CRITICAL LED on the IP alarm panel is illuminated
and the event is passed to POLYCENTER System Watchdog. POLYCENTER
System Watchdog logs the event in the alarm event log file
/usr/opt/IPFM/IPFMpanel.log and sends a Host PC (to alarm board)
Communication Failed message to the PSW display and the host system
console.
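
The acknowledgment and retry behavior described in Section 3.3.5 can
be sketched as follows. This is a minimal illustration under stated
assumptions, not the IP Fault Manager implementation: send_command is
a hypothetical callable that writes one command to the alarm board
and returns the reply text (or None on a time-out), and an
acknowledgment is recognized simply by a reply ending in ": ACK".

```python
MAX_RETRIES = 2          # "retried a maximum of two times" (Section 3.3.5)
ACK_SUFFIX = ": ACK"     # assumed marker, based on the acknowledgment form


def send_with_retry(send_command, command, max_retries=MAX_RETRIES):
    """Send `command` and wait for an acknowledgment.

    `send_command` is a hypothetical callable taking the command string
    and returning the reply line, or None when nothing was received.
    Returns True on acknowledgment; False after the initial attempt and
    all retries have failed, which the caller would treat as an alarm
    board failure.
    """
    for _attempt in range(1 + max_retries):
        reply = send_command(command)
        if reply is not None and reply.strip().endswith(ACK_SUFFIX):
            return True
    return False
```

A False return corresponds to the case in which the event is raised
as a Critical alarm and passed to POLYCENTER System Watchdog.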
Alarm board acknowledgments take the following form:

   DTP XXX: ACK

Where XXX can be one of the following:

Table 3-1 Alarm Board Acknowledgments
___________________________________________________________
Acknowledgment Type            Description
___________________________________________________________
DTP ALARM SET: ACK             Alarm-set message acknowledged
DTP ALARM CLR: ACK             Alarm-clear message acknowledged
DTP SYSTEM: ACK                System status message acknowledged
DTP REQ: ALARM-ACKNOWLEDGED    Alarm conditions acknowledged
___________________________________________________________

Table 3-2 lists alarm board-detected alarm events.

Table 3-2 Alarm Board Detected Alarm Events
___________________________________________________________
Alarm Events                                         Alarm Type
___________________________________________________________
C-ALARM-SET: BATTERY POWER FAILURE                   Critical
C-ALARM-CLR: BATTERY POWER RESTORED

C-ALARM-SET: FUSE (INPUT VOLTAGE) FAILED             Critical
C-ALARM-CLR: FUSE (INPUT VOLTAGE) RETURNED TO NORMAL

C-ALARM-SET: HOST PC (TO ALARM BOARD)                Critical
             COMMUNICATION FAILED
C-ALARM-CLR: COMMUNICATIONS RESTORED

C-ALARM-SET: OUTPUT VOLTAGE BELOW NORMAL             Critical
C-ALARM-CLR: VOLTAGE RETURNED (INCREASED) TO NORMAL

C-ALARM-SET: OUTPUT VOLTAGE ABOVE NORMAL             Critical
C-ALARM-CLR: VOLTAGE RETURNED (DECREASED) TO NORMAL

ALARM-SET: TEMPERATURE ABOVE NORMAL                  Major
ALARM-CLR: TEMPERATURE RETURNED TO NORMAL

ALARM-SET: BATTERY NOT PRESENT/FAILED                Minor
ALARM-CLR: BATTERY OPERATIONAL

ALARM-SET: INSUFFICIENT BATTERY CHARGE               Minor
ALARM-CLR: BATTERY RECHARGED
___________________________________________________________

Refer to Section 3.1 for more information on the IP Fault Manager
alarm event data structures.

3.4 IP Fault Manager Maintenance Center Monitor

The IP Fault Manager maintenance center monitor (IPFM_monitor_mc) is
a Digital UNIX process that polls the RS-232 maintenance center
serial port on the DTP chassis, for alarm events detected by the IP
Fault Manager alarm board.
If an alarm event is detected, the event is passed to the POLYCENTER
System Watchdog.

The IP maintenance center monitor filters out any messages it
receives that are not either a SET ALARM or a CLEAR ALARM message.
When a SET ALARM or a CLEAR ALARM message is detected, the alarm
board is responsible for illuminating or clearing the appropriate
LEDs on the IP Fault Manager alarm panel. The maintenance center
monitor creates a message that contains the alarm event information
and sends this message to POLYCENTER System Watchdog. POLYCENTER
System Watchdog logs the event in the alarm event log file
/usr/opt/IPFM/IPFMpanel.log, and sends a message to the POLYCENTER
System Watchdog on the workstation console, which displays the event.

3.5 IP Fault Manager Timer Reset Process

This Digital UNIX software process sends timer reset messages to the
alarm board at least once every 30 seconds, starting four minutes
after power-up. The messages indicate that the alarm board and the
alarm event process are functioning properly.

The timer reset process communicates with the alarm board through the
communications serial port in the rear of the IP Fault Manager alarm
panel chassis, according to the alarm board communications data
structures described in Section 3.3.

3.6 AlphaServer 1000 Server Management Registers

The AlphaServer 1000 system monitors the server management register.
This read/write register provides status information on the power
supplies and also serves as an optional mechanism for shutting off
the DC power in the system.

The PCI Interrupt Register provides temperature warning status
information. When the temperature warning bit is set, it indicates
that the internal box temperature has exceeded the warning level
threshold.

These registers are monitored, but do not cause an alarm event at
this time.
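
As an illustration of the filtering step described in Section 3.4,
the following Python sketch keeps only SET ALARM and CLEAR ALARM
messages and discards everything else. It is a sketch under
assumptions, not the IPFM_monitor_mc source: it assumes that messages
arrive as text lines in the forms listed in Table 3-2 (ALARM-SET: or
ALARM-CLR:, optionally prefixed with C-), and the function name
classify_alarm_message is hypothetical.

```python
# Message markers taken from Table 3-2; treating them as the exact wire
# format is an assumption made for this example.
MARKERS = (("ALARM-SET:", "SET"), ("ALARM-CLR:", "CLEAR"))
ALLOWED_PREFIXES = ("", "C-")   # "C-" marks the critical alarm events


def classify_alarm_message(line):
    """Return (kind, description) for SET/CLEAR messages, else None.

    `kind` is "SET" or "CLEAR". Any message that is not a SET ALARM or
    CLEAR ALARM message is filtered out by returning None.
    """
    text = line.strip()
    for marker, kind in MARKERS:
        prefix, found, rest = text.partition(marker)
        if found and prefix in ALLOWED_PREFIXES:
            return kind, rest.strip()
    return None


if __name__ == "__main__":
    for sample in ("C-ALARM-SET: BATTERY POWER FAILURE",
                   "ALARM-CLR: BATTERY RECHARGED",
                   "DTP-REQ: SYSTEM=ACTIVE"):
        print(sample, "->", classify_alarm_message(sample))
```

Only the messages for which a result is returned would be turned into
events for POLYCENTER System Watchdog.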
3.7 IP Fault Manager Event Log

The IP Fault Manager event log file, /usr/opt/IPFM/IPFMpanel.log,
contains the following information:

o  Errors detected by the IP Fault Manager alarm software

   -  Open device errors
   -  All get/set terminal attributes
   -  All get lock device errors
   -  Exceeded get lock retry count
   -  No match on timer acknowledgments
   -  All read/write time-outs to the IP alarm panel

o  Status related to the system and alarm events

   -  Invalid alarm message
   -  Errors encountered during license check
   -  Actions taken on the alarm panel, for example, the SET and
      CLEAR commands.

The following example shows the event log file output when a MAJOR
alarm with the optional message descriptor >> DTP now at V. 1.24 is
set:

   Fri Mar 8 11:23:45 EST 1996 Message successfully added SET ACTIVE MAJOR >> DTP now at V. 1.24
   Fri Mar 8 11:23:54 EST 1996 SET SYSTEM STATUS ACTIVE
   Fri Mar 8 11:23:58 EST 1996 SET MAJOR ALARM

When the same MAJOR alarm is cleared, the IPFMpanel.log file displays
the following information:

   Fri Mar 8 12:31:14 EST 1996 Message(s) successfully removed CLEAR ACTIVE MAJOR >> DTP now at V. 1.24
   Fri Mar 8 12:31:22 EST 1996 SET SYSTEM STATUS ACTIVE
   Fri Mar 8 12:31:24 EST 1996 CLEAR MAJOR ALARM

3.8 PSW External Event Log

The POLYCENTER System Watchdog file
/usr/opt/IPFM/IPFMalarm_psw_api.log contains the last POLYCENTER
System Watchdog command and result.

3.9 IP Fault Manager Disk Monitoring

SCSI disks configured on the IP systems can be monitored by the IP
Fault Manager alarm management software for availability. Disks must
be configured into the Logical Storage Manager (LSM) in order to be
monitored. When a disk is added to the disk monitoring software, the
software probes the disk every cycle to see if it can be accessed. If
the probe is not successful, a MAJOR alarm is generated, and the disk
alarm is sent to the POLYCENTER System Watchdog display screen (see
Figure 1-1).
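
The probe cycle described above can be pictured with the following
Python sketch. The manual does not state how the disk monitoring
software tests access, so reading one block from the device path is
purely an assumption for this example, as are the function names
probe_disk and disk_alarms.

```python
def probe_disk(path, block_size=512):
    """Return True if one block can be read from `path`, else False.

    Reading a single block is an assumed stand-in for whatever access
    test the disk monitoring software actually performs.
    """
    try:
        with open(path, "rb") as device:
            return len(device.read(block_size)) > 0
    except OSError:
        return False


def disk_alarms(monitored_paths):
    """Return a ("MAJOR", path) entry for every disk that fails the probe."""
    return [("MAJOR", path) for path in monitored_paths
            if not probe_disk(path)]


if __name__ == "__main__":
    # /dev/rrz18c is the example disk name used in Chapter 4; on a system
    # without that device the probe fails and an alarm entry is listed.
    print(disk_alarms(["/dev/rrz18c"]))
```

Each entry returned by disk_alarms corresponds to one MAJOR disk
alarm that would be forwarded for display.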
If you are configuring a duplex IP system that will use DECsafe and
the Logical Storage Manager (LSM) to control the shared SCSI disks in
the BA35x StorageWorks array, the shared disk will only be accessible
by one member of the cluster at a time. In this case, the disk
monitoring software will detect if the disks in the BA35x
StorageWorks array are accessible by the system, and will probe only
the disks that are available. If DECsafe has the StorageWorks array
mounted on another system in the cluster, those disks are not probed.

If a disk fails the probe access, a disk alarm is generated. This
alarm is sent to the POLYCENTER System Watchdog display, as well as
the IP Fault Manager alarm panel. When the failed disk is replaced,
the MAJOR disk alarm is not cleared automatically. The disk alarm
must be manually cleared by selecting 4 on the IPFM Alarm Manager
Main Menu (see Figure 4-1).

In addition to disk monitoring, POLYCENTER System Watchdog can be
configured to monitor specific file systems to detect if the disk
space in use exceeds a defined high-water threshold. POLYCENTER
System Watchdog causes a MINOR alarm to be generated with a Disk
Nearly Full (DNF) event.

To add or remove the available disk space monitoring (as performed by
POLYCENTER System Watchdog), refer to the POLYCENTER System Watchdog
documentation.

4
_________________________________________________________________

IP Fault Manager Alarm Utility Operator Interface

This chapter provides an overview of the IP Fault Manager operator
interface. It describes the IP Fault Manager alarm utility menus and
submenus used by an operator for command input and acknowledgment.

4.1 Overview

The IP Fault Manager Alarm Utility is responsible for providing the
IP Fault Manager alarm board with information about which system
status and alarm status LEDs are to be set or cleared. Additionally,
the utility displays all user requests regarding the IP alarm board
information on the workstation console.
Optional messages may be included with an IP Fault Manager alarm
event message.

The IP Fault Manager Alarm Utility permits the user to:

o  Set and clear all the LEDs on the IP Fault Manager front panel
o  Request information regarding the IP Fault Manager alarm panel
o  Include an optional, descriptive message with the IP Fault Manager
   alarm event message

The IP Fault Manager Alarm Utility is responsible for:

o  Informing the IP Fault Manager alarm board which system status and
   alarm status LEDs to set and/or clear
o  Displaying on the workstation console all user requests regarding
   IP Fault Manager alarm panel information
o  Including optional, descriptive messages in the IP Fault Manager
   alarm event message

4.1.1 Accessing the Menu Interface

Before you start the menu interface, set your terminal screen for 24
rows as follows:

   # stty rows 24

To access the operator interface, enter the following command at the
UNIX command prompt:

   # /usr/opt/IPFM/ipfmalarm_menu

4.1.2 Using the Main Menu

The IPFM Alarm Manager Main Menu is shown in Figure 4-1. The
following submenus and screen displays are accessed from the main
menu:

___________________________________________________________
Selection  Description          Menu/Display
___________________________________________________________
1          Set/Clear DTP Alarm  IPFM Alarm Menu (Figure 4-2)

2          Set/Clear System     System Status Menu (Figure 4-5)
           Status

3          Request Information  Alarm and Status Information
                                (Figure 4-6)

4          Clear a System       IPFM Alarm Manager Main Menu
           Watchdog Event       (Figure 4-7)

5          Add/Remove Disk      Disk Monitor Menu (Figure 4-8)
           Monitor

6          Perform DTP Cut-Off  -

7          Exit                 -
___________________________________________________________

4.2 Set or Clear Alarm

To set or clear a CRITICAL, MAJOR, or MINOR alarm on the IP alarm
panel, enter the number 1 in the Enter Selection: field of the IPFM
Alarm Manager Main Menu and press . The IPFM Alarm Menu (see
Figure 4-2) is displayed.

Enter a number (1 through 6) in the Enter Selection: field and press .
You are prompted for optional text to describe the event. An
appropriate LED on the IP alarm panel is set or cleared, and an event
is added or deleted from the POLYCENTER System Watchdog screen.

Enter the number 7 in the Enter Selection: field and press . You are
returned to the IPFM Alarm Manager Main Menu.

If you enter 1, 2 or 3 (to set an alarm) in the Enter Selection:
field, you are prompted to enter the alarm message text to be
displayed on the POLYCENTER System Watchdog screen. See Figure 4-3.

You must enter text that is unique and descriptive of the event type.
Legal characters include letters, numbers and spaces. Punctuation is
not recognized. You can use a maximum of 32 characters in a single
description.

If you enter 4, 5 or 6 (to clear an alarm) in the Enter Selection:
field, you are prompted to enter the alarm message text and press .
You must enter only the text that was supplied manually, for example:

   Feb 29 17:51 ASIPP1 DTP ACTIVE CRITICAL test message

Where: test message was provided by the user.

Otherwise, a No matching text found message is displayed on the
screen. See Figure 4-4. The message is only displayed if there is a
mismatch in text between what you enter and the information in the
psw_external file. The text must match exactly.

4.3 Set or Clear System Status

To set or clear the system status, enter the number 2 in the Enter
Selection: field of the IPFM Alarm Manager Main Menu and press . The
System Status Menu (Figure 4-5) is displayed.

Enter a number (1 through 4) in the Enter Selection: field and press .
An appropriate LED on the IP alarm panel is illuminated.

________________________ Note ________________________

The system status light refers only to the status on the IP Fault
Manager alarm panel.

______________________________________________________

Only one alarm panel System Status light can be illuminated at any
given time.
For example, both the alarm board and IP Fault Manager can set the
System Status to Active, or to Out of Service, or to Standby. Refer
to the Dialogic Fault Resilient Telco Platform User's Reference for
detailed information about system status.

Enter the number 5 in the Enter Selection: field and press . You are
returned to the IPFM Alarm Manager Main Menu.

4.4 Remote Access to Real Time Information

To inquire as to which lights are illuminated, without being in the
same area as the IP Fault Manager alarm panel, follow this step:

1. To display the system time, IP Fault Manager version, system
   state, real-time system status and alarm status, enter the number
   3 (Request Information) in the Enter Selection: field of IPFM
   Alarm Manager Main Menu and press .

Figure 4-6 is an example of the information displayed on your screen.

Where:

o  System Time displays the current date and time
o  DTP Version displays the version number of the DTP chassis
o  System State displays any acknowledged alarm events
o  System Status displays the current system status
o  Alarm Status displays the current alarm status, that is, the
   highest level alarm currently active

4.5 Clear a System Watchdog Event

To clear a POLYCENTER System Watchdog event, enter the number 4 in
the Enter Selection: field of IPFM Alarm Manager Main Menu and
press . You are prompted (Figure 4-7) to enter the event to be
cleared.

Enter the POLYCENTER System Watchdog string beginning with the DTP,
as shown in the following example.

   Feb 29 18:00 ASIPP1 DTP ACTIVE CRITICAL test message

Remember to enter the string starting with the DTP:

   DTP ACTIVE CRITICAL test message

4.6 Add or Remove Disk Monitoring

To add or remove one or more disks from being monitored, enter the
number 5 in the Enter Selection: field of the IPFM Alarm Manager Main
Menu and press . The Disk Monitor Menu (Figure 4-8) is displayed.

To add, remove or show disks being monitored, follow these steps:

1. To add a disk to be monitored, enter the number 1 in the Enter
   Selection: field at the bottom of the Disk Monitor Menu and
   press . You are prompted for a disk name, for example rrz18c (see
   Figure 4-9).

________________________ Note ________________________

A disk must first be mounted using the Digital Logical Storage
Manager (LSM) software, before you specify monitoring of the disk.

______________________________________________________

2. To remove a disk from being monitored, enter the number 2 in the
   Enter Selection: field at the bottom of the Disk Monitor Menu and
   press . You are prompted for the name of the disk to be removed
   (see Figure 4-10).

________________________ Note ________________________

All users must have Digital UNIX system privileges in order to remove
a disk from being monitored.

______________________________________________________

3. To show all disks being monitored, enter the number 3 in the Enter
   Selection: field at the bottom of the Disk Monitor Menu and
   press . A list of the disks (Figure 4-11) being monitored is
   displayed on the screen.

________________________ Note ________________________

Figure 4-11 lists the disks being monitored on a simplex system.

______________________________________________________

4. To exit the menu, enter the number 4 in the Enter Selection: field
   at the bottom of the Disk Monitor Menu and press .

Duplex System Information

In a duplex system, the Logical Storage Manager (LSM) indicates the
disks currently attached to the system. The information is only
available through LSM. Refer to the Logical Storage Manager System
Administrator's Guide for additional information.

If a monitored disk on the current system fails over to the other
system, the current system no longer flags errors for that disk,
because the disk is now owned by the other system.
4.7 Perform DTP Cut-Off

An alarm cut-off is used to turn off all currently active alarm LEDs
on the IP Fault Manager alarm panel, as well as all audible beeps. To
send a CUT-OFF command to the IP alarm panel, enter the number 6 in
the Enter Selection: field of IPFM Alarm Manager Main Menu and
press .

See the Dialogic Telco Platform User's Reference for information on
the CUT-OFF and RESET commands.

4.8 Exit Alarm Manager

To exit the IP Fault Manager Alarm Utility and return to the UNIX
prompt, enter the number 7 in the Enter Selection: field of IPFM
Alarm Manager Main Menu and press .

4.9 Dialogic Telco Platform Troubleshooting

Refer to the Dialogic Telco Platform User's Reference guide for
information on troubleshooting DTP alarm conditions.

5
_________________________________________________________________

IP Fault Manager Programming Interface

This chapter describes the programming components in the IP Fault
Manager user interface:

o  Alarm event function
o  User API

5.1 Alarm Event

The IP Fault Manager alarm event function contains all the software
needed to turn on or off the individual LEDs on the IP Fault Manager
front panel and the audible alarm. The Digital UNIX ipfm_alarm
function is linked with the psw_action routine that runs on the
AlphaServer 1000 system.

Valid alarm board commands are defined in the
/usr/opt/IPFM/ipfm_alarm.h file. See Section 5.2.2 for information.

5.2 User API

The IP Fault Manager User API contains all the software needed to
build the IP alarm panel messages. It communicates with the alarm
board by using the communications processes specified in alarm_event,
and communicates with the POLYCENTER System Watchdog through the
psw_external process. The Digital UNIX ipfm_user_api is a linkable
function that resides on the AlphaServer 1000 system. The alarm event
functions can be linked into any user-written application.
5.2.1 Request Commands

All alarm messages (except for IP alarm panel REQUEST commands) are
sent to the POLYCENTER System Watchdog psw_external process to:

o  Display the alarm event message on the workstation console.
o  Log the alarm event in the alarm event log file.
o  Execute the IP alarm panel command to manipulate front panel
   lights.

All REQUEST commands are sent directly from the IP Alarm Event User
API to the IP Fault Manager alarm board. POLYCENTER System Watchdog
is not used for the REQUEST commands.

In addition to the valid alarm commands, the User API allows the user
to request information regarding the DTP. The information requests
that the user can send to the alarm board are as follows:

   Request from the user      Response from the alarm
   to the alarm panel         panel to the user request
   ---------------------      -----------------------
   A DTP-REQ: SYSTEM          DTP-REQ: SYSTEM=ACTIVE
                              DTP-REQ: SYSTEM=OUT OF SERVICE
                              DTP-REQ: SYSTEM=STANDBY
                              DTP-REQ: SYSTEM=UNAVAILABLE

   A DTP-REQ: ALARM           DTP-REQ: ALARM=CRITICAL
                              DTP-REQ: ALARM=MAJOR
                              DTP-REQ: ALARM=MINOR
                              DTP-REQ: ALARM=NONE
                              DTP-REQ: ALARM=ACKNOWLEDGED

   A DTP-REQ: VERSION         DTP-REQ: VERSION=

   A DTP-REQ: STATE           DTP-REQ: STATE= alarm events

Where:

   DTP-REQ   Identifies message requests sent to the IP Fault Manager
             alarm panel.

   SYSTEM    Requests status of system. Is the system active, out of
             service, on standby, or unavailable?

   ALARM     Requests alarm status; critical, major or minor alarm,
             no alarm or an acknowledged alarm.

   VERSION   Requests the version number of the DTP firmware.

   STATE     Requests alarmed events and active conditions.

5.2.2 User Input Commands

Valid IP Fault Manager alarm board commands are defined in the
/usr/opt/IPFM/ipfm_alarm.h file. A valid IP Fault Manager alarm board
command contains a maximum of 80 bytes.
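
To make the request and response forms in Section 5.2.1 concrete, the
following Python sketch builds a request string and splits a response
into its keyword and value. It is an illustration only and is not
part of the User API: the function names build_request and
parse_response are hypothetical, the 80-byte check assumes the limit
applies to the text shown in the table, and the carriage return and
line feed terminator described in Section 3.1 is left to the code
that writes to the serial port.

```python
MAX_COMMAND_BYTES = 80                       # limit stated in Section 5.2.2
REQUEST_KEYWORDS = ("SYSTEM", "ALARM", "VERSION", "STATE")


def build_request(keyword):
    """Return the request text for `keyword`, for example "A DTP-REQ: SYSTEM"."""
    if keyword not in REQUEST_KEYWORDS:
        raise ValueError("unknown request keyword: %r" % (keyword,))
    message = "A DTP-REQ: " + keyword
    if len(message.encode("ascii")) > MAX_COMMAND_BYTES:
        raise ValueError("request exceeds %d bytes" % MAX_COMMAND_BYTES)
    return message


def parse_response(line):
    """Split a response such as "DTP-REQ: SYSTEM=ACTIVE" into its parts.

    Returns a (keyword, value) tuple, for example ("SYSTEM", "ACTIVE").
    Raises ValueError if the line does not have the expected form.
    """
    head, separator, body = line.strip().partition(": ")
    if head != "DTP-REQ" or not separator:
        raise ValueError("not a DTP-REQ response: %r" % (line,))
    keyword, equals, value = body.partition("=")
    if not equals:
        raise ValueError("missing '=' in response: %r" % (line,))
    return keyword, value.strip()


if __name__ == "__main__":
    print(build_request("ALARM"))
    print(parse_response("DTP-REQ: SYSTEM=OUT OF SERVICE"))
```

Keeping the build and parse steps separate from the serial port
handling makes it possible to check the message text without an alarm
board attached.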
The Alarm Event User API function accepts as input a valid DTP alarm event argument, for example:

o  An address of a buffer to return requested information regarding the DTP chassis (for DTP alarm board request commands only)
o  An optional message to be included within the IP Fault Manager alarm board SET and CLEAR commands.

The User API accepts the following as input commands:

Table 5-1 User API Commands
___________________________________________________________
Command                     Description
___________________________________________________________
SET_CRITICAL_ALARM          Illuminates the CRITICAL LED on the IP alarm panel
CLR_CRITICAL_ALARM          Turns off the CRITICAL LED on the IP alarm panel
SET_MAJOR_ALARM             Illuminates the MAJOR LED on the IP alarm panel
CLR_MAJOR_ALARM             Turns off the MAJOR LED on the IP alarm panel
SET_MINOR_ALARM             Illuminates the MINOR LED on the IP alarm panel
CLR_MINOR_ALARM             Turns off the MINOR LED on the IP alarm panel
SET_ACTIVE_SYSTEM           No alarm condition present
SET_STANDBY_SYSTEM          Available through programming interface only
SET_OUT_OF_SERVICE_SYSTEM   Illuminates the CRITICAL LED on the IP alarm panel
SET_UNAVAILABLE_SYSTEM      No LED illuminates and no audible alarm sounds until the system status returns to active
REQ_STATUS                  Requests system status: active, out of service, on standby, or unavailable
REQ_STATE                   Requests alarmed events and active conditions
REQ_ALARM                   Requests alarm status: critical, major, or minor alarm, no alarm, or an acknowledged alarm present
REQ_VERSION                 Requests the version number of the IP Fault Manager alarm panel
CUT_OFF                     Illuminates the CUTOFF LED and turns off the Alarm Status LED

5.2.2.1 SET_CRITICAL_ALARM Command Example

Commands take the format shown in the following SET_CRITICAL_ALARM command.

Description

The SET_CRITICAL_ALARM command illuminates the CRITICAL LED on the IP alarm panel.

Format

    ipfm_user_api (command, *str)

Where:

command    is one of the valid alarm commands.
*str       is a pointer to an optional character string or null.
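The command argument is always one of the names in Table 5-1, and the SET and CLR alarm commands come in pairs, one pair per severity. As an illustration only, the following sketch looks up the command name for a given severity and tells request commands apart from the others. The enum, both function names, and the use of strings are assumptions made for the example. The real command values are defined in /usr/opt/IPFM/ipfm_alarm.h and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative severity levels. These names are assumptions and are
 * not taken from ipfm_alarm.h. */
enum alarm_severity { SEV_MINOR, SEV_MAJOR, SEV_CRITICAL };

/*
 * Returns the Table 5-1 command name that sets (set != 0) or clears
 * (set == 0) the LED for the given severity, or NULL if the severity
 * is out of range.
 */
const char *alarm_command_name(enum alarm_severity sev, int set)
{
    static const char *const names[][2] = {
        /* clear                 set                  */
        { "CLR_MINOR_ALARM",    "SET_MINOR_ALARM"    },
        { "CLR_MAJOR_ALARM",    "SET_MAJOR_ALARM"    },
        { "CLR_CRITICAL_ALARM", "SET_CRITICAL_ALARM" },
    };

    if ((int)sev < 0 || (size_t)sev >= sizeof names / sizeof names[0])
        return NULL;
    return names[sev][set ? 1 : 0];
}

/*
 * Returns 1 if the name is a request command (REQ_ prefix), else 0.
 * Per Section 5.2.1, request commands go directly to the alarm board
 * and need a buffer for the response.
 */
int is_request_command(const char *name)
{
    assert(name != NULL);
    return strncmp(name, "REQ_", 4) == 0;
}
```

An application that tracks its own fault severity could use a lookup of this kind to keep the set and clear calls symmetric, so that the alarm it clears is the one it previously set.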
Description

The optional message to be included in the IP Fault Manager alarm board SET or CLEAR command can be 32 characters or fewer in a single description. You must enter text that is unique and descriptive of the event type. Legal characters include letters, numbers, and spaces. Punctuation is not recognized.

Examples

In the following examples, the alarm status is first set to CRITICAL and then cleared. Note that the descriptor field in each example is filled with >> DTP now at V. 1.24.

Setting a Critical Alarm

    #include <stdio.h>
    #include "ipfm_alarm.h"

    int main(void)
    {
        int stat = 0;
        /* Descriptor text sent with the alarm command */
        char buffer_address[80] = ">> DTP now at V. 1.24";

        /* Illuminate the CRITICAL LED; a nonzero status is an error */
        stat = ipfm_user_api(SET_CRITICAL_ALARM, buffer_address);
        if (stat != 0)
            printf("\n unsuccessful");
        else
            printf("\n successful");
        return 0;
    }

Clearing a Critical Alarm

    #include "ipfm_alarm.h"
    #include <stdio.h>

    int main(void)
    {
        int stat = 0;
        /* Must match the descriptor used when the alarm was set */
        char buffer_address[80] = ">> DTP now at V. 1.24";

        /* Turn off the CRITICAL LED; a nonzero status is an error */
        stat = ipfm_user_api(CLR_CRITICAL_ALARM, buffer_address);
        if (stat != 0)
            printf("\n unsuccessful");
        else
            printf("\n successful");
        return 0;
    }

Refer to the information in Section 3.7. Note the resulting IPFMpanel.log file output when a MAJOR alarm with this same descriptor is set and cleared.

Returns

Table 5-2 lists the return values defined in the include file /usr/opt/IPFM/ipfm_alarm.h.

Table 5-2 Messages Returned by the ipfm_user_api Function
___________________________________________________________
Return Value                Description
___________________________________________________________
IPFM_SUCCESS                Successful completion of function.
IPFM_LOCK_RETRY_EXCEEDED    The process exceeded the retry count for obtaining a tty lock.
IPFM_BYTE_NUM_INCONSIS      The number of bytes returned from the DTP command does not match the number of bytes expected.
IPFM_RD_WRT_DTP_TIMO        Time-out on the read or write to the IP Fault Manager alarm panel.
IPFM_NO_TTY_ATTB            The function was unable to access tty attributes.
IPFM_CANT_OPEN_TTY          The function was unable to open the tty device; another process has the tty device open.
IPFM_CANT_OPEN_LOGF         Unable to open log file.
IPFM_INV_ALARM_MSG          Invalid alarm_msg value; the value is <=1, or the value is >=23.
IPFM_INV_MSG_BUF            alarm_msg_buff is less than 40 bytes.
IPFM_NO_TEXT_MATCH          No matching alarm text found. No alarm is presently set with the severity and the text descriptor that was used in an attempt to clear the IP Fault Manager alarm.

5.2.3 Command Output

The output from valid command input is the address of an 80-byte buffer that returns the requested information. This is optional and only for valid IP Fault Manager alarm board request commands.

The following is an example of a user-written application calling the ipfm_user_api function:

    call IPFM_USER_API (SET_CRITICAL_ALARM, buffer_address)

Where:

buffer_address    is the address of a 32-byte optional message (buffer address for request commands only).

The optional message contains the alarm information specified by the application.

A
_________________________________________________________________

IP Fault Manager Sample Installation Script

This appendix provides an example installation script for the IP Fault Manager for Digital UNIX software.

# setld -l /usr/mnt/IPFM100/output

Copyright (c) Digital Equipment Corporation, 1996

Depending on system activity and your selection this
installation will take approximately 5 minutes

Checking for IP-FAULT-MANAGER License

If you are not satisfied with the backup of your system
please select option 6 to exit this installation

This software is proprietary to and embodies the the confidential"
technology of Digital Equipment Corporation.
Possession, use or copying"
of this software and media is authorized only pursuant to a valid written"
license from Digital or an authorized sublicensor

Checking for IP-FAULT-MANAGER License

*** Enter subset selections ***

The following subsets are mandatory and will be installed automatically
unless you choose to exit without installing any subsets:

     * IPFMDRV - IPFMAlarm SRV_MGT Driver Files

The subsets listed below are optional:

There may be more optional subsets than can be presented on a single
screen. If this is the case, you can choose subsets screen by screen
or all at once on the last screen. All of the choices you make will
be collected for your confirmation before any subsets are installed.

     1) IPFMALARM - IPFMAlarm Software Files
     2) IPFMMENU - IPFMAlarm Menu Files

--- MORE TO FOLLOW ---
Enter your choices or press RETURN to display the next screen.

Choices (for example, 1 2 4-6):

Or you may choose one of the following options:

     3) ALL mandatory and all optional subsets
     4) MANDATORY subsets only
     5) CANCEL selections and redisplay menus
     6) EXIT without installing any subsets

Enter your choices or press RETURN to redisplay menus.

Choices (for example, 1 2 4-6): 3

You are installing the following mandatory subsets:

     IPFMDRV - IPFMAlarm SRV_MGT Driver Files

You are installing the following optional subsets:

     IPFMALARM - IPFMAlarm Software Files
     IPFMMENU - IPFMAlarm Menu Files

Is this correct? (y/n): y

Checking file system space required to install selected subsets:

File system space checked OK.
IPFMDRV - IPFMAlarm SRV_MGT Driver Files
   Copying from /usr/kits/IPFM/output (disk)
   Verifying

IPFMMENU - IPFMAlarm Menu Files
   Copying from /usr/kits/IPFM/output (disk)
   Verifying

IPFMALARM - IPFMAlarm Software Files
   Copying from /usr/kits/IPFM/output (disk)
   Verifying

Configuring "IPFMDRV - IPFMAlarm SRV_MGT Driver Files" (IPFMDRV100)

Configuring "IPFMMENU - IPFMAlarm Menu Files" (IPFMMENU100)

Configuring "IPFMALARM - IPFMAlarm Software Files" (IPFMALARM100)

Creating softlinks...

The Alpha IP IPFMAlarm Software has been successfully installed.

B
_________________________________________________________________

IP Fault Manager Files Installed on Your System

This appendix provides a list of the IP Fault Manager files installed on your system. Table B-1 provides an example of the pathnames and filenames associated with the IP Fault Manager for Digital UNIX after installation.

Table B-1 List of Files After Installation
___________________________________________________________
Pathname (Directory)        Filename
___________________________________________________________
/usr/opt/IPFM/              ipfm_alarm.h
                            ipfm_disk_cmp
                            ipfm_disk_config
                            ipfm_dsp
                            ipfm_get_disk_lsm
                            ipfm_monitor_mc
                            ipfm_psw_act.conf
                            ipfm_psw_act.sav
                            ipfm_psw_status
                            ipfm_restart_cons
                            ipfm_user_api.o
                            ipfmalarm_cleanlog
                            ipfmalarm_menu
                            IPFMalarm_psw.config
                            IPFMalarm_psw.config_old
                            IPFMalarm_startup
                            ipfmalarm_timer
                            ipfmalarm_watchdog_setup
                            ipfmdisk_timer
                            config.file
                            files
                            pdtp.o
                            README
                            serve_mang_reg
                            stanza.static
                            psw_ipfmalarm
                            psw_feed_ipfmalarm_logfile
                            psw_ipfmalarm_restart_process

_________________________________________________________________

Glossary

acknowledged alarm conditions
    Alarm conditions whose alarms have been deactivated by the CLEAR ALARM command, remote input, or line message.

acknowledgment messages
    Messages the alarm board sends to the application in response to alarm messages, timer reset messages, system status messages, initialization messages, messages changing the function of the outputs, and messages activating Reset and Alarm Cutoff.

action class
    An action class specifies a group of events and the actor to which any event messages for those events are sent. An action class is defined in the Actor Manager's configuration file.

active status
    The IP Fault Manager alarm panel is active and functioning.

actor
    Actors display or act on the event messages they receive from the actor manager. An actor is either a presentation module (displays the information), a feeder module (feeds the event messages to an application), or a function module (takes the action).

actor manager
    An actor manager sends event messages to actors.

agent
    A background task running on each monitored node, scanning devices and data structures and generating event messages in an internal list. The agent responds to requests for information by the network management station (NMS). The agent is responsible for performing get and set operations, generating traps, and controlling access.

alarm conditions
    Alarm conditions occur when a component or process malfunctions.

alarm messages
    Messages sent by the application to the alarm board to set and clear alarms, and to identify the cause of alarms.

alarm panel
    LEDs visible through the front panel of the DTP chassis that indicate system status, alarm status, and power supply.

alarm status
    One of three alarm conditions: major, minor, or critical.

AlphaServer
    Digital's new generation of server systems based on the Alpha 64-bit computing architecture.

boot device
    The device from which the system bootstrap software is acquired.

boot
    Short for bootstrap. To load an operating system into memory.

bus
    A collection of many transmission lines or wires. The bus interconnects computer system components, providing a communications path for addresses, data, and control information or external terminals and systems in a communications network.

CD-ROM
    Compact disc read-only memory. The optical removable media used in a compact disc reader.

central processing unit (CPU)
    The unit of the computer that is responsible for interpreting and executing instructions.

command line interface
    The command line interface supports the Digital UNIX operating system. It allows you to configure and test the system, examine and alter the system state, and boot the operating system.

console mode
    The state in which the system and the console terminal operate under the control of the console program.

console subsystem
    The subsystem that provides the user interface for a computer system when the operating system is not running.

console terminal
    The terminal connected to the console subsystem. It is used to start the system and to direct activities between the user and the computer system.

consolidator
    The consolidator is central software that attempts to connect through the network to remote agents and requests the list of current event messages from each agent.

critical alarm condition
    A severe condition that affects the performance of the IP platform; it requires immediate attention.

DTP
    Dialogic Telco Platform.

DSX-1
    Digital Signal Cross-connect Level 1.
    Any equipment that supports a set of parameters for cross-connecting DS-1 (either T-1 or E-1) lines.

E-1
    Another name given to the CEPT (Conference of European Postal and Telecommunications Administrations) digital telephony format. E-1 is a digital transmission channel that carries data at the rate of 2.048 Mb/s (DS-1 level).

ECC
    Error correction code. The code and algorithms used by logic to facilitate error detection and correction.

EISA bus
    Extended Industry Standard Architecture bus. A 32-bit industry-standard I/O bus used primarily in high-end PCs and servers.

EISA Configuration Utility (ECU)
    A feature of the EISA bus that helps you select a conflict-free system configuration and perform other system services. The ECU must be run whenever you change, add, or remove an EISA or ISA controller.

environment variables
    The global data structures that can be accessed from console mode. The setting of these data structures determines how a system powers up, boots the operating system, and operates.

Ethernet
    The IEEE 802.3 standard local area network.

event
    An event is a problem or situation detected by the System Watchdog.

event class
    An event class is a group of events. Event classes are used in the consolidator's configuration file to specify groups to be checked for on remote nodes.

event sensors
    An event sensor is a module that checks a monitored node for one or more events.

external event
    An external event is an event that is processed and reported as an event by System Watchdog, but that is not detected by an event sensor.

hot swap
    The process of removing a device from the system without shutting down the operating system or powering down the hardware.

hub
    A central device, usually in a star topology local area network (LAN), to which each network module is attached.

initialization
    The sequence of steps that prepare the computer system to start. Initialization occurs after a system has been powered up.

interrupt request lines (IRQs)
    The bus signals that connect an EISA or ISA module (for example, a disk controller) to the system so that the module can get the system's attention via an interrupt.

ISA
    Industry Standard Architecture. An 8-bit or 16-bit industry-standard I/O bus, widely used in personal computer products. The EISA bus is a superset of the ISA bus.

LAN
    Local area network. A high-speed network that supports computers connected over limited distances.

light-emitting diode (LED)
    An indicator of status on an IP (Intelligent Peripheral) subsystem.

loop start
    A method of starting (seizing) a telephone line or trunk by sending a supervisory signal (going off-hook) to the central office. This method bridges the tip and ring (the two conductors of a telephone cable pair) through a resistance.

major alarm condition
    A serious disruption of service or malfunction of important circuits; it requires immediate attention.

mass storage device
    An input/output device on which data is stored. Typical mass storage devices include disks, magnetic tapes, and CD-ROM.

minor alarm condition
    An alarm condition that causes minimal disturbance in service.

module
    A hardware or software component that is a self-contained system interacting with a larger system. Hardware modules are often made to plug into a main system.

network
    A collection of computers, terminals, and other devices together with the hardware and software that enables them to exchange data and share resources over either short or long distances.

network management station (NMS)
    A PC or workstation equipped with an Ethernet, FDDI, or Token Ring network module and HUBwatch software that enables it to communicate with and manage network modules.

network modules
    Modular devices that provide network connectivity or services that can be installed in a DEChub backplane or used as standalone devices.
    Network modules include repeaters, concentrators, bridges, brouters, access servers, switches, and SNMP agents.

out of service
    Alarm condition that inhibits normal operation of the alarm panel, and the system is not yet repaired.

PCI
    Peripheral component interconnect. An industry-standard expansion I/O bus that is the preferred bus for high-performance I/O options. PCI is available in a 32-bit version and a 64-bit version.

PCI-to-EISA bridge
    The capability to transfer commonly available EISA and ISA options to the PCI bus.

polling interval
    The amount of time between requests for event messages from the consolidator to the agents on the monitored nodes.

presentation module
    A presentation module is an actor that receives event messages and either displays the information or feeds the information to another application for dedicated display.

protocol
    A formal set of rules governing the format, timing, sequencing, and error control of exchanged messages on a data network.

RAID
    Redundant arrays of independent disks. A technique that organizes disk data to improve performance and reliability. RAID has three attributes: it is a set of physical disks viewed by the user as a single logical device; the user's data is distributed across the physical set of drives in a defined manner; and redundant disk capacity is added so that the user's data can be recovered even if a drive fails.

redundant
    Pertaining to duplicate or extra computing components that protect a computing system from failure.

reliability
    The probability that a device or system will not fail to perform its intended functions during a specified time.

repeater
    A level 1 hardware device that restores signal amplitude, waveform, and timing of signals before transmission to another network segment.

SBB
    StorageWorks building block. The basic building block of the StorageWorks product line.
    Any device conforming to shelf mechanical and electrical standards installed in either a 3½-inch or 5¼-inch carrier is considered to be an SBB, whether it be a storage device, a power supply, or other device.

SCSI
    Small Computer Systems Interface. An ANSI-standard interface for connecting disks and other peripheral devices to computer systems. Some devices are supported under the SCSI-1 specification; others are supported under the SCSI-2 specification.

server
    A network node or specialized device that provides and manages access to shared network resources, such as hard disks, printers, and software.

SRM
    The user interface to console firmware for operating systems that expect firmware compliance with the Alpha System Reference Manual (SRM).

standby
    The status of the backup system in a redundant configuration, where the primary system is functioning normally.

StorageWorks
    Digital's modular storage subsystem (MSS), which is the core technology of the Alpha SCSI-2 mass storage solution. StorageWorks consists of a family of low-cost mass storage products that can be configured to meet current and future storage needs.

system disk
    The device on which the operating system resides.

System Watchdog
    System Watchdog is the collective term for the versions of POLYCENTER System Watchdog software that run on different platforms. Functionally, this term applies to each version of POLYCENTER System Watchdog, irrespective of the platform on which it runs.

T-1
    The digital telephony format used in North America. T-1 is a digital transmission link handling 24 voice conversations on two pairs of twisted wires.

Telnet
    The TCP/IP standard protocol for remote terminal connections. Using Telnet, a user at one site can connect to a timesharing system at another site as if the user's terminal is connected directly to the remote machine.

terminal server
    A module that allows a terminal to connect to a network node.

ThinWire
    Ethernet cabling and technology used for local distribution of data communications. ThinWire cabling is thinner than thick wire cabling.

Transmission Control Protocol (TCP)
    The transport protocol offering a connection-oriented transport service in the Internet suite of protocols.

unavailable
    The status of a troubled system or a system under repair. The maintenance center can initiate placing a system in and taking a system out of unavailable mode.