DIGITAL

Software Product Description
___________________________________________________________________

PRODUCT NAME: Intelligent Peripheral Fault Manager for Tru64 UNIX, Version 2.2
SPD 60.35.03

DESCRIPTION

Intelligent Peripheral Fault Manager for Tru64 UNIX (IPFM) Version 2.2, a layered software product, provides fault management services for the AlphaServer Intelligent Peripheral Platform. It monitors events within the platform and provides visual and audible alarms. In addition, it provides a software API (the IP Fault Manager Application Programming Interface) that allows applications or other layered software to integrate into the same alarm subsystem.

IPFM V2.2 is supported on Tru64 UNIX V4.0D and V4.0E, and is Y2000 compliant.

IPFM V2.2 provides the following new functionality:

o Process monitoring

o Maximum number of alarms increased to 2048

o New API routine for getting the number of outstanding alarms and the alarm text

o New API routine for clearing all alarms

o Support for warnings and informational messages

o The ability to disable audible alarms

o The ability to limit, by severity, the SNMP traps generated

See the Release Notes for more information.

July 1999
AE-QN0GD-TE

FUNCTIONAL DESCRIPTION

The IP Fault Manager components are:

o Event Detector - Event detection software that runs on each system in the Intelligent Peripheral. Monitors the Tru64 UNIX operating system for detected errors, and monitors the system for user-specified events as defined in the configuration files.

o Event Manager - The manager of the event database. The Event Manager coordinates the combination of all outstanding events on the local system. The Event Manager logs all event activity in the local event log file and ensures that the local event database is kept up to date.
  It also manages the visual and audible alarms on the Indicator Module to ensure they are consistent with the Event Database.

o Event Database - The repository of outstanding events. This event database contains an entry for each outstanding event that has been detected or presented to a system. Events can be added to (SET), acknowledged in (ACK), and removed from (CLEAR) the event database.

o IPFM Application Programmer's Interface (API) - A programming interface that third-party applications can use to set, acknowledge, and clear alarmed events within the AlphaServer IP Platform.

o IPFM Operator Interface - An interface that can be used to display outstanding events, as well as to set, acknowledge, and clear alarms manually on the local system.

o IP Event Log File - An event log file on each system in the Intelligent Peripheral Platform. The log files contain every event that is detected or generated.

o Configuration Files - Files that allow user customization of certain details of the IPFM configuration. The user can modify all periodic timers (used for repeated actions), as well as the event-to-alarm severity definitions.

o SNMP subagent / IPFM MIB - Keeps the MIB (SNMP database) consistent with the Event Database and sends traps to a network management station (such as ServerWORKS or TeMIP) to trigger visual indicators and provide data for the alarm database within the network management station. Any network management station with an SNMP interface can manage IPFM.

EVENT DETECTION

The Event Detector is responsible for detecting any events that occur on a system, and reporting the events to the IPFM Event Managers.

The IPFM Event Detector detects several categories of events: processor-based events (hardware events detected within the Tru64 UNIX operating system), external events, and storage events.
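The SET, ACK, and CLEAR operations on the Event Database listed under FUNCTIONAL DESCRIPTION can be pictured with a small Python sketch. This is only an illustration of the event lifecycle, not the IPFM API: the class, method, and field names are hypothetical, and the sketch assumes that an acknowledged event stays outstanding until it is cleared.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    SET = "SET"  # event has been raised and not yet acknowledged
    ACK = "ACK"  # event has been acknowledged but is still outstanding


@dataclass
class Event:
    event_id: str  # hypothetical identifier; the real key format is not documented here
    text: str
    state: State = State.SET


class EventDatabase:
    """Hypothetical in-memory stand-in for the IPFM Event Database."""

    def __init__(self):
        self._events = {}

    def set(self, event_id, text):
        # SET: add an outstanding event; setting it again keeps the existing entry.
        return self._events.setdefault(event_id, Event(event_id, text))

    def ack(self, event_id):
        # ACK: mark the event as acknowledged (assumed to remain outstanding).
        self._events[event_id].state = State.ACK

    def clear(self, event_id):
        # CLEAR: remove the event from the database.
        self._events.pop(event_id, None)

    def outstanding(self):
        # All events that have been set and not yet cleared, in insertion order.
        return list(self._events.values())
```

In this sketch an operator or application would call set() when a condition is detected, ack() once someone has taken note of it, and clear() when the condition no longer applies.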
When events are detected, they are checked against the list of reportable events (listed in the configuration file). An alarmed event is categorized by severity:

  Critical: Severe, service-affecting condition requiring immediate corrective action.

  Major: Serious disruption of service or the malfunctioning or failure of important functions or components. Less immediate or impending effect on system performance than Critical.

  Minor: Trouble that does not have a serious effect on service, or that occurs in functions or components that are not essential for providing service.

  Warning: Warning messages.

  Informational: Informational messages.

If the event passes the event filter, it is passed to the Event Manager. Since the Event Detector and the Event Manager reside in different processes, the information is passed using common interprocess communication techniques.

INTERFACES

The IP Fault Manager has interfaces to other components of the AlphaServer IP Platform:

o The IP Alarm Control Module

  An IP alarm control module must be present in order to support fault management for the chassis and for user-defined events. The alarm control module has interfaces to the AlphaServer IP Platform sub-assemblies. When the alarm control module detects an event, it generates an interrupt to system software in order to update the event database.

o Alarm Control Module Device Driver

  The Alarm Board Device Driver has an interface with the alarm control module in the AlphaServer IP Platform. The alarm control module is the board that the device driver directly accesses. The device driver presents an internal interface to the IPFM code, which allows the IPFM Fault Coordinator to set and clear minor, major, and critical alarms on the alarm indicator panel and to receive information through the alarm control module.
  The information from the alarm control module can be mapped by end users to indicate alarms and alarm priorities as they see fit.

  This device driver is dynamically loadable. The setting and clearing of alarms is accomplished through ioctl calls to the device driver. The driver also supports ioctl calls that reset the expansion board and the alarm board.

  When an event condition occurs or ceases to occur, the alarm control module interrupts the device driver, which reads the Status Register (SREG) to determine what change has occurred in the monitored event conditions. This event condition is then passed up to the event detector, and the appropriate action is taken as with any other event condition.

  The device driver has a diagnostics section that runs the hardware-supplied self-test through the alarm control module and performs any additional hardware diagnostics that require software assistance. The results of the diagnostics are reported to the operator.

o IP Alarm Indicator Panel

  The IP alarm indicator panel displays alarm and system status for the AlphaServer Intelligent Peripheral Platform. It is connected to the AlphaServer 1000A system through a cable to the IP alarm control module. Setting, clearing, and acknowledging alarm status (Critical, Major, Minor, Warning, Informational) can be accomplished through the IPFM software menu. Clearing and acknowledging alarm status (Critical, Major, Minor, Warning, Informational) can be accomplished from an SNMP-compliant network management station.

o The Console

  The console serves more than one purpose. It provides the AlphaServer IP Platform operator with a terminal window that can be used to:

  Act as a system console for the AlphaServer 1000A system.
  Manually control the state of the IP alarm indicator panel by setting or clearing alarm indicators or modifying system status by means of the IPFM menu interface, if an NMS such as ServerWORKS or TeMIP is not being used for this purpose.

o IP Alarm Log File

  This log file records the occurrences of all IP events and is accessible by the AlphaServer IP Platform operator. It serves as a permanent record of all events that are displayed on the AlphaServer IP Platform console, operator workstation, or SNMP network management station.

OPERATIONS SUPPORT

The AlphaServer IP Platform can be managed locally or remotely. Fault management is provided via the IPFM operator interface, ServerWORKS, TeMIP, or another SNMP-compliant NMS. Operator interfaces to the fault management capabilities of the AlphaServer IP Platform include a basic screen interface to each processor, specific to the IP fault management capabilities and accessible from a character terminal. In addition, a consolidated view of the fault management state of all processors in a distributed system can be provided via the ServerWORKS or TeMIP GUI operator interface. Finally, IPFM supports an SNMP interface allowing management using an SNMP-compliant network management station (NMS).

USER INTERFACES

The IP Fault Manager offers user interfaces as follows:

o A user application programming interface (API) that enables applications to monitor and control application-oriented fault events, so that these events are integrated with the ones already handled by IPFM. IPFM can then send messages to the IP alarm indicator panel to set and clear alarms on behalf of the application.

o A user menu, located in a window on the AlphaServer IP Platform, that allows the system operator to manually set and clear alarms as well as system status LEDs on the IP alarm indicator panel.
o An IP alarm log file that provides a recorded fault event history of all events monitored by the IP alarm indicator panel, the IP Fault Manager itself, and the user application.

o An SNMP interface that allows an SNMP-compliant network management station to monitor, clear, and acknowledge alarms from a remote location.

HARDWARE REQUIREMENTS

Processors Supported

  AlphaServer 1000:   Model 4/200
                      Model 4/233
                      Model 4/266
                      Model 5/300

  AlphaServer 1000A:  Model 5/500
                      Model 5/400

  AlphaServer 4100:   Model 5/400
                      Model 5/466

Other Hardware

o ISA bus expansion chassis with IP sensor module: 2T-VC220-IB; 2T-VC221-IB; 2T-IPSEN-AA

o IP alarm control module: 2T-IPCON-AA

o IP alarm indicator panel: 2T-IPAIP-AA, 2T-IPAIP-AB, 2T-IPAIP-CA, 2T-IPAIP-CB, 2T-IPAIP-BB, 2T-IPAIP-GB

Disk Space Requirements for AlphaServer and Tru64 UNIX Systems

  Disk space required for installation:  1.5 MB

Block cluster size = 1

These counts refer to the disk space required on the system disk. The sizes are approximate; actual sizes may vary depending on the user's system environment, configuration, and software options.

SOFTWARE REQUIREMENTS

o Tru64 UNIX, V4.0D, V4.0E

o DECevent, V2.6 through V2.9

SOFTWARE LICENSING

This software is furnished only under a license. For more information about Compaq's licensing terms and policies, contact your local Compaq office.

License Management Facility Support

This layered product supports the License Management Facility. The license units for this product are allocated on a Concurrent Use basis.

For more information on the License Management Facility, refer to the Tru64 UNIX Operating System Software Product Description or the License Management Facility manual, which is part of the Tru64 UNIX operating system documentation set.
GROWTH CONSIDERATIONS

The minimum hardware/software requirements for any future version of this product may be different from the requirements for the current version.

DISTRIBUTION MEDIA

CD-ROM

ORDERING INFORMATION

Intelligent Peripheral Fault Manager for Tru64 UNIX

  Software Licenses:       QL-4K4A9-AA
  Software Media:          QA-4K4AA-H8
  Software Documentation:  QA-4K4AA-GZ

The above information is valid at time of release. Please contact your local Compaq office for the most up-to-date information.

SOFTWARE PRODUCT SERVICES

A variety of service options are available from Compaq. For more information, contact your local Compaq office.

SOFTWARE WARRANTY

Warranty for this software product is provided by Compaq with the purchase of a license for the product as defined in the Software Warranty Addendum of this SPD.

© 1999 Digital Equipment Corporation. All rights reserved.

[R] Compaq, the Compaq logo, and the Digital logo are registered in the U.S. Patent and Trademark Office.

[TM] AlphaServer, DECevent, and TRU64 are trademarks of Compaq Computer Corporation.

[R] Dialogic is a registered trademark of Dialogic Corporation.

[R] UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company Limited.

Tru64 UNIX is an X/Open UNIX 95 branded product.