Shared Disk Array Troubles

From: Aaron G. Sword <tru64_at_webmediadesigns.com>
Date: Sat, 21 Jul 2001 00:18:09 -0400

Hello managers ...

I am having problems with two Tru64 5.0a boxes and the shared disk array
between them. I'll keep this brief, but if anyone has insight into what
could be causing these troubles, I would be grateful.

The shared disk array [six disks, striped and mirrored] is attached to both
boxes, and we run them as an ad-hoc cluster using heartbeat. One box is the
master: it has the disk array (the shelf) imported and mounted [using AdvFS
and LSM], and it serves user email, home directories, etc.

The problem is that the shelf periodically becomes unavailable. It still
shows as mounted when you run df, but if you try to cd into one of the
directories that reside on the shelf, you get a 'Permission Denied' error.
I can deport and then re-import the shelf, and it becomes available again,
but only for a few minutes.

I previously deleted and recreated the AdvFS file domains and filesets,
then restored from tape, and the shelf ran for about 24 hours. Since the
first crash after that, it will only stay up for a few minutes at a time.
I am running verify as I type.

I have checked the uerf logs but nothing jumps out at me. There are some
CAM SCSI errors [samples below], but I don't know whether they are related
to the problem. I suspect a bad disk, but would appreciate any advice.

Thanks for your time!

Aaron G. Sword
SunLit Surf

Example 1 - this machine usually has the shelf imported and in use

********************************* ENTRY 16.
*********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 287.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Fri Jul 20 09:14:12 2001
OCCURRED ON SYSTEM wickb
SYSTEM ID x00060009 CPU TYPE: DEC 2100
SYSTYPE x00000000
PROCESSOR COUNT 3.
PROCESSOR WHO LOGGED x00000000

----- UNIT INFORMATION -----

CLASS x0037
SUBSYSTEM x0000 DISK
BUS # x0002

--------------------

Example 2 - this machine usually has the shelf imported and in use

********************************* ENTRY 21.
*********************************

----- EVENT INFORMATION -----

EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 225.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Thu Jul 19 23:34:30 2001
OCCURRED ON SYSTEM wickb
SYSTEM ID x00060009 CPU TYPE: DEC 2100
SYSTYPE x00000000
PROCESSOR COUNT 3.
PROCESSOR WHO LOGGED x00000000

----- UNIT INFORMATION -----

CLASS x0022 DEC SIM
SUBSYSTEM x0000 DISK
BUS # x0000
                               x0000 LUN x0
                                         TARGET x0
Received on Sat Jul 21 2001 - 04:19:47 NZST
