Hi,
I've had a disk fail on an HSZ50 dual-redundant pair. The disk was failed
by the O/S (increasing number of hard errors and LSM eventually detaching
the disk). I removed the disk from LSM and then proceeded to remove the
disk from the HSZ50 configuration for replacement (the disk is a single disk
- no RAID, stripe or mirror). LOCATE located the disk properly (i.e. the
lights flashed), but a DELETE unit-number hangs the CLI. A couple of hours
later I still haven't got the command prompt back. Using hzterm on the other
controller, a SHOW DISKS hung the other controller's CLI as well :-(
Note: the controllers themselves are fine: no error LEDs and the other disks
are flashing merrily. But I can't talk to them at all and I can't remove
the failed disk from the configuration. I've pulled the disk just in case
that would allow the controllers to wake up, but nothing's happened apart
from getting the usual messages from the controller on the VT100 telling me
that there was a hardware failure on the disk I pulled.
My next guess is to restart the controllers - one at a time, so that
failover will allow uninterrupted service to the O/S.
Is there any other way I can interrupt a command that's running on the CLI?
Does anybody know how a simple DELETE unit can hang? I've done it many
times before without problems. My concern is that if there's something
broken in this configuration, the failover might fail and things will get
worse...
Thanks,
Marco
Marco Luchini
Unix Support
Acco-Europe
Received on Wed Sep 08 1999 - 12:34:17 NZST