SUMMARY : Boot failure

From: Hisham Al Saad <ahisham_at_batelco.com.bh>
Date: Mon, 25 Dec 2000 10:35:47 +0300

Sorry for the late summary, and thanks to all who responded. The original
message is below.
Valuable replies came from:
Clegg Larry
Uwe Richter
Christian Wessely
Philip Ordinario
Leonardo Velloso Heitor
John Tan
Rasal Kumarage
Peter Reynolds

This server is not a production server, which is why I took my time. After
reading all the messages, I tried to boot from the other RAID member, but
the problem still exists. I haven't tried the other suggested solutions. I
came to the personal conclusion that the RAID controller is faulty and needs
an on-site engineer.
I have listed some of the replies below because they contain steps to follow.

Uwe Richter :-

I suspect your boot disk is mirrored by the Mylex DAC960 EISA controller.
The logical RAID can be configured with an ECU program that you must run
from the ARC console:
(- DO NOT remove the disk drives connected to the Mylex; I did so
   yesterday and destroyed all RAID sets)
- unpack arccf.exe out of m-arcapp.zip from
  Mylex Configuration and flash Utility (NEW) on
  http://www5.compaq.com/support/files/alphant/drivers/index.html
- start the ARC console from the SRM console
>>>arc
  wait
- verify, that your local floppy drive is set to 1.44MB
  via: Suppl. menu ... / set system settings save these settings
- run program ... arccf.exe
  With this program you can configure the Mylex dac960 RAID controller.
  You will need a keyboard with a real Escape key. F11 or CTRL3
  on native DEC keyboards didn't work for me.
--------------------------------------------------------------
Christian Wessely :-

If RAID 1, you have the mirrored copy on the other drive, so you just have
to run the RAID config utility from the ARC console, mark the drive as
failed, and reboot. It should work then. (In my case, the name of the
utility is swxcrmgr; it may differ on your system.)
------------------------------------------------------------
Philip Ordinario suggested a solution in case I was using StorageWorks :

Are you using StorageWorks to manage your RAIDs? If you are, you need the
StorageWorks utility diskette. Run arc from the >>> prompt, then run the
utility from the diskette. From there you can reset the failed disk to
ready. This may solve your problem and may get you going until you get a
new disk.
----------------------------------------------------------
Rasal Kumarage :

Check for any fault lights (normally amber) on any of the RAID disks. If one
is lit, that disk needs to be replaced. If only a single disk has failed,
you should probably still be able to boot from the other disk.
Try the following:

Boot from OS CD
Try to mount / & /usr. If successful, try to check the RAID status using OS
commands. Otherwise you will have to use either the front panel keys (if
any) to find out the RAID status, or an offline RAID utility (i.e. booting
from a diskette) to check & repair the RAID.

Your problem might also be due to a RAID controller problem.
-------------------------------------------------------------
Peter Reynolds :-

As long as the system is at the console prompt, you may pull the disks out
of their slots enough to remove power from them, and then replace them.
Another suggestion is to
power off the system, then remove power to the storageworks shelf that
contains the disks. Wait about 30-60 seconds, then reapply power to the
storageworks shelf, and power up the system. That is what I would do first,
really. It is very important that the disks are powered up before the system
is. If you do a 'show dev' at the console prompt, and the raidset dra0 still
shows failed after trying the above, you will probably need a hardware
engineer. If you are familiar with 'swxcrmgr' or 'srlmgr' it would also be
worth running them to see if there are any problems with the configuration.
It is not unknown for these controllers to lose their configuration due to
a power glitch or other outside influence.
------------------------------------------------------------
Original message :-
Today I had startup problems with an Alpha 4100 (Tru64 4.0E) server. Every
time it tries to boot it gives :-

(boot dra0.0.0.3.1 -flags A)
READ/WRITE failed with status 0002 on dra.0.0.0.3.1
failed to read dra.0.0.0.3.1
bootstrap failure


System H/W details :-

>>Show dev

polling ncr0 (NCR 53C810) slot 1, bus 0 PCI, hose 1 SCSI Bus ID 7
dka500.5.0.1.1 Dka500 RRD47 1206
polling isp0 (Qlogic ISP1020) slot 2, bus 0 PCI, hose 1 SCSI Bus ID 7
polling dac0 (Mylex DAC960) slot 3, bus 0 PCI, hose 1
dra0.0.0.3.1 DRA0 2Member RAID 1 Failed
dra1.0.0.3.1 DRA1 4Member RAID 5
polling floppy0 (FLOPPY) PCEB -XBUS hose 0
dva0.0.0.1000.0 DVA0 Rx23
polling tulip0 (DECchip 21140-AA) slot 4, bus 0 PCI, hose 1
ewa0.0.0.4.1 00-00-F8-09-94-EA Fast
polling tulip1 (DECchip 21140-AA) slot 5, bus 0 PCI, hose 1
ewb0.0.0.5.1 00-00-F8-05-6A-4A Fast
polling pfi0 (DEC PCI FDDI) slot 5, bus 0 PCI, hose 0 fwa0.0.0.5.0
pdq_state_k_link_unavail
DEFPA Error: fwa0.0.0.5.0 can not be started
DEFPA Error: please check FDDI connection
fwa0.0.0.5.0 00-00-F8-E7-82-9D

It seems that there is a problem on RAID 1.
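
For anyone who keeps console logs, the failed set can also be picked out of
a 'show dev' listing mechanically. What follows is only a minimal Python
sketch, assuming the console output has been captured as plain text (for
example from a serial console log). The function name failed_devices and the
line pattern are my own illustration, inferred from the listing above; they
are not part of any Compaq or Mylex tool, and other firmware versions may
format these lines differently.

```python
import re

# Matches device lines such as
#   "dra0.0.0.3.1 DRA0 2Member RAID 1 Failed"
# The layout (device path, short name, free-text description) is an
# assumption inferred from the listing above, not a published grammar.
DEVICE_LINE = re.compile(
    r"^(?P<path>[a-z]+\d+(?:\.\d+){4})\s+(?P<name>\S+)\s+(?P<desc>.+)$"
)


def failed_devices(show_dev_text):
    """Return (path, name, description) for each device reported as Failed."""
    failed = []
    for line in show_dev_text.splitlines():
        match = DEVICE_LINE.match(line.strip())
        if match is None:
            continue  # "polling ..." lines, error text, blank lines
        # Treat a device as failed when the last word of its description
        # is "Failed", as in the DRA0 line of the listing.
        if match.group("desc").split()[-1].lower() == "failed":
            failed.append(
                (match.group("path"), match.group("name"), match.group("desc"))
            )
    return failed


SAMPLE = """\
polling dac0 (Mylex DAC960) slot 3, bus 0 PCI, hose 1
dra0.0.0.3.1 DRA0 2Member RAID 1 Failed
dra1.0.0.3.1 DRA1 4Member RAID 5
"""

if __name__ == "__main__":
    for path, name, desc in failed_devices(SAMPLE):
        print(f"{path} ({name}): {desc}")
```

Run against the three sample lines, this reports only dra0.0.0.3.1, the
2-member RAID 1 set; the RAID 5 set dra1 is not flagged.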
Is there anything I can do to fix this, or do I just need to call a hardware
engineer?

Hisham Al Saad
Bahrain Telecommunications Company
Tel : +973-883973
Fax : +973-9103973
ahisham_at_batelco.com.bh
Received on Mon Dec 25 2000 - 07:31:30 NZDT
