-- Everything is normal up to this point, then.. --

CAM_LOGGER: cam_error packet
CAM_LOGGER: bus 0 target 0 lun 0
ss_perform timeout
timeout on disconnected request
Active CCB at time or error

--- and the system hangs ---

What should have happened next was...

---
Dec 8 14:19:04 ns1 vmunix: scsi0 at psiop0 slot 0
Dec 8 14:19:04 ns1 vmunix: rz0 at scsi0 target 0 lun 0 (LID=0) (DEC RZ26F (C) DEC 630J)
Dec 8 14:19:04 ns1 vmunix: rz4 at scsi0 target 4 lun 0 (LID=1) (DEC RRD45 (C) DEC 0436)
Dec 8 14:19:04 ns1 vmunix: isa0 at pci0
Dec 8 14:19:04 ns1 vmunix: gpc0 at isa0
Dec 8 14:19:04 ns1 vmunix: ace0 at isa0
etc.

So it seems to be dying during the "scsi0 at psiop0 slot 0" phase, which leads me to think it might be a flaky SCSI controller. Or maybe the disk, since the CAM error mentions "bus 0 target 0 lun 0." ???

BUT!, the system will fully boot if I recycle power. Once it's up for a while, it then crashes at random times.

If someone could help me determine which is more likely at fault (the disk or controller or ??) in this situation, I would appreciate it.

At this point, the system is totally hosed. In desperation, I installed v4.0E (thinking a re-format of the disk could be a cheap way out) and, of course, it crashed in the middle of loading the subsets. Not a good day. ;-)

Some various replies
--------------------

The RZ26F may be the problem. I believe the driver message indicates that a command didn't finish within the expected time frame, so it performed a bus reset to get the bus and device back into an expected state.

alan_at_nabeth.cxo.dec.com

--------------------

I'd start with the disk drive -- on-board drive controllers fail far more often than the (single-chip) motherboard SCSI controllers do. There are simply far fewer parts to fail.

Swap out the drive with another, or hook up a different SCSI drive to the external SCSI connector. If the problem disappears, your drive is sick. Otherwise, you need a new motherboard....

John Francini <francini_at_nashua.progress.com>

--------------------

You are on the right track in suspecting either the SCSI controller or the disk. The SIOP is the NCR 53C810 chip; on that system, I believe it's on the motherboard, not an add-in card.

In any case, open up the system box and make sure ALL of the cables are well-seated; I'd pull them off and reseat them, as over time you can get a bit of corrosion (oxidation) on the connectors that can cause electrical connectivity problems.

If you've got a spare disk (as the disk was probably hosed anyway when you partially installed V4.0E), you might swap in a spare hard disk, or if there's room in the box for two disks, hook it up as unit 1 and try the install there. (1GB SCSI disks are getting to be a commodity item these days; even Western Digital is selling them.)

If the SCSI controller is a plug-in card, remove and reseat the card, as well as the cable reseats I noted. And make SURE the SCSI bus is terminated at the CDROM end. No termination can lead to SCSI errors.

"Dr. Tom Blinn, 603-884-0646" <tpb_at_doctor.zk3.dec.com>

--------------------

Similar CAM error messages were reported in the release notes for one of the 4.0 versions. I can't lay my hands on it right now, but the solution was to run, at the console,

isp1020_edit -sd

to change the code used by the Qlogic 1020 card.

The 4.0D release notes, section 3.1.5, mention similar problems, which they suggest using eeromcfg to fix. There are also various Qlogic problems, code updates, and fixes described in recent firmware release notes, so you could try upgrading firmware if isp1020_edit -sd doesn't work for you.

Oisin McGuinness <oisin_at_sbcm.com>

--------------------

I have had a couple of problems exactly like this. In both cases, it was a bad hard drive. Could be a cable, too. Definitely doesn't look like a controller.

Ian Watkins

--------------------

Received on Thu Feb 11 1999 - 15:08:46 NZDT
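
One way to weigh "disk or controller" is to check whether the CAM errors always name the same device. The following is only a minimal sketch, assuming the console or log output has been saved as text and that the address appears in the single-line form quoted in the original post ("CAM_LOGGER: bus 0 target 0 lun 0"). The real logger output may be laid out differently, and count_cam_errors is a hypothetical helper name, not part of Digital UNIX.

```python
import re
from collections import Counter

# Device address as it appears in the CAM_LOGGER output quoted in this thread.
# Assumption: the address sits on the same line as the CAM_LOGGER tag.
ADDR = re.compile(r"bus (\d+) target (\d+) lun (\d+)")

def count_cam_errors(lines):
    """Count CAM_LOGGER lines per (bus, target, lun) address."""
    counts = Counter()
    for line in lines:
        if "CAM_LOGGER" not in line:
            continue  # ignore normal boot messages such as "rz0 at scsi0 ..."
        match = ADDR.search(line)
        if match:
            counts[tuple(int(g) for g in match.groups())] += 1
    return counts

# Sample input taken from the messages quoted in the original post.
sample = [
    "CAM_LOGGER: cam_error packet",
    "CAM_LOGGER: bus 0 target 0 lun 0",
    "Dec 8 14:19:04 ns1 vmunix: rz0 at scsi0 target 0 lun 0 (LID=0)",
]

for addr, n in count_cam_errors(sample).items():
    print("bus %d target %d lun %d: %d error(s)" % (addr + (n,)))
```

If every error names the same target, that points toward that device or its cabling, in line with the replies above; errors spread across several targets would point more toward the bus, the termination, or the controller.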