We seem to be having multiple problems with our V5.1 Cluster. This
cluster currently has 4 systems in it. It is running Patch Kit 2. The
Storage solution is HSG80's.
Problem No 1: When the system is at the hardware prompt and we issue a
boot command, it keeps getting these messages. After the kernel has
been loaded they stop but periodically, some of our disks go offline.
Just wondering if anyone has experienced anything like this, that
would save us time. This just recently started. Was fine until last
week.
-----BEGINNING OF MESSAGES------
CPU 15 speed is 731 MHz
create powerup
initializing GCT/FRU at 214000
initializing pka dqa eia eib eic eid eie eif eig eih eii pga pgb pgc
pga0.0.0.7.1 - probe timeout retry
.pgd pgb0.0.0.1.2 - probe timeout retry
.pga0.0.0.7.1 - probe timeout failure
.
pge pgc0.0.0.7.9 - probe timeout retry
.pgb0.0.0.1.2 - probe timeout failure
.pga0.0.0.7.1 - probe timeout retry
.pgf pgd0.0.0.1.10 - probe timeout retry
.pgg pga0.0.0.7.1 - probe timeout failure
.pge0.0.0.2.16 - probe timeout retry
.pgd0.0.0.1.10 - probe timeout failure
.pgc0.0.0.7.9 - probe timeout failure
.pgb0.0.0.1.2 - probe timeout retry
.pgh pgf0.0.0.4.17 - probe timeout retry
AlphaServer Console V5.9-111, built on Jan 9 2001 at 15:21:06
pga0.0.0.7.1 - probe timeout retry
.
CPU 0 booting
(boot dga900.1001.0.7.1 -flags A)
pgg0.0.0.2.24 - probe timeout retry
.pge0.0.0.2.16 - probe timeout retry
.pgd0.0.0.1.10 - probe timeout failure
.pgb0.0.0.1.2 - probe timeout retry
.pga0.0.0.7.1 - probe timeout failure
.pgh0.0.0.4.25 - probe timeout failure
.block 0 of dga900.1001.0.7.1 is a valid boot block
reading 18 blocks from dga900.1001.0.7.1
bootstrap code read in
base = 5f4000, image_start = 0, image_bytes = 2400
initializing HWRPB at 2000
initializing page table at fffe0pgf0.0.0.4.17 - probe timeout retry
------END OF MESSAGES-----
Problem No 2: Also we keep taking these malloc_internal panics. Now
these just started a few days ago. We had this cluster running for a
while without any problems. Now all of a sudden this starts. We
couldn't even complete a run of sys_check -escalate before the system
paniced again. We have opened a call with Support but just wanted to
check and see if anyone had seen anything like this.
-------BEGINNING OF MESSAGES-------
panic (cpu 1): malloc_internal
syncing disks...
DUMP: 123446380 blocks available for dumping.
DUMP: 1141122 wanted for a partial compressed dump.
DUMP: Allow ng 16773118 of the 16777214 available on 0x13002d3
device string for dump = SCSI3 1 7 0 1 0 0 0 _at_wwid1.
DUMP.prom: dev SCSI3 1 7 0 1 0 0 0 _at_wwid1, block 262144
------END OF MESSAGES----------
I realise that this is not too much information but I am merely
looking for some advice if anyone has already seen these problems and
they have managed to get past them.
Everything was working fine until the beginning of this week and now
suddenly everything is falling apart. We tracked back and we really
haven't made any changes to this environment. Any and all help will be
greatly appreciated!!
Tks & Rgds,
Alay Shah
Allegiance Healthcare Corporation
1400 Waukegan Rd.
McGaw Park, IL 60085
Ph - 847-578-2584
Fax - 847-578-5586
Email - unixadmin_at_allegiance.net
shahal_at_allegiance.net
Received on Thu Apr 19 2001 - 21:23:05 NZST