Another Kernel Related Question for DU 4.0b - Crashing with 2 panics, memory and syncing disks

From: Grau, Michael <M.Grau_at_vgo.wa.gov.au>
Date: Fri, 26 Feb 1999 10:58:47 +0800

Hello All

I've read some interesting posts about DU crashes, but I'd like to pose a
couple of questions to the group:

Yesterday (the 25th) our Alpha server, running two processors and AdvFS
across HSZ70-controlled array cabinets, crashed with a "kernel memory fault".
This is the second time in as many months, so it is not urgent, but no one
likes to see a server of this power and responsibility go down unexpectedly.
I will summarise once I receive suggestions. The server was running the
Oracle databases and performing its other roles well (under little load)
until it crashed and rebooted, with the next entry in the /var/adm/messages
file reading:

trap: invalid memory ifetch access from kernel mode ...

panic (cpu 0): kernel memory fault
syncing disks... 102 102 device string for dump = SCSI 1 2 0 1 100 0 0.
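For pulling entries like these out of /var/adm/messages after a reboot, here is a minimal Python sketch. It assumes the message text looks exactly like the excerpt above; real syslog lines also carry a timestamp and host prefix, so the patterns search within each line rather than anchoring at the start. crash_summary is a hypothetical helper, not a Tru64 utility.

```python
import re

# Patterns assumed from the /var/adm/messages excerpt above. Real syslog
# lines carry a timestamp/host prefix, so we search rather than anchor at ^.
TRAP_RE = re.compile(r"trap: (.+)$")
PANIC_RE = re.compile(r"panic \(cpu (\d+)\): (.+)$")


def crash_summary(lines):
    """Hypothetical helper: collect trap descriptions and (cpu, message)
    panic pairs from an iterable of log lines."""
    traps, panics = [], []
    for line in lines:
        line = line.rstrip("\n")
        if m := TRAP_RE.search(line):
            traps.append(m.group(1))
        elif m := PANIC_RE.search(line):
            panics.append((int(m.group(1)), m.group(2)))
    return traps, panics


if __name__ == "__main__":
    excerpt = [
        "trap: invalid memory ifetch access from kernel mode ...",
        "panic (cpu 0): kernel memory fault",
        "syncing disks... 102 102 device string for dump = SCSI 1 2 0 1 100 0 0.",
    ]
    print(crash_summary(excerpt))
```

In practice you would pass it an open file object, e.g. `crash_summary(open("/var/adm/messages"))`, and compare the dates of successive panics.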

The crash dump file gives the following:

tset machine_slot[paniccpu].cpu_panic_thread:
Begin Trace for machine_slot[paniccpu].cpu_panic_thread:
> 0 boot(0x400000000, 0xfffffc000025ac64, 0x4, 0xfffffc00005e47f0, 0x6)
["../../../../src/kernel/arch/alpha/machdep.c":2634, 0xfffffc0000511eac]
   1 panic(s = 0xfffffc000064ba08 = "panic stuck syncing disks")
["../../../../src/kernel/bsd/subr_prf.c":707, 0xfffffc000028404c]
   2 hardclock(pc = 0xfffffc0000510154 =
"^Ø^)^Ô"^S0_at__at_(={\262$={\240!^QD_at_,=^Û\240%q_at_C^E^V\276H")
["../../../../src/kernel/bsd/kern_clock.c":886, 0xfffffc000025980c]
   3 _XentInt(0x2, 0xfffffc0000510154, 0xfffffc00006c4550,
0xfffffc003e9c9600, 0x4c3a84)
["../../../../src/kernel/arch/alpha/locore.s":1032, 0xfffffc000050d854]
   4 simple_lock_D(0x2, 0xfffffc0000510154, 0xfffffc00006c4550,
0xfffffc003e9c9600, 0x4c3a84)
["../../../../src/kernel/arch/alpha/lockprim.s":764, 0xfffffc0000510150]
   5 vrele(vp = 0xfffffc003e9c9600)
["../../../../src/kernel/vfs/vfs_subr.c":2305, 0xfffffc00004c3ac4]
   6 mntbusybuf(mountp = 0xfffffc0073c8c500)
["../../../../src/kernel/vfs/vfs_bio.c":1419, 0xfffffc00004bc244]
   7 boot(0x0, 0xfffffc003f7ac000, 0x2c0000002c, 0x36, 0x6600000001)
["../../../../src/kernel/arch/alpha/machdep.c":2590, 0xfffffc0000511dac]
   8 panic(s = 0xfffffc000069c480 = "kernel memory fault")
["../../../../src/kernel/bsd/subr_prf.c":791, 0xfffffc00002841ec]
   9 trap() ["../../../../src/kernel/arch/alpha/trap.c":1539,
0xfffffc0000519f30]
  10 _XentMM(0x0, 0x0, 0xfffffc00006c4550, 0x1400a686c, 0x2d)
["../../../../src/kernel/arch/alpha/locore.s":1424, 0xfffffc000050dc24]
End Trace for machine_slot[paniccpu].cpu_panic_thread:

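Clearly there are two panics, one for "stuck syncing disks" and the other
for "kernel memory fault", but I guess it is the relationship between them
that is important. Reading the trace from the bottom up (frame 0 is the most
recent call), the first panic appears to be the "kernel memory fault" raised
through trap() (frames 8 to 10). boot() then began syncing disks, and a clock
interrupt arrived while it was in simple_lock_D() under vrele() and
mntbusybuf(); hardclock() then raised the second panic, "panic stuck syncing
disks" (frames 0 to 7).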
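Because frame 0 in a kdbx trace is the most recent call, the panic() frame with the highest number is the one that happened first. Below is a minimal Python sketch that extracts the panic() frames from a trace and lists them oldest first. It assumes the frame-header layout shown in the trace above (an optional ">", the frame number, then the function name), skips the wrapped source-path lines, and uses hypothetical names (panics_in_order, SAMPLE); SAMPLE is an abbreviated copy of the trace.

```python
import re

# Frame header shape assumed from the kdbx trace above: an optional ">"
# marking the current frame, the frame number, then the function name.
FRAME_RE = re.compile(r"^\s*>?\s*(\d+)\s+(\w+)\(")
PANIC_MSG_RE = re.compile(r'panic\(s = 0x[0-9a-f]+ = "([^"]*)"\)')


def panics_in_order(trace):
    """Return (frame, message) for every panic() frame, earliest first.

    Frame 0 is the most recent call, so a higher frame number means the
    panic was entered earlier.
    """
    found = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line)
        if not m or m.group(2) != "panic":
            continue  # continuation lines (source paths) and other frames
        msg = PANIC_MSG_RE.search(line)
        found.append((int(m.group(1)), msg.group(1) if msg else "?"))
    return sorted(found, reverse=True)


# Abbreviated copy of the trace above (not the full dump).
SAMPLE = '''\
> 0 boot(0x400000000, 0xfffffc000025ac64, 0x4, 0xfffffc00005e47f0, 0x6)
   1 panic(s = 0xfffffc000064ba08 = "panic stuck syncing disks")
   2 hardclock(pc = 0xfffffc0000510154 = "...")
   5 vrele(vp = 0xfffffc003e9c9600)
   7 boot(0x0, 0xfffffc003f7ac000, 0x2c0000002c, 0x36, 0x6600000001)
   8 panic(s = 0xfffffc000069c480 = "kernel memory fault")
   9 trap() ["../../../../src/kernel/arch/alpha/trap.c":1539,
'''

if __name__ == "__main__":
    for frame, msg in panics_in_order(SAMPLE):
        print(f"frame {frame}: {msg}")
```

Run on this sample, it prints the kernel memory fault (frame 8) before the stuck-sync panic (frame 1), which is why the memory fault is the panic worth chasing.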


dbx -k /vmunix gives:

stopped at [thread_block:2097 ,0xfffffc00002b3160] Source not available

warning: Files compiled -g3: parameter values probably wrong
(dbx) 0xfffffc00002b3160/i
>*[thread_block:2097, 0xfffffc00002b3160] bis r31, r10, r16


Michael Grau
Network Engineer

Enterprise Managed Services
AlphaWest Pty Ltd
Ph: (08) 9237 3041
Mobile: 041 331 5820
email: michael.grau_at_alphawest.com.au
visit our website : http://www.alphawest.com.au

Received on Fri Feb 26 1999 - 03:02:23 NZDT

This archive was generated by hypermail 2.4.0 : Wed Nov 08 2023 - 11:53:39 NZDT