[S] panic (cpu 1): kernel memory fault

From: Clare West <clare_at_cs.auckland.ac.nz>
Date: Tue, 15 Oct 1996 10:45:18 +1300

My problem was:

> I have had four crashes in the last few weeks, the last two yesterday. I
> have crash dumps for the last three. The relevant lines from the dumps seem
> to be the following. The pc and ra contents are invariant, so this is bound
> to be the same bug. The machine is an AlphaStation 2100 4/275 running DU 4.0
> with two processors. The last two crashes were on cpu 1, the one before
> that on cpu 0.
>
> trap: invalid memory read access from kernel mode
>
> faulting virtual address: 0x0000000100001ff0
> pc of faulting instruction: 0xfffffc000050460c
> ra contents at time of fault: 0xfffffc0000504608
> sp contents at time of fault: 0xffffffffa16db710

[trimmed]

Thanks to

Dave Cherkus <cherkus_at_UniMaster.COM>
"Stephen L. LaBelle" <labelles_at_mscd.edu>
sxkac_at_java.sois.alaska.edu (Kurt Carlson)

After some investigation:

# dbx -k /vmunix
(dbx) 0xfffffc000050460c/i
 [pmap_pt_access:1924, 0xfffffc000050460c] ldq r18, 8176(r9)
(dbx) 0xfffffc0000504608/i
 [pmap_pt_access:1924, 0xfffffc0000504608] ldq r9, 8(r9)

the problem was identified as the one fixed by this patch in the September
26 patch kit.

PROBLEM: ( QAR 45083 ) (Patch ID: OSF400-032)

        The system panics with the message "panic (cpu #): kernel memory
        fault". In the crash dump, the stack trace shows
        pmap_clrmod_fow calling pmap_pt_access. pmap_clrmod_fow
        does not check for a segment pte before calling pmap_pt_access,
        which results in the fault. This patch corrects that problem.

        Stack Trace:

> 0 stop_secondary_cpu(do_lwc = 0) ["../../../../src/kernel/arch/alpha/cpu.c":
   1 panic(s = 0xfffffc00006b4578 = "event_timeout: panic request") ["../../../
   2 event_timeout(func = 0xfffffc000027df90, arg = 0xfffffc00008892e8, timeout
   3 xcpu_puts(s = 0xfffffc000070ea00, prfbufp = 0xfffffc00008892e8) ["../../..
   4 printf(va_alist = -4398039748776) ["../../../../src/kernel/bsd/subr_prf.c"
   5 panic(s = 0xfffffc00006b70e0 = "kernel memory fault") ["../../../../src/ke
   6 trap() ["../../../../src/kernel/arch/alpha/trap.c":1457, 0xfffffc00004ea37
   7 _XentMM(0x4, 0xfffffc00004fdc6c, 0xfffffc00006fea00, 0xfffffc0001520040, 0
   8 pmap_pt_access(map = 0x400010002, v = 14123853069314699368, pte1p = 0xffff
   9 pmap_clrmod_fow(phys = 18446739675775960960) ["../../../../src/kernel/arch
  10 ubc_msync(0x0, 0xfffffc0009b87e00, 0x700000040, 0xfffffc0009b87e08, 0xffff
  11 u_vp_msync(0x700000040, 0xfffffc0009b87e08, 0xfffffc0009b87e48, 0xfffffc00
  12 u_map_control(0xfffffc00006f7ce0, 0xfffffc000677b500, 0xfffffc0000fbce40,
  13 msync(0xfffffc0000fbce40, 0xfffffc00065f78c0, 0xfffffc0000000001, 0x0, 0xf
  14 syscall(0x1, 0x0, 0x4c1ef31517bdc, 0x12007dca4, 0xd9) ["../../../../src/ke
  15 _Xsyscall(0x8, 0x120080098, 0x14001ca20, 0x222000, 0x10000) ["../../../../

These symptoms exactly match my crash dumps. There have been no further
crashes, but it has only been 17 hours since I applied the patch.
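
As a cross-check, the faulting virtual address in the dump is consistent with
the "ldq r18, 8176(r9)" instruction that dbx shows at the faulting pc. A load
of that form addresses memory at base register plus displacement, so
subtracting the displacement from the faulting address gives the value r9 must
have held. The following is only a small arithmetic sketch in Python; the
variable names are illustrative and do not come from the dump.

```python
# Values copied from the crash dump and the dbx disassembly above.
FAULT_VA = 0x0000000100001ff0   # "faulting virtual address"
DISPLACEMENT = 8176             # from "ldq r18, 8176(r9)"

# The load's effective address is base register + displacement,
# so the base register (r9) at the time of the fault was:
r9 = FAULT_VA - DISPLACEMENT

print(hex(DISPLACEMENT))  # 0x1ff0
print(hex(r9))            # 0x100000000
```

The result, 0x100000000, does not resemble the 0xfffffc00... kernel addresses
seen elsewhere in the dump, which fits a bad pointer being followed inside
pmap_pt_access.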

When applying the patch, make sure you use /bin/make and not gmake, as gmake
does not like the Makefile in /usr/sys/BINARY.

I am still getting these errors:

chk_bf_quota: user/group underflow
chk_blk_quota: user/group underflow

with no apparent ill effects, but they are still rather perplexing. I have
quotas on two advfs file systems and they seem to be working OK.

> This crash is particularly annoying because my machine seems to hang at the
> "syncing disks..." stage until the halt button is pressed. Perhaps this is
> related to my disk space shortage.

This is still of concern to me. I am not running any database applications
so fatal file system errors are unlikely, but still, I'd rather it didn't
happen. I am getting no CAM SCSI errors (in fact no errors at all are
recorded in uerf for the past month except the panics mentioned above).

clare
Received on Tue Oct 15 1996 - 00:05:37 NZDT
