Greetings. We have an ES40 running Tru64 V5.1A. Periodically it panics
with a "kernel memory fault", and output like the following is logged in
the messages file:
Jul 18 00:16:51 vmunix: trap: invalid memory read access from kernel mode
Jul 18 00:16:51 vmunix: faulting virtual address: 0x0ffffc01ee31aa60
Jul 18 00:16:51 vmunix: pc of faulting instruction: 0xfffffc000070c2cc
Jul 18 00:16:51 vmunix: ra contents at time of fault: 0xfffffc000070b7d4
Jul 18 00:16:51 vmunix: sp contents at time of fault: 0xfffffe068a47f1a0
Jul 18 00:16:51 vmunix: panic (cpu 0): kernel memory fault
The pc and ra are always the same. I used dbx to find the source of the
problem, and got this:
(dbx) 0xfffffc000070c2cc/i
[tu_receive_int:5049, 0xfffffc000070c2cc] ldl t0, 0(s1)
(dbx) 0xfffffc000070b7d4/i
[tuintr:4506, 0xfffffc000070b7d4] ldq_u zero, 0(sp)
which seems to implicate a NIC. A look at the interfaces shows that one
of them (tu1) has lots of input errors, while the others have none:
Name  Mtu   Network  Address            Ipkts   Ierrs  Opkts    Oerrs  Coll
tu1   1500  <Link>   00:06:2b:00:2d:79  633559  2131   1794775  11     0
tu1   1500  <ip>     <hostname>         633559  2131   1794775  11     0
and a look at "netstat -s -I tu1" shows:
2131 receive failures, reasons include:
1160 frame check sequence errors
971 frame error
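
To put those counters in perspective, here is a rough sketch in Python
that turns them into error rates. It uses only the numbers quoted above;
the helper name error_rate is made up for illustration and is not part
of any Tru64 tool.

```python
def error_rate(errors: int, packets: int) -> float:
    """Return errors as a fraction of packets (0.0 if there were no packets)."""
    return errors / packets if packets else 0.0


# Counters copied from the netstat output above for tu1.
ipkts, ierrs = 633559, 2131
opkts, oerrs = 1794775, 11
fcs_errors, frame_errors = 1160, 971

input_rate = error_rate(ierrs, ipkts)
output_rate = error_rate(oerrs, opkts)

print(f"input error rate:  {input_rate:.4%}")
print(f"output error rate: {output_rate:.4%}")
# Check whether the two listed failure reasons add up to the Ierrs total.
print(f"FCS + frame errors == Ierrs: {fcs_errors + frame_errors == ierrs}")
```

The two failure reasons add up exactly to the Ierrs count, and the input
error rate is a few tenths of a percent, far higher than the output
error rate.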
What is the next step in debugging this? Should I be talking to the
people who manage the switch the server is attached to? Should we look
at cable lengths? Am I chasing a red herring here?
TIA for thoughts and guidance!
Judith Reed
Received on Wed Jul 19 2006 - 20:15:40 NZST