Hello,
my system has crashed 3 times, with about 3 months between crashes. Each
time the system wrote a crash dump and crash-data files. After the 3rd
crash I patched the system to the current patch level, and it then crashed
again after 1 week of uptime. (I can't give exact statistics, because there
were also some shutdowns for other reasons.) In every case the panic string
was "assertion failed".
I looked at the crash-data files and found that in every case the crashing
processor was running nfs_thread(). The trace is different in every dump,
but it usually looks like this:
Begin Trace for machine_slot[paniccpu].cpu_panic_thread:
thread 0xfffffc01b90d0700 stopped at [stop_secondary_cpu:614
,0xfffffc00004945d4] Source not available
warning: Files compiled -g3: parameter values probably wrong
> 0 stop_secondary_cpu(do_lwc =
0x0) ["../../../../src/kernel/arch/alpha/cpu.c":614, 0xfffffc00004945d4]
1 panic(0x3c3b9d24, 0x1f, 0x60000, 0x0,
0x1) ["../../../../src/kernel/bsd/subr_prf.c":751, 0xfffffc0000286dd4]
2 event_timeout(func = 0xfffffc0000287060, arg =
0xfffffc000096ba20) ["../../../../src/kernel/arch/alpha/cpu.c":1183,
0xfffffc000049546c]
3 xcpu_puts(0xfffffc0000287060, 0xfffffc000096ba20, 0x4c4b40, 0x0,
0xfffffc000028626c) ["../../../../src/kernel/bsd/subr_prf.c":895,
0xfffffc00002870bc]
4 printf(0xfffffc000080db58, 0x1, 0xfffffc00008d5bd8, 0x0,
0xffffffffffffffcc) ["../../../../src/kernel/bsd/subr_prf.c":423,
0xfffffc0000286268]
5 panic(0x0, 0xfffffc00003cfa10, 0x250bda, 0x4d,
0x7500007568) ["../../../../src/kernel/bsd/subr_prf.c":804,
0xfffffc0000286f38]
6 at_query(mapping =
0xfffffffeeabc7768) ["/share/sandboxes4/decdce3.1/src/file/gateway/libgwauth/auth_at.c":364,
0xfffffc00006e494c]
7 at_translate_creds(addr = 0xfffffc01b88540a4, uid = 0xc8, cred =
0xfffffc01b908b400) ["/share/sandboxes4/decdce3.1/src/file/gateway/libgwauth/auth_at_ki.c":160,
0xfffffc00006e6cc4]
8 checkauth(xpd = (nil), req = (nil), cred =
0xfffffc01b908b400) ["../../../../src/kernel/nfs/nfs_server.c":7995,
0xfffffc00003db14c]
9 rfs_dispatch(req = 0xfffffffeeabc7920, xprt =
0xfffffc01b8854080) ["../../../../src/kernel/nfs/nfs_server.c":7650,
0xfffffc00003daa78]
10 nfs_rpc_recv(0x4a108d3100000007, 0x1, 0xfffffc01b90cca80,
0xfffffc010000002c,
0xfffffc01b90ccda0) ["../../../../src/kernel/rpc/svc.c":747,
0xfffffc00003ce38c]
11 nfs_rpc_input(0xfffffc01b90cca80, 0xfffffc010000002c, 0x0,
0xfffffc01b90ccc10,
0xfffffc0000000000) ["../../../../src/kernel/rpc/svc.c":712,
0xfffffc00003ce2bc]
12 nfs_input(m =
0xfffffc017936e600) ["../../../../src/kernel/nfs/nfs_server.c":6836,
0xfffffc00003da4f4]
13 nfs_thread() ["../../../../src/kernel/nfs/nfs_server.c":6110,
0xfffffc00003d9288]
End Trace for machine_slot[paniccpu].cpu_panic_thread:
This machine is a member of a computing cluster whose nodes are connected
to each other with Memory Channel. The crashing machine holds the home file
system, and this is the only NFS service under high load. All users work
with huge datasets, which are stored in home (through NFS) and on the
scratch disks, which are local file systems on every system.
Every time the system crashed, more than one CPU was running the
nfs_thread function, so it looks like it fails under high load.
This is Tru64 4.0F running on an AS4100 system.
Does anyone have an idea how I can solve this problem?
Thanks,
Ned
Received on Thu Jan 10 2002 - 08:57:35 NZDT