Hi folks,
we have a DEC PW600au running DU4.0e which has crashed several
times in the past few days.
The culprit seems to be a 'decsound' process:
[...]
14712 decsound
_kernel_process_status_end:
_current_pid: 14712
_current_tid: 0xfffffc0009b52700
_proc_thread_list_begin:
thread 0xfffffc0009b52700 stopped at [boot:1931 ,0xfffffc00003e05dc]
Source not available
_proc_thread_list_end:
Could anyone give a hint whether this is really a CPU problem
or perhaps the sound card or ...?
I would also appreciate any hints on what I could do for further
analysis (what do 'Machine Check Code = 98' or 'too many Processor
corrected errors detected on cpu 0' indicate?).
I attach a few excerpts from /var/adm/messages and
/var/adm/crash/crash-data.
Thanks in advance & a Happy New Year,
Volker
/var/adm/messages (excerpt)
-----%<-----------------------
Jan 2 21:08:40 cmcszsws vmunix: Environmental Monitoring Subsystem Configured.
Jan 2 21:08:44 cmcszsws vmunix: mmsessprobe: IRQ channel = 5
Jan 2 21:08:44 cmcszsws vmunix: mmsess0 at isa0
Jan 2 21:08:44 cmcszsws vmunix: mmsess sound driver V4.1 configured
Jan 2 23:12:37 cmcszsws vmunix: WARNING: too many Processor corrected errors detected on cpu 0. Reporting suspended.
Jan 3 05:54:31 cmcszsws vmunix: Machine Check Processor Fatal Abort
Jan 3 05:54:31 cmcszsws vmunix: Machine Check Code = 98
Jan 3 05:54:31 cmcszsws vmunix: Processor detected hard error
Jan 3 05:54:31 cmcszsws vmunix: pal temp[0-1] = ffffffffffffffff 000003ffc0005bb0
Jan 3 05:54:31 cmcszsws vmunix: pal temp[2-3] = fffffc00003dcd40 0000000000005200
Jan 3 05:54:31 cmcszsws vmunix: pal temp[4-5] = 0000000000000001 000000000000ff00
Jan 3 05:54:31 cmcszsws vmunix: pal temp[6-7] = 000000000000fff2 fffffc00003dc660
Jan 3 05:54:31 cmcszsws vmunix: pal temp[8-9] = 1f1e161514020100 fffffc00003dca80
Jan 3 05:54:31 cmcszsws vmunix: pal temp[10-11] = 000003ff80008adc fffffc00003dc8e0
Jan 3 05:54:31 cmcszsws vmunix: pal temp[12-13] = fffffc00003dccb0 fffffffffff8da00
Jan 3 05:54:31 cmcszsws vmunix: pal temp[14-15] = 0000000000f00270 0000000000f0380c
Jan 3 05:54:31 cmcszsws vmunix: pal temp[16-17] = 0000009806700001 0000000000000000
Jan 3 05:54:31 cmcszsws vmunix: pal temp[18-19] = 000000011ffff9a0 ffffffff91457a38
Jan 3 05:54:31 cmcszsws vmunix: pal temp[20-21] = 0000000007098000 fffffc00003dcce0
Jan 3 05:54:31 cmcszsws vmunix: pal temp[22-23] = fffffc000056d180 0000000000c99a38
Jan 3 05:54:31 cmcszsws vmunix: shadow[0-1] = 0000000000000000 0000000000000000
Jan 3 05:54:31 cmcszsws vmunix: shadow[2-3] = 0000000000000000 0000000000000000
Jan 3 05:54:31 cmcszsws vmunix: shadow[4-5] = 0000000000000000 0000000000000000
Jan 3 05:54:31 cmcszsws vmunix: shadow[6-7] = 0000000000000000 0000000000000000
Jan 3 05:54:31 cmcszsws vmunix: Address of excepting instruction = 000003ff80008adc
Jan 3 05:54:31 cmcszsws vmunix: Summary of arithmetic traps = 0000000000000000
Jan 3 05:54:31 cmcszsws vmunix: Exception mask = 0000000000000000
Jan 3 05:54:32 cmcszsws vmunix: Base address for PALcode = 0000000000018000
Jan 3 05:54:32 cmcszsws vmunix: Interrupt Status Reg = 0000000100000000
Jan 3 05:54:32 cmcszsws vmunix: CURRENT SETUP OF EV5 IBOX = 0000004166020000
Jan 3 05:54:32 cmcszsws vmunix: I-CACHE Reg Tag parity error = 0000000000000000
Jan 3 05:54:32 cmcszsws vmunix: D-CACHE error Reg = 0000000000000000
Jan 3 05:54:32 cmcszsws vmunix: Effective VA = 000003ff808b40f4
Jan 3 05:54:32 cmcszsws vmunix: reason for D-stream = 0000000000014290
Jan 3 05:54:32 cmcszsws vmunix: EV5 Secondary Cache address = ffffff000001d04f
Jan 3 05:54:32 cmcszsws vmunix: EV5 Secondary Cache TAG/Data parity = 0000000000000000
Jan 3 05:54:32 cmcszsws vmunix: EV5 BC_TAG_ADDR = ffffff80054d6fff
Jan 3 05:54:32 cmcszsws vmunix: EV5 EI_STAT_ADDR Phys addr of Xfer = ffffff000877a00f
Jan 3 05:54:32 cmcszsws vmunix: Fill Syndrome = 0000000000000017
Jan 3 05:54:32 cmcszsws vmunix: EI_STAT reg = fffffff945ffffff
Jan 3 05:54:32 cmcszsws vmunix: LD_LOCK = ffffff000e3e3a4f
Jan 3 05:54:32 cmcszsws vmunix: PYXIS_DMA_DATA = 0000000000000000
Jan 3 05:54:32 cmcszsws vmunix: CIA/PYXIS ERR = 0000000000000000
Jan 3 05:54:32 cmcszsws vmunix: CIA/PYXIS ERR STAT = 0000000000000000
Jan 3 05:54:32 cmcszsws vmunix: CIA/PYXIS ERR MASK = 0000000000000b93
Jan 3 05:54:32 cmcszsws vmunix: CIA/PYXIS ECC_SYN = 0000000000000000
Jan 3 05:54:32 cmcszsws vmunix: CIA/PYXIS MEM ERR0 = 000000000001d540
Jan 3 05:54:32 cmcszsws vmunix: CIA/PYXIS MEM ERR1 = 0000000058000000
Jan 3 05:54:32 cmcszsws vmunix: CIA/PYXIS PCI ERR0 = 0000000002010002
Jan 3 05:54:32 cmcszsws vmunix: CIA/PYXIS PCI ERR1 = 0000000000000071
Jan 3 05:54:32 cmcszsws vmunix: ISA bridge NMI status & control = 0000000000000000
Jan 3 05:54:32 cmcszsws vmunix: CIA/PYXIS PCI ERR2 = 0000000000000071
Jan 3 05:54:33 cmcszsws vmunix: panic (cpu 0): Processor Machine Check
Jan 3 05:54:33 cmcszsws vmunix: syncing disks... device string for dump = SCSI 0 1004 0 0 0 0 0.
Jan 3 05:54:33 cmcszsws vmunix: DUMP.prom: dev SCSI 0 1004 0 0 0 0 0, block 131072
Jan 3 05:54:33 cmcszsws vmunix: device string for dump = SCSI 0 1004 0 0 0 0 0.
Jan 3 05:54:33 cmcszsws vmunix: DUMP.prom: dev SCSI 0 1004 0 0 0 0 0, block 131072
Jan 3 05:54:33 cmcszsws vmunix: Alpha boot: available memory from 0xae0000 to 0xfffe000
Jan 3 05:54:33 cmcszsws vmunix: Digital UNIX V4.0E (Rev. 1091); Thu Apr 22 17:44:56 GMT 1999
Jan 3 05:54:33 cmcszsws vmunix: physical memory = 256.00 megabytes.
Jan 3 05:54:33 cmcszsws vmunix: available memory = 245.41 megabytes.
Jan 3 05:54:33 cmcszsws vmunix: using 975 buffers containing 7.61 megabytes of memory
Jan 3 05:54:33 cmcszsws vmunix: Digital Personal WorkStation 600au
Jan 3 05:54:33 cmcszsws vmunix: Firmware revision: 6.9-7
Jan 3 05:54:33 cmcszsws vmunix: PALcode: Digital UNIX version 1.22-0
Jan 3 05:54:33 cmcszsws vmunix: pci0 at nexus
Jan 3 05:54:33 cmcszsws vmunix: tu0: DECchip 21143: Revision: 3.0
Jan 3 05:54:33 cmcszsws vmunix: tu0: auto negotiation capable device
Jan 3 05:54:33 cmcszsws vmunix: tu0 at pci0 slot 3
Jan 3 05:54:33 cmcszsws vmunix: tu0: DEC TULIP (10/100) Ethernet Interface, hardware address: 00-00-F8-76-5E-93
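
For anyone who wants to compare the register values from several crashes, they can be pulled out of /var/adm/messages with a small script. The following is only a sketch: the function names join_wrapped and registers are made up for illustration, and the prefix pattern assumes the exact "date time host vmunix:" layout shown in the excerpt above. It also tolerates entries that were wrapped onto a second line, and it makes no attempt to interpret the values.

```python
import re

# Syslog prefix as it appears in the excerpt,
# e.g. "Jan 3 05:54:31 cmcszsws vmunix: " (layout is an assumption).
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ vmunix: ")
HEXWORD = re.compile(r"[0-9a-f]{16}")


def join_wrapped(lines):
    """Merge lines without a syslog prefix into the preceding entry."""
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if PREFIX.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def registers(entries):
    """Return {name: [hex words]} for entries of the form 'name = hex [hex]'."""
    regs = {}
    for entry in entries:
        body = PREFIX.sub("", entry, count=1)
        name, sep, value = body.partition("=")
        words = value.split()
        # Keep only entries whose value consists solely of 64-bit hex words.
        if sep and words and all(HEXWORD.fullmatch(w) for w in words):
            regs[name.strip()] = words
    return regs


if __name__ == "__main__":
    sample = [
        "Jan 3 05:54:31 cmcszsws vmunix: Machine Check Code = 98",
        "Jan 3 05:54:32 cmcszsws vmunix: Fill Syndrome =",
        "0000000000000017",
        "Jan 3 05:54:32 cmcszsws vmunix: CIA/PYXIS PCI ERR0",
        "= 0000000002010002",
    ]
    for name, words in registers(join_wrapped(sample)).items():
        print(name, words)
```

Entries such as "Machine Check Code = 98" are skipped on purpose, since their value is not a 16-digit hex word; the filter would need loosening to capture them as well.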
/var/adm/crash/crash-data.0 (excerpt)
-----%<------------------------------
[...]
_dump_begin:
> 0 boot() ["../../../../src/kernel/arch/alpha/machdep.c":1931, 0xfffffc00003e05dc]
nmp = 0xfffffc000056cf50
rs = -4398040821936
mycpu = 1
rpb = 0xfffffc0000565a58
rpb_cpu = (nil)
item_list = struct {
function = 18446739675665624712
out_flags = 256347296
in_flags = 4294966272
rtn_status = 18446739675665891140
next_function = 0x3ff9e037cc0
input_data = 0
output_data = 18446739675919288960
}
1 panic(0x578, 0x16, 0xfffffc000ff00800, 0xfffffc000ff00800, 0x1ea6b59) ["../../../../src/kernel/bsd/subr_prf.c":755, 0xfffffc00002844b0]
2 thread_block() ["../../../../src/kernel/kern/sched_prim.c":2159, 0xfffffc00002b8654]
thread = 0xfffffc0009b52700
new_thread = 0xfffffc00001d6100
mycpu = 0
myprocessor = 0xfffffc00001d6100
s = 5
pset = 0xfffffc000056cf50
3 thread_preempt(thread = 0x26, processor = 0xfffffc00001d6100) ["../../../../src/kernel/kern/sched_prim.c":4048, 0xfffffc00002bb034]
s = 2
pset = 0xfffffc0000593560
4 boot() ["../../../../src/kernel/arch/alpha/machdep.c":1876, 0xfffffc00003e04bc]
nmp = 0xfffffc000056cf50
rs = -4398040821936
mycpu = 5427584
rpb = 0xfffffc0000565a58
rpb_cpu = 0x1ea6b59
item_list = struct {
function = 436338788
out_flags = 1
in_flags = 0
rtn_status = 18446744069414584320
next_function = 0x376a1f1600000001
input_data = 6366207712153912073
output_data = 4981061741327700809
}
5 panic(0x0, 0x1f, 0x1a000000, 0x1a020064, 0x1) ["../../../../src/kernel/bsd/subr_prf.c":842, 0xfffffc0000284664]
6 machcheck(0x1, 0x0, 0x6c994f, 0x20000001a, 0xffffffff91457930) ["../../../../src/kernel/arch/alpha/hal/eb164.c":3096, 0xfffffc000040ffa4]
7 mach_error(0x6c994f, 0x20000001a, 0xffffffff91457930, 0xfffffc0000006068, 0xfffffc00003dc9f0) ["../../../../src/kernel/arch/alpha/hal/cpusw.c":1027, 0xfffffc00003f187c]
8 _XentInt(0x8, 0x3ff80008adc, 0x3ffc0008720, 0x3ffc000b900, 0x120003c48) ["../../../../src/kernel/arch/alpha/locore.s":1339, 0xfffffc00003dc9ec]
_dump_end:
----------------------------------------------------------------------
__ __
/ //_ \ Volker Becker email: becker_at_eurocontrol.de
___/ // / Deutsche Flugsicherung www: http://www.dfs.de
\ //__ \ Rintheimer Querallee 6 phone: +49-721-6903-326
\_//_____/ D-76131 Karlsruhe fax: +49-721-6903-247
----------------------------------------------------------------------
Received on Mon Jan 03 2000 - 11:05:53 NZDT