Dear all,
this DS20E is turning into a nightmare... to recap:
DS20E SMP 667MHz
Firmware 5.8-10
1GB RAM
KZPAC-CA driving 2x 9GB disks (mirrored system)
KZPBA-CY driving an external RA310 and a TLZ88
Intel EtherExpress 100 (ee*, DE600) NIC
4.0F + NHDv3 + PK#4
100% AdvFS
So, after last night's swap of the Single-Ended SCSI card with the
Differential spare I happened to have, the box started doing real work
until it decided to simply hang. No error messages, no nothing.
Symptoms:
1) All NFS mounts hanging ("NFS server not responding"),
2) system responds to ping but no logins possible,
3) only access via serial console
As a matter of fact, 3) turned out to be weird. At the time the system
was lightly loaded (one large compile job, some Samba sessions, an
Apache server and Postgres, mostly idle), but trying to kill off any of
the I/O-locked jobs proved impossible - access to local filesystems
was fine but kill -9 <anything> didn't have any effect whatsoever. A
shutdown -h now just stuck there, killing the console.
So, walk over to the box:
1) Halt -> no effect,
2) Plug serial console directly into VT510 -> nothing, dead,
3) Halt -> no effect,
4) Halt -> ah, I'm back
I get the output on screen from the shutdown and then the >> prompt.
At which point I issued a "crash" command to try and work out what had
been going on. The resulting stack trace follows (abbreviated version;
please mail me for the full version):
thread 0xfffffc003ee0b880 stopped at [tcp_slowtimo:553 ,0xfffffc000038a54c]
Source not available
_crashtime: struct {
tv_sec = 981458742
tv_usec = 528961
}
_boottime: struct {
tv_sec = 981440073
tv_usec = 123446
}
_config: struct {
sysname = "OSF1"
nodename = "emily"
release = "V4.0"
version = "1229"
machine = "alpha"
}
_cpu: 57
_system_string: 0xffffffffff800b30 = "COMPAQ AlphaServer DS20E 666 MHz"
_ncpus: 2
_avail_cpus: 2
_partial_dump: 1
_physmem(MBytes): 1023
_panic_string: 0xfffffc000058f569 = "hardware restart"
_paniccpu: 0
_panic_thread: (nil)
_preserved_message_buffer_begin:
struct {
hdr = struct {
msg_magic = 0x880524
msg_bufx = 0x868
msg_bufr = 0x868
msg_size = 0x3fe0
}
msg_bufc = "Alpha boot: available memory from 0x1dd4000 to 0x3ff56000
[...]
_preserved_message_buffer_end:
_kernel_process_status_begin:
PID COMM
00000 kernel idle
00001 init
00031 update
00392 nfsd
06396 bash
07320 httpd
07755 bash
07839 bash
11185 smbd
11186 smbd
11202 smbd
11213 smbd
11314 halt
11317 umount
_kernel_process_status_end:
_current_pid: 0
_current_tid: 0xfffffc003ee0b880
_proc_thread_list_begin:
thread 0xfffffc003ee0b880 stopped at [tcp_slowtimo:553 ,0xfffffc000038a54c]
Source not available
thread 0xfffffc003ed4e000 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed4ea80 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed4e700 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed4e380 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed4f500 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed4f180 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed4ee00 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed4fc00 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed4f880 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed84000 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed84700 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed85500 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed85180 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed84e00 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed85c00 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed85880 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003e04a000 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003e04aa80 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003e04a700 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003e04a380 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc0035945180 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc0035944e00 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc0035917c00 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc0035916a80 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc0035917500 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc00358d7500 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
[...]
> 0 tcp_slowtimo(0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff, 0xffffffffffffffff) ["../../../../src/kernel/netinet/tcp_timer.c":553, 0xfffffc000038a54c]
_dump_end:
_kernel_thread_list_begin:
thread 0xfffffc003ee0b880 stopped at [tcp_slowtimo:553 ,0xfffffc000038a54c]
Source not available
[...]
}
tset machine_slot[paniccpu].cpu_panic_thread:
Begin Trace for machine_slot[paniccpu].cpu_panic_thread:
warning: cannot get register (number = 64)
thread 0x0 stopped at
warning: cannot get register (number = 64)
warning: cannot get register (number = 64)
warning: PC value 0x0 not valid, trying RA
warning: cannot get register (number = 26)
warning: RA value 0x0 not valid, trying text start
>
warning: cannot get register (number = 64)
[setup_main:1147, 0xfffffc0000230000] lda sp, -32(sp)
warning: cannot get register (number = 64)
warning: cannot get register (number = 26)
warning: cannot get register (number = 30)
0 setup_main() ["../../../../src/kernel/bsd/init_main.c":1147, 0xfffffc0000230000]
End Trace for machine_slot[paniccpu].cpu_panic_thread:
thread 0xfffffc003ee0b880 stopped at [tcp_slowtimo:553 ,0xfffffc000038a54c]
Source not available
thread 0xfffffc003ee0b880 stopped at [tcp_slowtimo:553 ,0xfffffc000038a54c]
Source not available
_stack_trace[0]_begin:
[...]
and on and on...
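As a rough sanity check on the dump, the _boottime and _crashtime
values can be turned into an uptime figure. The sketch below assumes
the two tv_sec fields are ordinary seconds since the Unix epoch, and
that _crashtime marks when the dump was forced from the console rather
than when the hang began; the helper names are made up for
illustration.

```python
from datetime import datetime, timedelta, timezone

# tv_sec values copied from the _boottime and _crashtime structs in the dump.
BOOT_SEC = 981440073
CRASH_SEC = 981458742


def uptime(boot_sec, crash_sec):
    """Return the time between boot and the forced crash as a timedelta."""
    return timedelta(seconds=crash_sec - boot_sec)


def as_utc(sec):
    """Convert seconds since the Unix epoch to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(sec, tz=timezone.utc)


if __name__ == "__main__":
    print("booted :", as_utc(BOOT_SEC).isoformat())
    print("crashed:", as_utc(CRASH_SEC).isoformat())
    print("uptime :", uptime(BOOT_SEC, CRASH_SEC))
```

With the values above the difference is 18669 seconds, a little over
five hours between boot and the forced dump.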
Now, I see two possibilities: either the tcp_slowtimo trace is a red
herring and the hang has nothing to do with it, _or_ Samba has done
something very bad to the system.
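One way to weigh the red-herring possibility is to tally where each
thread in the full dump was stopped. The sketch below is only an
illustration: it assumes the "thread ... stopped at [function:line
,address]" line format shown above, counts each thread address once
even though the dump repeats some of them, and uses a hypothetical
helper name (tally_threads). The sample text is a three-thread excerpt
of the listing, not the whole dump.

```python
import re
from collections import Counter

# Matches lines such as:
#   thread 0xfffffc003ed4e000 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
THREAD_RE = re.compile(
    r"thread\s+(0x[0-9a-f]+)\s+stopped at\s+"
    r"\[(\w+):(\d+)\s*,\s*(0x[0-9a-f]+)\]"
)


def tally_threads(text):
    """Count threads per function, de-duplicating by thread address."""
    stopped_in = {}
    for match in THREAD_RE.finditer(text):
        thread_addr, function = match.group(1), match.group(2)
        stopped_in[thread_addr] = function
    return Counter(stopped_in.values())


# Short excerpt of the listing above; the first thread appears twice on purpose.
sample = """
thread 0xfffffc003ee0b880 stopped at [tcp_slowtimo:553 ,0xfffffc000038a54c]
Source not available
thread 0xfffffc003ed4e000 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ed4ea80 stopped at [thread_block:2367 ,0xfffffc00002bbfc0]
Source not available
thread 0xfffffc003ee0b880 stopped at [tcp_slowtimo:553 ,0xfffffc000038a54c]
"""

if __name__ == "__main__":
    for function, count in tally_threads(sample).most_common():
        print(f"{count:4d}  {function}")
```

In the abbreviated listing every thread except the current one is
parked in thread_block, so a tally over the full file would show
whether anything else stands out.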
I am considering the following routes:
1) replacing the DE600 with a known-good DE500 card just in case the
NHDv3 was actually a Bad Idea(tm),
2) upgrading firmware to 5.9 from the web site.
Any better ideas?
Thanks,
Arrigo
--
Arrigo Triulzi <arrigo_at_albourne.com>
Albourne Partners Ltd. - London, UK
Received on Tue Feb 06 2001 - 12:38:01 NZDT