Hello,
I'm having trouble with a PCI SWXCR (3-port KZPSC) controller.
The server is a two-CPU AlphaServer 2100 5/300 running Digital UNIX
4.0D with patch set 1 installed.  The system firmware is v5.1 and
the SWXCR firmware is v2.36.
There are three RAID 5 groups of three RZ28M-VW disks each, plus one
mirrored pair of RZ28M-VW disks.
The trouble started 15 days after I upgraded from Digital UNIX 4.0B
to 4.0D and applied patch set 1.  Occasionally the SWXCR controller
stops responding.  It doesn't happen too often and (so far) hasn't
had catastrophic consequences: the filesystems on RAID become
unavailable and the only remedy is to reboot.  Since I moved the
system disk to the SWXCR, if the filesystem that becomes
inaccessible is on the system disk, the machine panics and reboots
(not surprising).  The binary.errlog file contains records of these
events; a typical excerpt is attached below.
Most of this happens around 4am, which looked pretty odd to me,
but this is what I found in root's crontab:
----------------------------------------
1 4 * * * test -x /usr/sbin/defragcron && /usr/sbin/defragcron -p >>/usr/adm/defragcron.log 2>&1
----------------------------------------
This entry defragments all mounted AdvFS domains in parallel, so
there has indeed been a lot of disk activity early in the morning.
I changed it so that no more than two filesystems are defragmented.
However, that didn't make the problem go away.
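To put a number on "around 4am", the small sketch below compares the
"Timestamp of occurrence" values from the attached excerpt with the
start time implied by the crontab fields "1 4 * * *" (04:01 every day).
It assumes the job actually started on schedule that night; the
function name is made up for illustration, and the timing only shows
the errors fall a few minutes into the defragcron run, not that
defragcron caused them.

```python
from datetime import datetime

# Minute and hour fields of the root crontab entry "1 4 * * * ..."
CRON_MINUTE, CRON_HOUR = 1, 4

# Timestamps copied from the "Timestamp of occurrence" lines of the
# binary.errlog excerpt (entries 14, 13 and 12).
ERROR_TIMESTAMPS = [
    "27-SEP-1998 04:04:19",
    "27-SEP-1998 04:04:19",
    "27-SEP-1998 04:04:20",
]


def seconds_after_cron_start(stamp: str) -> int:
    """Seconds between that day's scheduled cron start and the event.

    Hypothetical helper; assumes the job started exactly on schedule.
    """
    # .title() turns "SEP" into "Sep" so it matches %b in the C locale.
    when = datetime.strptime(stamp.title(), "%d-%b-%Y %H:%M:%S")
    start = when.replace(hour=CRON_HOUR, minute=CRON_MINUTE, second=0)
    return int((when - start).total_seconds())


for stamp in ERROR_TIMESTAMPS:
    print(stamp, "->", seconds_after_cron_start(stamp),
          "s after defragcron was scheduled to start")
```

For the excerpt below this gives 199 and 200 seconds, i.e. the write
timeout, the AdvFS domain panic and the rejected cache flush all land
a little over three minutes into the nightly run.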
Recently I moved the boot disk from rz0 to RAID, and the same thing
happened (in single-user mode) while running:
# vdump -0 -f - /usr | vrestore -x -f - -D /mnt/usr
(a dump from the internal SCSI disk to a RAID group).
It looks as if the KZPSC cannot cope with heavy I/O from a pair
of 5/300 Alpha CPUs.
Has anybody seen or resolved this?  Is anybody out there running a
stable (and fairly fast, I/O-intensive) Alpha with this kind of
RAID controller?  I read a couple of good summaries in the
archive, but it seems that nobody came to firm conclusions about it.
Sorry for such a long message.  There will be yet another posting,
which may have something to do with this affair.
Thanks for your time.
Sincerely,
Dejan Muhamedagic   dejan_at_yunix.co.yu
******************************** ENTRY   12 ******************************** 
Logging OS                        2. Digital UNIX 
System Architecture               2. Alpha 
Event sequence number            56. 
Timestamp of occurrence              27-SEP-1998 04:04:20   
Host name                            panda 
System type register      x00000009  AlphaServer 2x00 
Number of CPUs (mpnum)    x00000002 
CPU logging event (mperr) x00000000 
Event validity                    1. O/S claims event is valid 
Event severity                    1. Severe Priority 
Entry type                      198. SWXCR RAID Controller Event 
------ Device Data ------              
Class                           x00  RAID Disk 
Subsystem                       x20  SWXCR Mport/RAID Controller 
Number of Packets                 5. 
------ Packet Type ------       258. Module Name String 
Routine Name                         re_flush 
------ Packet Type ------       256. Generic String 
                                     Cmd rejected by port 
------ Packet Type ------       259. Software Error String 
Error Type                           Possible Software Problem - Impossible 
                                     Cond Detected 
------ Packet Type ------       256. Generic String 
                                     Active XCR_COM at time of error 
------ Packet Type ------         0. SWXCR Communication Block (XCR_COM) 
   Packet Revision                1. 
Controller Number         x00000000 
Unit Number on Controller x00000000 
Function Status Codes     x00000003  Command has Timed Out. 
Adapters Status Code          x0000  Normal Completion. Configuration 
                                     transferred. 
SWXCR Flags               x00000000 
Received by Callback      x00000000 
Data Xfer Length                  0. 
Number of Scatter Entries         0. 
Command Data Length               0. 
Block Number              x00000000 
Xfer Residual Length              0. 
Timeout Value in Seconds        120. 
XCR Command               x0000000A  Clear Cache of Dirty Blocks (Type 1). 
******************************** ENTRY   13 ******************************** 
Logging OS                        2. Digital UNIX 
System Architecture               2. Alpha 
Event sequence number            55. 
Timestamp of occurrence              27-SEP-1998 04:04:19   
Host name                            panda 
System type register      x00000009  AlphaServer 2x00 
Number of CPUs (mpnum)    x00000002 
CPU logging event (mperr) x00000000 
Event validity                    1. O/S claims event is valid 
Event severity                    5. Low Priority 
Entry type                      206. Advanced File System (AdvFS) Domain Panic 
SWI Minor class                   9. ASCII Message 
SWI Minor sub class               4. Informational 
ASCII Message 
    AdvFS Domain Panic; Domain usre_domain Id 0x35fcf50d.00095960 
    An AdvFS domain panic has occurred due to either a metadata write error or 
    an internal inconsistency. This domain is being rendered inaccessible. 
    Please refer to guidelines in AdvFS Guide to File System Administration 
    regarding what steps to take to recover this domain. 
      
******************************** ENTRY   14 ******************************** 
Logging OS                        2. Digital UNIX 
System Architecture               2. Alpha 
Event sequence number            54. 
Timestamp of occurrence              27-SEP-1998 04:04:19   
Host name                            panda 
System type register      x00000009  AlphaServer 2x00 
Number of CPUs (mpnum)    x00000002 
CPU logging event (mperr) x00000000 
Event validity                    1. O/S claims event is valid 
Event severity                    3. High Priority 
Entry type                      198. SWXCR RAID Controller Event 
------ Device Data ------              
Class                           x00  RAID Disk 
Subsystem                       x20  SWXCR Mport/RAID Controller 
Number of Packets                 7. 
------ Packet Type ------       258. Module Name String 
Routine Name                         re_complete 
------ Packet Type ------       256. Generic String 
                                     I/O failed 
------ Packet Type ------       260. Hardware Error String 
Error Type                           Hard Error Detected 
------ Packet Type ------       256. Generic String 
                                     Active XCR_COM at time of error 
------ Packet Type ------         0. SWXCR Communication Block (XCR_COM) 
   Packet Revision                1. 
Controller Number         x00000000 
Unit Number on Controller x00000003 
Function Status Codes     x00000003  Command has Timed Out. 
Adapters Status Code          x0000  Normal Completion. 
SWXCR Flags               x00000010  BP Points to Buffer. 
Received by Callback      x00000001 
Data Xfer Length               8192. 
Number of Scatter Entries         0. 
Command Data Length               0. 
Block Number              x00215D70 
Xfer Residual Length              0. 
Timeout Value in Seconds         60. 
XCR Command               x00000003  Write (Type 1). 
------ Packet Type ------       256. Generic String 
                                     Active Controller Working Set at time of 
                                     error 
------ Packet Type ------         1. Controller/HBA Working Set(CNTRL_WS) 
   Packet Revision                1. 
General Flags             x00000000 
Command Retry Count               0. 
160. Bytes Scatter/Gather            ** Not Printed ** 
Mask Register             x00000FFF 
     -  Registers 0->F -               
Opcode                          x03  Write (Type 1). 
Command ID                      x00 
Count of Blocks                  16. 
Start Block Number        x00215D70 
Logical Drive                   x03 
Pointer                   x00000000 
Scatter-Gather Type             x00  Unused. 
Command ID                      x00  Unused. 
Adapters Status Code          x0000  Normal Completion. 
Received on Tue Oct 13 1998 - 12:45:21 NZDT