Hi all
     
     Thanks to Philip, Tom and Alan; the conclusion is that one memory 
     channel had crapped out and failed over to the other, and it has 
     nothing to do with link aggregation.  I confirmed this by going into 
     the machine room and looking at the LEDs on the back: the primary MC 
     had one amber light (bad), the secondary had two green (good).
     
     Opinion was divided as to whether this merited a service call.  As the 
     system is still not live, I pulled the cluster down and ran mc_cable 
     and mc_diags from the console firmware.  No problems reported.  
     Booting the cluster again brought everything back up fine: two pairs 
     of green LEDs and nothing bad in the logs.  I'm putting it down to MC 
     cables that needed reseating, construction in the machine room, and 
     the fact that it happened on Friday the 13th.  At least we know the 
     failover works...
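     
     For anyone who wants to keep an eye on this, here is a minimal watcher 
     sketch in Python that scans /var/adm/messages for the rm/rmerror lines 
     quoted in the original mail below.  The path and the exact message 
     wording are assumptions based on that snippet, so adjust to taste; 
     treat it as illustrative, not as anything official.
     
     import re
     import sys
     
     # Patterns taken from the log excerpt in this thread; other Memory
     # Channel messages may use different wording.
     MC_PATTERNS = [
         re.compile(r"rmerror_int: \S+ double failure"),
         re.compile(r"rm_state_change: \S+ slot \d+ offline"),
         re.compile(r"rm: logical rail \d+ moved"),
         re.compile(r"rm_get_errcnt_lock failed"),
     ]
     
     def scan(path):
         # Collect the lines that look like Memory Channel rail events.
         hits = []
         for line in open(path):
             for pattern in MC_PATTERNS:
                 if pattern.search(line):
                     hits.append(line.rstrip())
                     break
         return hits
     
     if __name__ == "__main__":
         path = sys.argv[1] if len(sys.argv) > 1 else "/var/adm/messages"
         for line in scan(path):
             print(line)
     
     Run it against the messages file (or a copy) and it prints any 
     matching lines; no output means no rail events were logged.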
     
     John
______________________________ Reply Separator _________________________________
Subject: memory channel scary messages on 5.1A - link aggregation?
Author:  speakmaj (speakmaj_at_mskcc.org) at Internet
Date:    6/16/2003 1:33 AM
     Hello all
     
     We are preparing to move our production database to an Oracle 9i RAC 
     cluster of GS80s running Tru64 5.1A PK4.  The new cluster seems to be 
     running just fine and we have no complaints.  Well, just one.  Out of 
     the blue we noticed the messages below in /var/adm/messages.  They 
     look scary, although we didn't experience any problems (but then, we 
     are not running production yet...).  There is nothing else within 
     about 24 hours either side of this snippet of /var/adm/messages.  The 
     only vaguely notable thing is that a day or so earlier we had switched 
     on link aggregation on the NICs; after fiddling with the switches a 
     bit, it works fine.  Does anyone think I should worry about this?
     
     Thanks
     John
     --------------------------------------------------------------------
     
     Jun 13 11:25:34 crdbds1 vmunix: rm_get_errcnt_lock failed on mchan1 
     ret 2. Clear and restart
     Jun 13 11:25:34 crdbds1 last message repeated 5 times
     Jun 13 11:25:34 crdbds1 vmunix: rmerror_int: mchan1 double failure 
     Jun 13 11:25:34 crdbds1 vmunix: rm: logical rail 0 moved from 
     phys_rail 0 offset 0 MB
     Jun 13 11:25:34 crdbds1 vmunix: rm:                      to 
     phys_rail 1 offset 0 MB
     Jun 13 11:25:34 crdbds1 vmunix: rm_state_change: mchan1 slot 1 offline 
     Jun 13 11:25:34 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, 
     phys_rail 0 removed
     Jun 13 11:25:34 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, 
     phys_rail 0 (size 512 MB)
     Jun 13 12:05:28 crdbds1 vmunix: rm_get_errcnt_lock failed on mchan1 
     ret 2. Clear and restart
     Jun 13 12:05:28 crdbds1 last message repeated 5 times
     Jun 13 12:05:28 crdbds1 vmunix: rmerror_int: mchan1 double failure
     Jun 13 12:05:28 crdbds1 vmunix: rm_state_change: mchan1 slot 1 offline 
     Jun 13 12:05:28 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, 
     phys_rail 0 removed
     Jun 13 12:05:28 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, 
     phys_rail 0 (size 512 MB)
     Jun 13 18:15:32 crdbds1 vmunix: rm_get_errcnt_lock failed on mchan1 
     ret 2. Clear and restart
     Jun 13 18:15:32 crdbds1 last message repeated 5 times
     Jun 13 18:15:32 crdbds1 vmunix: rmerror_int: mchan1 double failure
     Jun 13 18:15:32 crdbds1 vmunix: rm_state_change: mchan1 slot 1 offline 
     Jun 13 18:15:32 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, 
     phys_rail 0 removed
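     
     For what it's worth, the "logical rail ... moved" messages come in 
     pairs (a "moved from" line followed by a "to" line), so one can 
     reconstruct which physical rail a logical rail last landed on.  A 
     minimal sketch in Python, assuming the wording matches the excerpt 
     above (continuation-line formatting in a real messages file may 
     differ, so treat this as illustrative only):
     
     import re
     import sys
     
     MOVED_FROM = re.compile(r"rm: logical rail (\d+) moved from")
     MOVED_TO = re.compile(r"rm:\s+to\s+phys_rail (\d+)")
     
     def last_rail_moves(lines):
         # Map each logical rail to the physical rail it last moved to.
         current = {}
         pending = None   # logical rail still waiting for its "to" line
         for line in lines:
             m = MOVED_FROM.search(line)
             if m:
                 pending = int(m.group(1))
                 continue
             m = MOVED_TO.search(line)
             if m and pending is not None:
                 current[pending] = int(m.group(1))
                 pending = None
         return current
     
     if __name__ == "__main__":
         path = sys.argv[1] if len(sys.argv) > 1 else "/var/adm/messages"
         for rail, phys in sorted(last_rail_moves(open(path)).items()):
             print("logical rail %d last moved to phys_rail %d"
                   % (rail, phys))
     
     On the excerpt above it would report logical rail 0 as last moved to 
     phys_rail 1, i.e. running on the secondary rail after the failover.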
     
     
     
Received on Wed Jun 18 2003 - 18:05:25 NZST