Hi all
Thanks to Philip, Tom and Alan; the conclusion is that one Memory Channel
rail has crapped out and traffic failed over to the other one, and it has
nothing to do with link aggregation. I confirmed this by going into the
machine room and looking at the LEDs on the back: the primary MC adapter
had one amber light (bad), the secondary had two green (good).
Opinion was divided as to whether this merited a service call. Since the
system is not yet live, I pulled the cluster down and ran mc_cable and
mc_diags from the firmware console; no problems were reported. Booting the
cluster again brought everything back up fine: two pairs of green LEDs
and nothing bad in the logs. I'm putting it down to MC cables needing
reseating, the construction work in the machine room, and the fact that it
happened on Friday the 13th. At least we know the failover works...
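For anyone who wants to repeat the exercise, those checks are run from the
SRM console with the cluster halted. Roughly like this - treat it as a
sketch, since the prompt and the output format will differ with your
console firmware revision:

>>> mc_cable      (reports the cable/hub connection state for each rail;
                   interrupt it once you have seen the state you expect)
>>> mc_diag       (the Memory Channel adapter diagnostics mentioned above)
>>> boot          (bring the member back into the cluster)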
John
______________________________ Reply Separator _________________________________
Subject: memory channel scary messages on 5.1A - link aggregation?
Author: speakmaj (speakmaj_at_mskcc.org) at Internet
Date: 6/16/2003 1:33 AM
Hello all
We are preparing to move our production database to an Oracle 9iRAC
cluster of GS80s running Tru64 5.1A PK4. The new cluster seems to be
running just fine and we have no complaints. Well, just one. Out of
the blue we noticed the messages below in /var/adm/messages. They look
scary, although we haven't experienced any problems (but then, we are
not running production yet...). There is nothing within about 24 hours
either side of this snippet of /var/adm/messages. The only vaguely
notable thing is that a day or so beforehand we switched on link
aggregation on the NICs; after a bit of fiddling with the switches, it
works fine. Does anyone think I should worry about this?
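For reference, the snippet below was pulled straight from
/var/adm/messages; a filter along these lines (just matching the message
text shown below) is enough to isolate the Memory Channel entries:

# egrep 'mchan1|phys_rail|rmerror' /var/adm/messages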
Thanks
John
--------------------------------------------------------------------
Jun 13 11:25:34 crdbds1 vmunix: rm_get_errcnt_lock failed on mchan1 ret 2. Clear and restart
Jun 13 11:25:34 crdbds1 last message repeated 5 times
Jun 13 11:25:34 crdbds1 vmunix: rmerror_int: mchan1 double failure
Jun 13 11:25:34 crdbds1 vmunix: rm: logical rail 0 moved from phys_rail 0 offset 0 MB
Jun 13 11:25:34 crdbds1 vmunix: rm: to phys_rail 1 offset 0 MB
Jun 13 11:25:34 crdbds1 vmunix: rm_state_change: mchan1 slot 1 offline
Jun 13 11:25:34 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, phys_rail 0 removed
Jun 13 11:25:34 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, phys_rail 0 (size 512 MB)
Jun 13 12:05:28 crdbds1 vmunix: rm_get_errcnt_lock failed on mchan1 ret 2. Clear and restart
Jun 13 12:05:28 crdbds1 last message repeated 5 times
Jun 13 12:05:28 crdbds1 vmunix: rmerror_int: mchan1 double failure
Jun 13 12:05:28 crdbds1 vmunix: rm_state_change: mchan1 slot 1 offline
Jun 13 12:05:28 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, phys_rail 0 removed
Jun 13 12:05:28 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, phys_rail 0 (size 512 MB)
Jun 13 18:15:32 crdbds1 vmunix: rm_get_errcnt_lock failed on mchan1 ret 2. Clear and restart
Jun 13 18:15:32 crdbds1 last message repeated 5 times
Jun 13 18:15:32 crdbds1 vmunix: rmerror_int: mchan1 double failure
Jun 13 18:15:32 crdbds1 vmunix: rm_state_change: mchan1 slot 1 offline
Jun 13 18:15:32 crdbds1 vmunix: rm primary: mchan1, hubslot = 0, phys_rail 0 removed