Hello Managers,
Our ES40 has now run for over two days under load without rebooting
(unless I jinx myself by posting this). I'd like to thank the following
people for their replies and help:
Marco Benton, Larry Clegg, Joe Carrio, Robert Aldridge, Dr. Tom Blinn,
Selden E Ball Jr, and Raul Sossa S.
I realize that my original post did not make clear that we aborted the
dual 667 MHz CPU upgrade after discovering that our motherboard didn't
meet the minimum revision. The only things that really changed were the
firmware and the memory.
There were many suggestions, pointing to several possible culprits:
- The arrangement and mixture of 2 Gig and 4 Gig memory kits
- Possible incompatibility with different memory manufacturers
- Buggy firmware: several people mentioned that v5.9 is buggy and has
been superseded by v5.9B
- Poorly seated memory and/or static electricity damage
- A known motherboard problem: the Compaq Field Change Order
instructions that the technician brought say, "ES40 EV6 systems with
Rev D system motherboards are experiencing intermittent processor hangs
and memory errors."
I know that making many upgrades at the same time makes it difficult to
figure out whether a single item is faulty. Unfortunately, we don't have
the luxury of taking the machine down to test each change separately, so
we schedule one day of downtime and do everything at once.
After posting the message, we thought that this was as good a time as
any to ask a Compaq field tech to come over and replace the motherboard.
It was free, as specified by the installation instructions. The
technician came over, replaced the motherboard, and popped in our two
667 MHz CPUs. The next day the machine rebooted itself again.
Then we decided to remove all the older 2 Gigabyte kit memory and leave
only the 4 Gigabyte kit memory in, which gives us a total of 16
Gigabytes. That's where we are now. The uptime is 2 days and 6 hours,
which is longer than before.
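
For anyone checking the memory totals, here is a small sketch of the
arithmetic in Python. The stick counts and sizes come from the original
post and addendum below; the variable and function names are only for
illustration, and I am assuming that "16 Gigabytes" now means the
sixteen 1024M sticks are the only memory left in the machine.

```python
# Stick counts and sizes (in MB) are taken from the original post and
# addendum. The names below are illustrative only.
MB_PER_GB = 1024


def total_gb(sticks):
    """Sum a list of (count, size_mb) pairs and return the total in GB."""
    return sum(count * size_mb for count, size_mb in sticks) / MB_PER_GB


# Before the upgrade: four 256M sticks plus twelve 512M sticks.
original = [(4, 256), (12, 512)]

# After the upgrade: 256M sticks pulled, sixteen 1024M sticks added.
upgraded = [(12, 512), (16, 1024)]

# Current state (assumed): only the sixteen 1024M sticks remain.
current = [(16, 1024)]

for label, sticks in [("original", original),
                      ("upgraded", upgraded),
                      ("current", current)]:
    print(f"{label}: {total_gb(sticks):.0f} GB")
```

The three totals should match the 7, 22, and 16 Gigabyte figures quoted
in this message.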
We mentioned our situation to our vendor, and he suggested that I speak
to tech support at Dataram, the memory manufacturer, because both the
vendor and Dataram believe that the 2 Gig kits and 4 Gig kits should
live together harmoniously.
Thanks to everyone for their help.
###ORIGINAL POST and ADDENDUM###
> We are having serious problems here, so I'm asking for help. Almost two
> weeks ago, we planned to upgrade our ES40 (Tru64 4.0f) by adding 16
> megabytes of RAM and changing the single 500 Mhz CPU (EV6) to dual 667
> Mhz (EV67). After upgrading the firmware to 5.9 (from v.5.7) level
> which was included on CD with the CPUs, we discovered that our
> motherboard did not satisfy the minimum Revision E0* specified by the
> installation guide. Disappointed but not deterred, we finished by
> putting in the 16 Megs of memory. It originally had 7 megabytes of
> memory, and we arranged it by pulling out four 256M sticks and leaving
> in twelve 512M sticks, then adding in sixteen 1024M sticks for a total
> of 22 Gigabytes of memory (I hope that make sense, I can get the array
> arrangement if someone needs it).
>
> Well after that, we have had constant crash/reboots. Over five this
> weekend alone. uerf -R now seems like it's broken, it ends with "Error
> reading syserr file". I'm not even sure how to fix that now. The
> crash-dump file seems to report "Processor Machine Check" on the
> _Panic_string.
>
> I was told that one of the work jobs has a possible memory leak and is
> using up all 22 megs of memory, plus 20 Gigs swap space. But the
> machine even crashes on idle a couple of times.
>
> I don't want to attach the crash-dump file, and force everyone to read
> it. But if someone is willing to look, I'll send it.
>
> I'm not sure where to begin to isolate the problem. Could it be bad
> memory? I will run memx in a little bit. Could it be the firmware
> upgrade that we did? Could it be some user's process using up all the
> memory and crashing the machine? Is the unusual mix of 512Meg and 1G
> memory causing unstability? We are also fearing static electricity
> damage on some hardware.
>
> If I'm not providing enough information and someone is willing to help,
> I'll be happy to provide some answers.
###ADDENDUM###
> We originally had 7 GIGAbytes, not MEGAbytes. We added in 16 Gigabytes,
> and removed 1 Gigabyte from the original set. Now the ES40 has 22
> Gigabytes of memory. It also has 20 Gigs of swap space.
--
Kevin Dea
UNIX System Administrator
Alpine Electronics Research of America
Received on Fri Jun 29 2001 - 01:09:47 NZST