SUMMARY: System crash on file size > 4 GB

From: Nair, Pravin <PNair_at_genecoop.com>
Date: Tue, 13 Feb 2001 12:23:55 -0500

Many thanks to everyone who replied. Almost all of you were right in pointing
to a hardware/memory problem. One of the memory modules was faulty, but a
power-on reboot didn't identify it as bad. So we had to find it by trial and
error: removing/swapping DIMM boards and copying a large file each time to
see whether the system still crashed.
 
Thanks again to all:

Sergui Patchkovskii
Bryan Lavalle
Udo Grabowski
Rowan Bailey
Alan Davis
Dr. Thomas Blin
Alan (nabeth.cxo.dec.xom)
Knut Hellebø

----------------------------------
 
> We have an ES40 system with Tru64 V5.0A which crashes when we copy a file
> larger than 4 GB (at least, that is what it appears like). This happened
> during an Oracle export. A simple cp of a >4 GB file to a different
> partition also crashes the system.
>
> The error message on halt is:
> halt code =6
> double error halt
>
> The /var/adm/messages file has warning messages like:
>
> WARNING: too many processor corrected errors detected on cpu0. Reporting suspended
> WARNING: too many processor corrected errors detected on cpu1. Reporting suspended
>
>
> # ulimit -a
> core file size (blocks) unlimited
> data seg size (kbytes) 131072
> file size (blocks) unlimited
> max memory size (kbytes) 4102784
> open files 4096
> pipe size (512 bytes) 8
> stack size (kbytes) 8192
> cpu time (seconds) unlimited
> max user processes 1024
> virtual memory (kbytes) 4194304
>
> We tried changing the limits to unlimited, but the system still goes down
> while copying large files.
> Any help is highly appreciated.
>
> Thanks
> Pravin

Your system is failing due to a hardware problem, possibly bad memory (the
"too many processor corrected errors" warning is usually reported when ECC
is correcting memory errors). The faulty part may be memory that is never
accessed during "normal" system operation. Using "ulimit" will have no
effect if it's a hardware problem. A "double error halt" means an error
occurred in low-level code while another error was being
processed/handled/logged. It is another indication of BAD HARDWARE. Get the
hardware repaired.

Received on Tue Feb 13 2001 - 23:54:23 NZDT