Hardware: ES40 6/500, 4 CPUs, 3 GB RAM (it also happened with 4 GB)
Firmware: 5.8
Software: Tru64 UNIX 5.1, 2nd patch applied
WEBES V3.1 Build 12 09/28/2000 SP 1 Build 4 1 Dec 2000
File system: AdvFS version 4
Problem: when handling large data files (about 600 MB), data is changed
without any notice to the user
Dear friends,

This was supposed to be the summary of my mail with the subject
"gzip & gunzip not always returning original data", but I prefer to
open a new subject since it has proved to be a different (and worse)
matter.
The problem is that, when handling large data files (about 600 MB), data
is changed without any notice to the user.
A user of mine discovered the problem while gzipping and gunzipping his
large data file: gunzip sometimes returned strange errors, while at other
times (not always) the gunzipped data differed from the original data.
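A round-trip check of the kind described above can be sketched as follows. This is only an illustration in Bourne shell, not the user's actual procedure; the function name gzip_roundtrip_ok is hypothetical. Note that it sends the data through a pipe, so it exercises gzip and gunzip themselves rather than files written to disk.

```sh
#!/bin/sh
# Sketch only: gzip_roundtrip_ok is a hypothetical helper, not part of
# the original test. It compresses and decompresses through a pipe and
# compares the CRC and byte count before and after.
gzip_roundtrip_ok() {
    # cksum reading stdin prints "<crc> <size>" with no file name
    orig=`cksum < "$1"`
    back=`gzip -c "$1" | gunzip -c | cksum`
    [ "$orig" = "$back" ]
}
```

Called as `gzip_roundtrip_ok ../a && echo same || echo DIFFERENT`, it returns success only when the decompressed stream has the same checksum and length as the input.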
At first, soon after the "gzip & gunzip not always returning original
data" mail, I suspected a memory error detected by CA to be the cause of
the problem. Unfortunately, the memory cards have been replaced and CA
no longer sees any hardware problem, but I still get strange undetected
data corruption (even without gzip/gunzip).
I have to thank our doctor, Tom Blinn, very much for his very fast and
useful help. Following his suggestion, I found out that the problem is
NOT in gzip/gunzip, since I get undetected data corruption even with the
following few lines of code. In it I repeatedly copy an input file
(../a) to file b, and b to c, and then I check for differences among the
3 files using "diff" and "cksum". Well, those differences sometimes really
do occur, and there is no noticeable warning or error message.
#!/bin/csh
unset verbose
set echo
echo pwd=`pwd`
uname -a
unlimit
limit
set n=0
set echo
loop:
@ n++
echo " ==================================================== begin loop $n"
echo start loop n=$n at `date`
ls -ls ../a
cksum ../a
cp ../a b
cksum ../a b
cp b c
cksum ../a b c
ls -ls b c
diff ../a b >/dev/null || echo ERROR 1: FILES ../a and b DIFFER at loop $n
cksum ../a b c
diff b c >/dev/null || echo ERROR 2: FILES b and c DIFFER at loop $n
cksum ../a b c
diff c b >/dev/null || echo ERROR 3: FILES c and b DIFFER at loop $n
cksum ../a b c
diff ../a c >/dev/null || echo ERROR 4: FILES ../a and c DIFFER at loop $n
cksum ../a b c
diff c ../a >/dev/null || echo ERROR 5: FILES c and ../a DIFFER at loop $n
cksum ../a b c
diff ../a b >/dev/null || echo ERROR 6: FILES ../a and b DIFFER at loop $n
cksum ../a b c
diff b ../a >/dev/null || echo ERROR 7: FILES b and ../a DIFFER at loop $n
cksum ../a b c
diff b c >/dev/null || echo ERROR 8: FILES b and c DIFFER at loop $n
cksum ../a b c
echo end loop n=$n at `date`
echo " ==================================================== end loop $n"
goto loop
I ran the above script on the file ../a, which has the following size and checksum:
ls -ls ../a
610032 -rw-r--r-- 1 root system 624672000 Jan 18 17:34 ../a
cksum ../a
2785050943 624672000 ../a
As I write, the script is running in the background, and here are the
results obtained so far:
loop  ERROR1  ERROR2  ERROR3  ERROR4  ERROR5  ERROR6  ERROR7  ERROR8
1     no      no      no      no      no      no      no      no
2     no      no      no      no      no      no      no      no
3     no      no      no      no      no      no      no      no
4     YES     YES     YES     no      no      YES     YES     YES
5     no      no      YES     no      no      YES     YES     YES
6     no      YES     YES     no      no      YES     YES     YES
7     no      no      no      no      no      no      no      no
....
....
Of course, whenever some ERRORx occurs (that is, diff finds a
difference), the cksum values of the files are not the expected one
(2785050943, as for file ../a).
I then killed the background job and edited the script, removing all the
"diff" commands. The script now contains only the following commands:
cp, ls, and cksum.
The results are ugly! The checksum of a given file often changes within
the same loop: the sizes are always the same, but the contents of the
files vary!
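The reduced test described above (cp, ls and cksum only) could look roughly like the sketch below. It is not the script that was actually run: it is written in Bourne shell rather than csh, it stops after a fixed number of loops, and the function names crc_of and check_copies are hypothetical. It writes b and c into the current directory, as the original test does.

```sh
#!/bin/sh
# Sketch only, under the assumptions stated above.

# Print just the CRC that cksum reports for one file.
crc_of() {
    cksum "$1" | awk '{ print $1 }'
}

# check_copies SRC LOOPS
# Copy SRC to b and b to c, then read each copy's checksum twice.
# Prints one line per mismatch and a final "mismatches: N" line.
check_copies() {
    src=$1
    loops=$2
    bad=0
    ref=`crc_of "$src"`
    n=1
    while [ "$n" -le "$loops" ]; do
        cp "$src" b
        cp b c
        ls -ls b c > /dev/null
        # Each copy is checked twice, since the reported problem is that
        # the same file can give different checksums within one loop.
        for f in b c b c; do
            got=`crc_of "$f"`
            if [ "$got" != "$ref" ]; then
                echo "loop $n: $f has cksum $got, expected $ref"
                bad=$((bad + 1))
            fi
        done
        n=$((n + 1))
    done
    echo "mismatches: $bad"
    [ "$bad" -eq 0 ]
}
```

For example, `check_copies ../a 10` would run ten loops and return a non-zero status if any checksum differed from that of ../a.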
To prove my words I ran the script in the background, sending stdout to
a log file. Look at the following, which shows the resulting cksums (which
should all be the same):
grep 624672000 CHECK_nogz.log | grep -v system | sort -u
1680138362 624672000 b
2046682359 624672000 c
2095653778 624672000 b
218351582 624672000 b
2371670479 624672000 c
2785050943 624672000 ../a
2785050943 624672000 b
2785050943 624672000 c
2992181696 624672000 b
3216358513 624672000 b
3442014270 624672000 c
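A listing like the one above can be condensed into a count of distinct checksums per file name. The following is a small sketch in Bourne shell and awk; the function name distinct_cksums is hypothetical, and it assumes that cksum lines in the log have exactly three fields (CRC, size, name), which also excludes the longer "ls -ls" lines.

```sh
#!/bin/sh
# Sketch only: distinct_cksums is a hypothetical helper.
# distinct_cksums LOGFILE SIZE
# Prints "<name> <number of distinct CRCs>" for every file name that
# appears in a three-field cksum line with the given size.
distinct_cksums() {
    awk -v size="$2" '
        NF == 3 && $2 == size {
            key = $3 SUBSEP $1          # file name plus CRC
            if (!(key in seen)) {
                seen[key] = 1
                count[$3]++             # one more distinct CRC for this name
            }
        }
        END { for (f in count) print f, count[f] }
    ' "$1" | sort
}
```

Run as `distinct_cksums CHECK_nogz.log 624672000`, any file reported with a count greater than 1 has been read with more than one checksum. Applied to the eleven lines listed above, it would report 1 for ../a, 6 for b and 4 for c.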
What else can I say?
Please, help me!
Thanks to everybody,
Emanuele
--
$$$ Emanuele Lombardi
$$$ mail: AMB-GEM-CLIM ENEA Casaccia
$$$ I-00060 S.M. di Galeria (RM) ITALY
$$$ mailto:emanuele.lombardi_at_casaccia.enea.it
$$$ tel +39 06 30483366 fax +39 06 30483591
$$$
$$$ |||
$$$ \|/ ;_;
$$$ What does a process need | /"\
$$$ to become a daemon ? | \v/
$$$ | |
$$$ - a fork o---/!\---
$$$ | |_|
$$$ | _/ \_
$$$* Contrary to popular belief, UNIX is user friendly.
$$$ It's just very particular about who it makes friends with.
$$$* Computers are not intelligent, but they think they are.
$$$* True programmers never die, they just branch to an odd address
$$$* THIS TRANSMISSION WAS MADE POSSIBLE BY 100% RECYCLED ELECTRONS
Received on Tue Jan 30 2001 - 15:45:47 NZDT