I posted the question about the mystery uerf reports:
uerf on my machine (Digital Unix 4.0c workstation) returns a huge number
of entries like the one below:
********************************* ENTRY 1.
----- EVENT INFORMATION -----
EVENT CLASS ERROR EVENT
OS EVENT TYPE 199. CAM SCSI
SEQUENCE NUMBER 48926.
OPERATING SYSTEM DEC OSF/1
OCCURRED/LOGGED ON Wed Jun 17 07:10:41 1998
OCCURRED ON SYSTEM osiw1
SYSTEM ID x00070016
SYSTYPE x00000000
----- UNIT INFORMATION -----
CLASS x0000 DISK
SUBSYSTEM x0000 DISK
BUS # x000A
x0290 LUN x0
TARGET x2
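With this many entries, it can help to see which devices they point at. The following is only a rough sketch: summarize_uerf is a hypothetical helper name, and it assumes every entry lists "BUS #", "LUN" and "TARGET" in the same order as the sample entry above. It reads uerf's text output on standard input and counts entries per bus/target/LUN.

```shell
# Tally uerf entries per SCSI bus/target/LUN.
# Assumption: each entry contains "BUS #", "LUN" and "TARGET" lines in the
# order shown in the sample entry; the TARGET line closes one entry.
summarize_uerf() {
    awk '
        $1 == "BUS" && $2 == "#" { bus = $3 }
        $2 == "LUN"              { lun = $3 }
        $1 == "TARGET"           { count[bus " " $2 " " lun]++ }
        END {
            for (k in count) {
                split(k, f, " ")
                printf "bus %s target %s lun %s: %d\n", f[1], f[2], f[3], count[k]
            }
        }
    '
}
```

It would be used as: uerf | summarize_uerf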
Ken Krueger suggested that they may be caused by a known bug in advfs.
I am going to try his workaround. He wrote:
This may be due to a known bug in advfsd (the advfs daemon). This
daemon gets started each time the system boots (/sbin/init.d/advfsd)
and may begin to consume large amounts of CPU time as well as fill your
error logs. I think it goes out and does a disk check every 5 minutes.
Possible solutions are:
1) Remove the advfsd link in your /sbin/rc3.d directory so that the
daemon never starts. The problem with this is that if you use the advfs
GUI, you'll need to start the daemon first (/sbin/init.d/advfsd start)
and then stop it (/sbin/init.d/advfsd stop) when you are done.
2) Put an entry in crontab that does a stop every so often. The problem
with this is that it could kill the process while you are using the
GUI. Another problem is that while the daemon is running, it will put
many entries into the errorlog file.
3) Don't use the advfs GUI at all. Then option 1 can be used to remove
the link and you don't have to worry about remembering to start or
stop the daemon.
No word on when this will be fixed.
Ken
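Ken's option 1 could be scripted roughly as follows. This is a sketch, not something from his message: the function names are hypothetical, the two paths are the ones he mentions, and the exact name of the link in /sbin/rc3.d is not given, so the first helper only lists candidates rather than removing anything.

```shell
# Paths taken from Ken's message; overridable so the sketch can be tried
# on a test directory first.
ADVFSD_INIT=${ADVFSD_INIT:-/sbin/init.d/advfsd}
RC3_DIR=${RC3_DIR:-/sbin/rc3.d}

# List the rc3.d entries that start advfsd at boot (candidates for removal).
list_advfsd_links() {
    ls "$RC3_DIR" | grep advfsd
}

# Start advfsd, run the given command (for example the GUI), then stop the
# daemon again, whether or not the command succeeded.
with_advfsd() {
    "$ADVFSD_INIT" start || return 1
    "$@"
    status=$?
    "$ADVFSD_INIT" stop
    return $status
}
```

Once the link is gone, the GUI would be run as "with_advfsd <GUI command>", with the placeholder replaced by whatever launches the advfs GUI on your system, so the daemon only lives as long as the GUI session.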
Ronald D. Bowman suggested that the problem may be caused by a disk
error. I will check the disk in a few days, when I am able to shut
down my machine. He said the following:
The errors you are seeing seem to indicate (from the first one) that
there is a problem with the hard disk drive. The second one seems to
indicate that maybe you are using dump to back up the disk, and
something is not correct. I am not sure on this (and more qualified
people will probably answer the question better), but what may be
happening is that a file is in a sector that has a bad byte. The file
is probably okay, but dump reads every byte of every sector, and
therefore would find a problem bit. If you want to check the status of
your disk, you can use scu. There have been previous summaries (some by
me) that explain this.
Here is part of one of mine:
In order to keep people from using the partition where the errors were
occurring, I unmounted it with "# umount /space", where /space is the
partition in question (we knew this since it failed during the dump,
plus this is the partition to which we had just added about 270 MB of
files). Also, in order to use fsck, the file system must be unmounted.
Then, using "# scu -f /dev/rrz0h" and "scu> verify media", I found that
8 blocks were unreadable. By accident, I discovered that just using
verify media without any block information checked the entire disk
(fortunately the only errors were in the /space partition).
Then scu was used to reassign the blocks that were unreadable:
scu> reassign lba ####
where #### is the block number provided by verify.
We could do this with little worry about what happened to our data
since the partition in question has software that we have added (thus
easy to recover). Plus, the errors did not show up until this latest
installation of software, and NSR gave us a list of 6 files that had
I/O errors associated with them. Therefore, more than likely the 6
files listed by NSR were the only ones in jeopardy.
I then re-ran the scu verify command as "scu> verify media starting
####", where #### was a block number just before the first reported
error. No errors were reported, so I remounted /space using
"# mount /space".
---
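The sequence in the excerpt above can be written down as a checklist generator. This is only a sketch: print_repair_plan is a hypothetical name, it prints the steps rather than running them, and it assumes the bad block numbers are decimal and given in ascending order so that "first block minus one" is a sensible restart point for the second verify.

```shell
# Print (do not run) the repair sequence described in the excerpt above.
# Arguments: mount point, raw device, then one or more bad block numbers
# as reported by "verify media" (assumed decimal, ascending).
print_repair_plan() {
    if [ $# -lt 3 ]; then
        echo "usage: print_repair_plan mountpoint rawdevice lba [lba ...]" >&2
        return 1
    fi
    mnt=$1; dev=$2; shift 2
    echo "umount $mnt"
    echo "scu -f $dev"
    echo "scu> verify media"
    for lba in "$@"; do
        echo "scu> reassign lba $lba"
    done
    # Re-verify from just before the first reported bad block.
    echo "scu> verify media starting $(( $1 - 1 ))"
    echo "mount $mnt"
}
```

For example, "print_repair_plan /space /dev/rrz0h 1000 1007" (made-up block numbers) lists the umount, the scu session with two reassign commands, the second verify starting at block 999, and the final mount.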
You may not have to unmount the file system, but it probably would not
hurt to do it if you can. The scu man page may provide info on that.
Also, in scu you can use "show defects all" to get a listing of all
logged defects on the disk. "show defects grown" will give a listing of
the defects that have been found since the disk was formatted. Also, I
believe that sectors that are currently bad and contain a file will not
be put on the grown list until the file is removed from that sector.
I hope this helps, or, if it is way off base from the actual problem,
that it at least provides some information you did not already have.
Received on Thu Jul 16 1998 - 16:10:25 NZST