Ask the Wizard Questions
Convert: RMS-F-CHK error
The Question is:
Hello, Mr. Wizard--
From time to time, I experience a problem with an
RMS indexed file (to create it we use something like:
$ convert/fdl=x.fdl x.seq x.ddf).
The error shows as an Unexpected RMS error on the
given file. Also, if you were to TYPE out the file, you
get an error like:
%TYPE-W-READERR, error reading DKB200:[CU.DATA]ORDLIN.DDF;696
-RMS-F-CHK, bucket format check failed for VBN = 20045
My understanding is that the file has one or more
corrupted records somewhere in the middle, or that the
indexing structure is hosed-up.
To date, we have used any one of several recovery
methods which entail extracting data from the beginning of
the file up to the corrupted record, floundering around
looking for a non-corrupted record further in the file
(like looking up a key after the corruption), extracting
from there to the end, and merge/converting the extracted
data back into a usable file.
Obviously, this results in a loss of data.
I'm wondering if you know of a utility in VMS or
a program of some sort that could be used to recover the
whole file without losing data and messing around for
hours.
Thanks!
The Answer is:
> The error shows as an Unexpected RMS error on the
> given file. Also, if you were to TYPE out the file, you
> get an error like:
>
> %TYPE-W-READERR, error reading DKB200:[CU.DATA]ORDLIN.DDF;696
> -RMS-F-CHK, bucket format check failed for VBN = 20045
That's a corrupted file, all right. In the VMS version you
mention there are no known software causes for this. For earlier
versions, you want to make sure the latest patches are applied.
Possible causes for this are HARDWARE errors in memory or the disk
subsystem. Check the error logs! If the file is fragmented, another
possible cause is an interrupted IO due to a STOP/ID.
To verify this, try a DUMP/HEAD/BLOC=COUN=0 and try to establish
whether the bucket at the reported VBN (20045) is in a single
extent, or requires 'split IO' to multiple disk extents. (This
must of course be done on the original corrupted file, not a copy!)
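The extent check above is simple arithmetic once the retrieval pointer
counts have been read off the header dump. Below is a rough sketch in
Python, not a VMS tool. It assumes you type in the block count of each
extent by hand, in the order the header lists them; the function name
and the example counts are made up for illustration.

```python
def extents_touched(extent_counts, first_vbn, bucket_blocks):
    """Return the indexes of the extents that hold part of the bucket.

    extent_counts: block count of each extent, in header order
                   (assumed to be transcribed by hand from the dump).
    first_vbn:     VBN of the first block of the bucket (VBNs start at 1).
    bucket_blocks: bucket size in blocks.
    """
    last_vbn = first_vbn + bucket_blocks - 1
    touched = []
    next_vbn = 1  # first VBN mapped by the current extent
    for index, count in enumerate(extent_counts):
        ext_first = next_vbn
        ext_last = next_vbn + count - 1
        # The bucket and the extent overlap if neither ends before the other starts.
        if ext_first <= last_vbn and first_vbn <= ext_last:
            touched.append(index)
        next_vbn += count
    return touched


# Made-up extent sizes, for illustration only.
example_counts = [12000, 8047, 5000]
print(extents_touched(example_counts, 20045, 6))
```

If more than one index comes back, the bucket straddles an extent
boundary and needs 'split IO'.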
To better understand the corruption it generally helps to DUMP
the bucket in question and a block before and after it.
For example, with a bucket size of 6 (check the FDL) you'd ask for
8 blocks, starting one block before the bucket:
$ DUMP/BLOC=(STA:20044,COUN:8)
The check bytes are the first byte in the first VBN of the bucket
and the last byte in the last. They must be equal and are incremented
each time RMS updates anything in the bucket. If they are 'off by one'
then apparently only part of an IO update made it out
to disk (caching software problem?!). Perhaps a block is all zero?
Perhaps a block of data from another file is found? If so: ANAL/DISK.
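If staring at hex dumps gets tiresome, the check byte comparison and the
all-zero test can be done mechanically. The following is only a sketch
in Python. It assumes you have somehow obtained a raw, block-for-block
image of the file as a bytes object (how you get that is up to you), and
it uses 512-byte blocks. The function name and the synthetic test data
are invented for the example.

```python
BLOCK_SIZE = 512  # bytes per disk block


def inspect_bucket(image, first_vbn, bucket_blocks):
    """Report the check bytes and any all-zero blocks of one bucket.

    image is assumed to be a raw block image of the whole file,
    with VBN 1 at offset 0.
    """
    start = (first_vbn - 1) * BLOCK_SIZE  # VBNs count from 1
    size = bucket_blocks * BLOCK_SIZE
    bucket = image[start:start + size]
    if len(bucket) != size:
        raise ValueError("image too short for that bucket")
    first_check = bucket[0]   # first byte of the first block
    last_check = bucket[-1]   # last byte of the last block
    zero_vbns = [
        first_vbn + i
        for i in range(bucket_blocks)
        if not any(bucket[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE])
    ]
    return {
        "first_check": first_check,
        "last_check": last_check,
        "match": first_check == last_check,
        "zero_vbns": zero_vbns,
    }


# Synthetic 8-block image with a 6-block bucket at VBN 2, for illustration.
demo = bytearray(b"\xaa" * (8 * BLOCK_SIZE))
demo[1 * BLOCK_SIZE] = 5                          # first byte of VBN 2
demo[7 * BLOCK_SIZE - 1] = 4                      # last byte of VBN 7
demo[3 * BLOCK_SIZE:4 * BLOCK_SIZE] = bytes(BLOCK_SIZE)  # VBN 4 zeroed
print(inspect_bucket(bytes(demo), 2, 6))
```

A mismatch by one hints at a partly written update; a zeroed block
points more toward the disk or the caching layer.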
A good way to investigate is also:
$ANAL/RMS/INT... POSI/BUCK 20045... DOWN... NEXT...
>
> My understanding is that the file has one or more
>corrupted records somewhere in the middle, or that the
>indexing structure is hosed-up.
Not just the records... the bucket containing the record is hosed.
If only the index structure were broken, TYPE would not have noticed and
a CONVERT would have restored the file.
> To date, we have used any one of several recovery
>methods which entail extracting data from the beginning of
>the file up to the corrupted record, floundering around
>looking for a non-corrupted record further in the file
That is a good approach. It can be slightly improved
by using a binary search for the next valid key, or a hint
from index key values found by ANAL/RMS/INT...
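The binary search idea, sketched in Python rather than in whatever
language your extract program uses. Everything here is an assumption
for illustration: keys are taken to be integers, and can_read is a
hypothetical routine you would supply that attempts a keyed lookup and
returns False when RMS reports an error. It also assumes that lookups
fail for keys landing in the damaged area and succeed from some key
onward.

```python
def first_readable_key(can_read, last_good, known_good):
    """Find the smallest key above last_good that can be read again.

    can_read:   hypothetical callback, key -> True if the lookup works.
    last_good:  last key extracted before the corruption.
    known_good: some later key already known to be readable.
    """
    lo, hi = last_good, known_good
    # Invariant: every key in (last_good, lo] fails, and hi reads fine.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if can_read(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Pretend keys 1001 through 1499 fall in the damaged bucket.
def fake_lookup(key):
    return key >= 1500


print(first_readable_key(fake_lookup, 1000, 5000))
```

This takes about a dozen probes for the example range instead of
thousands of sequential tries.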
> I'm wondering if you know of a utility in VMS or
>a program of some sort that could be used to recover the
>whole file without losing data and messing around for
>hours.
CONVERT and a program extracting records are the right
tools for the job. If this happens more than once, it becomes
critical to understand what is special about the file or the
operations done to it. This is NOT normal, it is an exception.
After some understanding of the file and buckets is built up,
some fortunate folks manage to PATCH a file back to life.
It could be as simple as making those check bytes match, but
it is likely to be harder. If patching the check byte reveals
other corruptions, a good 'trick' sometimes is to patch the
'next free byte' offset word in the bucket header to point
just after the last known-good record. Thus one hopes
to recover some records from the bucket.
For desperate corruption cases you'll need to get a log of
the operations done to the file. This can be done by analyzing
the process steps and the business data, by rigging up the
program to write timestamped trace records, by adding time-stamps
to the main records, or sometimes by borrowing RMS JOURNALING.
The AFTER IMAGE journal contains a picture of the changes made.
Bon Courage,