HP OpenVMS Systems: Ask the Wizard
The Question is:

Dear Mr. Wizard,

We have an application that handles user input via detached
programs, which populate our database made up of a number of RMS
indexed and sequential files. It is a multi-user application
working with several of these detached programs. The communication
mechanism between the input program and the detached program, which
we call the "server", goes through an RMS sequential file: the input
program writes a reference number, which we call an event, and the
server then "wakes up", processes the event, stores the information
in the RMS files, marks the event as processed and goes back to HIB
to wait for the next event. All of this happens with the help of
the so-called "lock manager".

Unfortunately this does not go without problems, and very often the
sequential file where all the events are written gets corrupted.
That is, we get the error:

%RMS-F-IRC, illegal record encountered : VBN or record

After this type of error we can only throw away the sequential
file. We checked the system log files but could not find any hints.
We have absolutely no clue where it comes from, and we have been
unable to reproduce the error. With such a corruption we cannot
edit, convert or dump the file in order to inspect the faulty
record. We have checked the application several times, adding log
messages to try to get to the cause, but have been unlucky so far.
We also wrote a program to try to step over the corrupted record,
but none of our tricks worked. It is a real killer.

Is there any possibility that some RMS parameters, quotas or
buffers are wrongly set in SYSGEN or somewhere else? Or what could
happen when too many updates have to be done on the exact same
record? Could we get a "data jam" of some sort?

We are all waiting anxiously for your answer.

Thanks and regards,
Jacques Roch
jroch@access.ch

The Answer is:

If an RMS sequential file that is accessed strictly through RMS on
a fairly recent VMS version gets corrupted, you would be the first
to run into a very serious problem in a heavily used service.
Possible, but unlikely.
In other words, RMS is likely to be a VICTIM here, not the cause.
The last known cause was a lock manager/VIOC issue under 6.2, but
you indicate 6.1. Patch kits are available for this.
The last known RMS sequential file corruption was before 6.0, but
you indicate 6.1.
I would take a close look at the storage used. All Digital qualified
components? Undesired data re-ordering (hardware/software caches)
could cause this.
The IRC error indicates that the RECORD-LENGTH word which precedes
the data got blown away. It may well be the PREVIOUS record
that had an inconsistent length, creating a pointer into raw data.
Or you may have been handed an old copy of the data from some cache
after RMS had told the (other) system to write out fresh data with
the real record length word. (Are you using 3rd party data
caching software?)
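
To make the "previous record" point concrete, here is a minimal
sketch in Python. It is NOT RMS code. It assumes the usual
variable-length layout (a 16-bit little-endian byte count, then the
data, padded to an even byte boundary) and ignores block boundaries
and end-of-block markers. The names walk_records, pack_records and
max_len are made up for the illustration.

```python
import struct

def pack_records(records):
    """Build a buffer of length-prefixed records (assumed layout)."""
    out = bytearray()
    for rec in records:
        out += struct.pack("<H", len(rec)) + rec
        if len(out) % 2:          # pad each record to an even boundary
            out.append(0)
    return bytes(out)

def walk_records(data, max_len):
    """Follow the length words through the buffer.

    Returns (good_offsets, bad_offset). bad_offset is the position of
    the first length word that cannot be valid, or None if the whole
    buffer parses cleanly.
    """
    offsets = []
    pos = 0
    while pos + 2 <= len(data):
        (length,) = struct.unpack_from("<H", data, pos)
        if length > max_len or pos + 2 + length > len(data):
            return offsets, pos   # this is where an "illegal record" shows up
        offsets.append(pos)
        pos += 2 + length
        pos += pos % 2            # skip the pad byte, if any
    return offsets, None

clean = pack_records([b"EVT0001", b"EVT0002", b"EVT0003"])

# Damage only the length word of the SECOND record (offset 10): 7 becomes 3.
damaged = bytearray(clean)
struct.pack_into("<H", damaged, 10, 3)

clean_result = walk_records(clean, max_len=512)
damaged_result = walk_records(bytes(damaged), max_len=512)
```

With the short length in place, the walk accepts the damaged record
at offset 10 and then lands in the middle of its data, so the error
is reported at offset 16 although the bad length word sits at
offset 10. That is why the block(s) in front of the reported
position are worth looking at as well.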
I'd encourage you to augment the error handler with a printout of
the alleged RFA for the problem record and the RFA for the last
successful one. To study (and possibly fix!) the problem, you'd want
to DUMP the VBNs those RFAs implicitly point to (and probably a few
blocks before and after, enough to capture 5+ entire records).
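
As a rough sketch of the arithmetic involved (Python, for
illustration only): assuming the RFA of a sequential-file record
breaks down into a 1-based VBN plus a byte offset within that
512-byte block, the window of blocks to dump can be worked out as
below. The helper names and the file name EVENTS_BROKEN.DAT are
hypothetical; DUMP/BLOCKS=(START:n,COUNT:m) is the DCL form used to
dump a block range.

```python
BLOCK_SIZE = 512  # bytes per disk block

def dump_window(vbn, before=2, after=2):
    """Return (start_vbn, block_count) for a window around vbn.

    VBNs are 1-based, so the window is clamped at block 1.
    """
    start = max(1, vbn - before)
    end = vbn + after
    return start, end - start + 1

def dump_command(filename, vbn, before=2, after=2):
    """Format a DCL DUMP command covering the window around vbn."""
    start, count = dump_window(vbn, before, after)
    return f"DUMP/BLOCKS=(START:{start},COUNT:{count}) {filename}"

def byte_offset(vbn, offset_in_block):
    """Byte position in the file for a (VBN, offset) pair."""
    return (vbn - 1) * BLOCK_SIZE + offset_in_block

# Hypothetical example: the failing record's RFA points into VBN 40.
command = dump_command("EVENTS_BROKEN.DAT", 40)
```

Widen "before" and "after" if the records are long, so that the dump
still covers five or more complete records on either side.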
Do NOT delete or COPY the 'broken' file, but RENAME it, just in
case there is something wrong with the disk block mapping (file
header). Also, you want to have a few files to study to establish
a 'pattern'. (In the pre-6.0 case we found there to be a 'hole'
in the written data where the EOF had been advanced, but no data
written.)
It _IS_ possible for a programming error to create an IRC error,
but it requires (accidental) manipulation of the RAB$W_RFA fields.
If this application is using only straight RMS $GET, $PUT and $UPDATE,
does not manipulate RFAs, uses a recent VMS version and the system is
built with Digital qualified components, then you should urgently
escalate this problem through support, just in case you are the
first to be seeing a problem from which others are suffering unknowingly.