HP OpenVMS Systems: Ask the Wizard
The Question is:

Dear Mr. Wizard,

We have an application that handles user input via detached programs, which populate our database made up of a number of RMS indexed and sequential files. It is a multi-user application that runs several of these detached programs. The input program communicates with the detached program, which we call the "server", through an RMS sequential file: the input program writes a reference number (we call that an event), the server "wakes up", processes the event, stores the information in the RMS files, marks the event as processed and goes back to HIB to wait for the next event. All of this happens with the help of the lock manager.

Unfortunately this does not always work, and the sequential file where all the events are written very often gets corrupted. That is, we get the error:

%RMS-F-IRC, illegal record encountered; VBN or record

After this type of error we can only throw away the sequential file. We checked the system log files but could not find any hints. We have absolutely no clue where the error comes from, and we have been unable to reproduce it. With such a corruption we cannot edit, convert or dump the file in order to inspect the faulty record. We have checked the application several times, adding log messages to try to find the cause, but have been unlucky so far. We also wrote a program to try to step over the corrupted record, but none of our tricks worked. It is a real killer.

Is there any possibility that some RMS parameters, quotas or buffers are wrongly set in SYSGEN or somewhere else? Or what could happen when too many updates have to be done on the exact same record? Could we get a "data jam" of some sort?

We are all waiting anxiously for your answer.
Thanks and regards, Jacques Roch jroch@access.ch

The Answer is:

If an RMS sequential file is being corrupted while it is accessed strictly through RMS on a fairly recent VMS version, you would be the first to run into a very serious problem in a heavily used service. Possible, but unlikely. In other words, RMS is likely to be a VICTIM here, not a cause.

The last known cause was a lock manager/VIOC issue under 6.2, but you indicate 6.1. Patch kits are available for this. The last known RMS sequential file corruption was before 6.0, but you indicate 6.1.

I would take a close look at the storage used. All Digital qualified components? Undesired data re-ordering (hardware/software caches) could cause this.

The IRC error indicates that the RECORD-LENGTH word which precedes the data got blown away. It may well be the PREVIOUS record that had an inconsistent length, so that the next length word was read from the middle of raw data. Or you may have been handed an old copy of the data from some cache after RMS had told the (other) system to write out fresh data with the real record length word. (Are you using 3rd party data caching software?)

I'd encourage you to augment the error handler with a printout of the alleged RFA for the problem record and the RFA for the last successful one. To study (and possibly fix!) the problem, you'd want to DUMP the VBNs those RFAs implicitly point to (and probably a few blocks before and after, enough to capture 5+ entire records).

Do NOT delete or COPY the 'broken' file, but RENAME it, just in case there is something wrong with the disk block mapping (file header). Also, you want to have a few files to study to establish a 'pattern'. (In the pre-6.0 case we found a 'hole' in the written data where the EOF had been advanced, but no data written.)

It _IS_ possible for a programming error to create an IRC error, but it requires (accidental) manipulation of the RAB$W_RFA fields.
If this application uses only straight RMS $GET, $PUT and $UPDATE, does not manipulate RFAs, runs on a recent VMS version, and the system is built with Digital qualified components, then you should urgently escalate this problem through support, just in case you are the first to notice a problem from which others are suffering unknowingly.
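
To make the record-length and RFA points above more concrete, here is a minimal Python sketch. It is not RMS code and it does not read a real file. It works on a simplified in-memory model that assumes variable-length records, each preceded by a 2-byte little-endian length word and padded to an even byte boundary, stored in 512-byte blocks with VBNs numbered from 1. The function names and the file name `EVENTS.DAT` are hypothetical. The sketch shows how a wrong length word in one record makes the NEXT "record" start in raw data, and how the two RFAs translate into a block range for DUMP.

```python
import struct

BLOCK_SIZE = 512  # assumption: 512-byte disk blocks, VBNs numbered from 1


def pack_records(records):
    """Build a simplified file image: length word, data, pad to even length."""
    out = bytearray()
    for rec in records:
        out += struct.pack("<H", len(rec)) + rec
        if len(rec) % 2:
            out += b"\x00"
    return bytes(out)


def to_rfa(byte_pos):
    """Map a byte position in the image to a (VBN, offset-in-block) pair."""
    return byte_pos // BLOCK_SIZE + 1, byte_pos % BLOCK_SIZE


def scan(image, max_record_size):
    """Walk the length words from the start of the image.

    Returns (last_good_rfa, bad_rfa); bad_rfa is None when every length
    word is plausible. A length word is treated as illegal when it exceeds
    max_record_size or runs past the end of the image.
    """
    pos, last_good = 0, None
    while pos + 2 <= len(image):
        (length,) = struct.unpack_from("<H", image, pos)
        if length > max_record_size or pos + 2 + length > len(image):
            return last_good, to_rfa(pos)
        last_good = to_rfa(pos)
        pos += 2 + length + (length % 2)
    return last_good, None


def dump_command(filename, first_vbn, last_vbn, margin=2):
    """Format a DCL DUMP command covering both VBNs plus a few blocks around."""
    start = max(1, first_vbn - margin)
    count = last_vbn + margin - start + 1
    return f"DUMP /BLOCKS=(START:{start},COUNT:{count}) {filename}"


if __name__ == "__main__":
    image = bytearray(pack_records([b"\xAA" * 400] * 3))
    # Damage the length word of the SECOND record (byte position 402):
    # it now claims 250 bytes instead of 400.
    struct.pack_into("<H", image, 402, 250)
    last_good, bad = scan(bytes(image), max_record_size=400)
    print("last good RFA:", last_good)
    print("alleged bad RFA:", bad)
    if bad is not None:
        print(dump_command("EVENTS.DAT", last_good[0], bad[0]))
```

Note that in this model the scan reports the damaged record itself as the last "good" one, and flags a position inside its data as the illegal record. That is why it is worth dumping the blocks around BOTH RFAs rather than only the one named in the error.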