HP OpenVMS Systems

ask the wizard

RMS illegal record (IRC) errors?


The Question is:

 
Dear Mr. Wizard,
 
We have an application that handles user input via detached programs, which
populate our database, made up of a number of RMS indexed sequential files.
It is a multi-user application working with several of these detached programs.
 
The communication mechanism between the input program and the detached
program, which we call the "server", goes through an RMS sequential file.
The input program writes a reference number, which we call an event; the
server then "wakes up", processes the event, stores the info in the RMS
files, marks the event as processed and goes back to HIB, waiting for the
next event. All this happens with the help of the so-called "Lock manager".
 
Unfortunately this does not go without problems, and very often the
sequential file where all the events are written gets corrupted. That is, we
get the error:
 
%RMS-F-IRC, illegal record encountered : VBN or record
 
After this type of error we can just throw away the sequential file. We did
check system log-files but we could not find any hints.
 
We have absolutely no clue where it comes from. We have been unable to
reproduce the error. With such a corruption we can neither edit, convert nor
dump the file in order to inspect the faulty record. We have checked the
application several times, adding log messages to try to get to the cause,
but we have been unlucky so far. We also wrote a program to try to step over
the corrupted record, but none of our tricks worked. It is a real killer.
 
Is there any possibility that some RMS parameters/quotas/buffers might be
wrongly set in SYSGEN or somewhere else? Or what could happen when too many
record updates have to be done on the exact same record? Could we get a
"data jam" of some sort?
 
We are all waiting anxiously for your answer.
 
Thanks and regards,
Jacques Roch
jroch@access.ch

The Answer is :

 
    If an RMS sequential file is being corrupted while it is accessed
    strictly through RMS on a fairly recent VMS version, this would be
    the first report of a very serious problem in a heavily used
    service. Possible, but unlikely.
 
    In other words, RMS is likely to be a VICTIM here, not a cause.
 
    The last known cause was a lock manager/VIOC issue under 6.2, but
    you indicate 6.1. Patch kits are available for this.
    The last known RMS sequential file corruption was before 6.0, but
    you indicate 6.1.
 
    I would take a close look at the storage used. All Digital qualified
    components? Undesired data re-ordering (hardware/software caches)
    could cause this.
 
    The IRC error indicates that the RECORD-LENGTH word which precedes
    the data got blown away. It may well be the PREVIOUS record
    that had an inconsistent length, so that the next length word was
    read from the middle of raw data.
    Or you may have been handed an old copy of the data from some cache
    even though RMS had told the (other) system to write out fresh data
    with the real record length word. (Are you using 3rd party data
    caching software?)
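 
    To illustrate how one inconsistent length word turns into an IRC on
    the NEXT record, here is a minimal Python sketch. It is not RMS code.
    It assumes the plain variable-length sequential layout (a 2-byte
    little-endian length word, then the data, padded to an even byte
    boundary) and ignores VFC, no-span and stream formats. The names
    walk_records and build_image and the 32767-byte limit are
    assumptions made for this illustration.

```python
import struct


def build_image(records):
    """Pack byte strings as length-prefixed, word-aligned records."""
    out = bytearray()
    for data in records:
        out += struct.pack("<H", len(data)) + data
        if len(out) & 1:          # pad to an even byte boundary
            out.append(0)
    return bytes(out)


def walk_records(image, max_record_size=32767):
    """Yield (byte_offset, length) per record; raise ValueError on a bad length word."""
    offset = 0
    while offset + 2 <= len(image):
        (length,) = struct.unpack_from("<H", image, offset)
        end = offset + 2 + length
        if length > max_record_size or end > len(image):
            raise ValueError(
                f"illegal record at byte offset {offset}: length word {length:#06x}"
            )
        yield offset, length
        offset = end + (end & 1)  # next record starts on an even byte


if __name__ == "__main__":
    good = build_image([b"EVT0001", b"EVT0002", b"EVT0003"])
    print(list(walk_records(good)))

    # Shorten the FIRST record's length word; the walker then lands
    # inside that record's own data and misreads it as a length word.
    bad = bytearray(good)
    struct.pack_into("<H", bad, 0, 5)
    try:
        for rec in walk_records(bytes(bad)):
            print("ok", rec)
    except ValueError as err:
        print(err)
```

    In the damaged image the first record still reads without complaint;
    the error is only reported at the following position, which is why
    the last successful record deserves as much attention as the one
    that fails.
 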
    I'd encourage you to augment the error handler with a printout of
    the alleged RFA for the problem record and the RFA for the last
    successful one. To study (and possibly fix!) the problem, you'd want
    to DUMP the VBNs those RFAs implicitly point to (and probably a few
    blocks before and after, enough to capture 5+ entire records).
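 
    As a rough sketch of turning two logged RFAs into a block range to
    DUMP, the Python below assumes the 6-byte RFA of a sequential disk
    file holds a 4-byte VBN followed by a 2-byte byte offset within that
    block, both little-endian. The helper names, the two-block margin
    and the file name EVENTS.DAT are hypothetical; widen the margin
    until it covers the 5+ records suggested above.

```python
import struct


def decode_rfa(rfa):
    """Split a 6-byte RFA into (vbn, byte_offset_in_block)."""
    vbn, offset = struct.unpack("<IH", rfa)
    return vbn, offset


def dump_command(bad_rfa, good_rfa, filename, margin=2):
    """Build a DUMP command covering both RFAs plus `margin` blocks each side."""
    vbns = [decode_rfa(bad_rfa)[0], decode_rfa(good_rfa)[0]]
    start = max(1, min(vbns) - margin)   # VBNs are numbered from 1
    end = max(vbns) + margin
    return f"DUMP/BLOCKS=(START:{start},END:{end}) {filename}"


if __name__ == "__main__":
    # Hypothetical values as an error handler might have logged them.
    last_good = struct.pack("<IH", 41, 500)
    problem = struct.pack("<IH", 42, 12)
    print(decode_rfa(problem))
    print(dump_command(problem, last_good, "EVENTS.DAT"))
```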
 
    Do NOT delete or COPY the 'broken' files, but RENAME them just in
    case there is something wrong with the disk block mapping (file
    header). Also, you want to have a few files to study to establish
    a 'pattern'. (In the pre-6.0 case we found there to be a 'hole'
    in the written data where the EOF had been advanced, but no data
    written.)
 
    It _IS_ possible for a programming error to create an IRC error,
    but it requires (accidental) manipulation of the RAB$W_RFA fields.
 
    If this application is using only straight RMS $GET, $PUT and $UPDATE,
    does not manipulate RFAs, uses a recent VMS version and the system is
    built with Digital qualified components, then you should urgently
    escalate this problem through support, just in case you are the
    first to be seeing a problem from which others are suffering unknowingly.
 

  
     
     answer written or last revised on ( 28-SEP-1998 )