HP OpenVMS Systems

ask the wizard

Efficiency of shared access?


The Question is:

 
In general, would it be accurate to say that the more processes accessing an
 indexed RMS file, the less efficient I/O is? That is, if you had a few server
 processes accessing the file rather than many single processes directly
 accessing it, the same amount of I/O would be completed more rapidly?
 
If so, why?
 
Perhaps related: In a Cluster environment, would the number of nodes in the
 Cluster have an effect on I/O if a device were accessible throughout the
 Cluster?

The Answer is :

    It depends on many things. For example, are these processes reading or
    writing the file? Is the access pattern uniform across all records or
    are there hot spots? Does the file have global buffers? How much
    contention is there for records?
 
    Consider one extreme: all processes are READING the same record on a
    file with global buffers. The first process to access the record reads
    the file, resulting in the bucket containing the record being placed in
    a global buffer. Subsequent requests for the same record are satisfied
    from the buffer. In a cluster environment, we need to coordinate
    buffers across nodes. In the same case with a "server" process model, the
    processes reading the record still need to communicate with the server
    - an I/O by any other name is still an I/O! Or in this case TWO I/Os as
    we need both a request and a response. There is also the overhead of
    context switching.
 
    At another extreme, consider all the processes WRITING the same
    record. The situation is similar, except that we may need to
    communicate cache coherency across the cluster.
 
    Now think about each of the processes reading random records. The
    effectiveness of the cache may be reduced, but why would the different
    models result in extra I/O to the file or reduce the "efficiency" of
    the I/Os? (whatever that means!)
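 
    As a rough illustration of the caching behaviour in the cases above,
    here is a toy model in Python. It is only a sketch: the BucketCache
    class, the LRU eviction policy, and the figure of 10 records per bucket
    are assumptions made for the example and are not how RMS global buffers
    are implemented. It counts disk reads for a hot-spot workload and for
    reads scattered across the file.

```python
from collections import OrderedDict


class BucketCache:
    """Toy stand-in for a shared buffer cache (hypothetical, not RMS)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buckets = OrderedDict()  # bucket number -> contents, LRU order
        self.disk_reads = 0

    def read_record(self, key, records_per_bucket=10):
        bucket = key // records_per_bucket  # assumed record-to-bucket mapping
        if bucket in self.buckets:
            self.buckets.move_to_end(bucket)      # hit: no disk I/O needed
        else:
            self.disk_reads += 1                  # miss: one read from disk
            self.buckets[bucket] = f"bucket {bucket}"
            if len(self.buckets) > self.capacity:
                self.buckets.popitem(last=False)  # evict least recently used
        return self.buckets[bucket]


def count_disk_reads(keys, capacity=4):
    """Replay a sequence of record reads and return the disk reads needed."""
    cache = BucketCache(capacity)
    for key in keys:
        cache.read_record(key)
    return cache.disk_reads


hot_spot = [42] * 100                     # everyone reads the same record
scattered = [i * 10 for i in range(100)]  # every read hits a new bucket

print(count_disk_reads(hot_spot))   # 1
print(count_disk_reads(scattered))  # 100
```

    Note that the count depends only on the sequence of reads, not on how
    many processes issued them, which is the point of the question above.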
 
    There are benefits in using the server process model for accessing data
    files - for example it gives more control over the data files in terms
    of security. It can also lead to more flexible application designs
    since the communication with the server can be implemented through any
    convenient transport mechanism. The downside of the server model is
    increased overall I/O and process management overhead, as every request
    results in a minimum of 3 I/Os, only one of which is a candidate for
    caching (send request, read data, send response), and two context
    switches.
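 
    The arithmetic behind that comparison can be sketched as follows. This
    is a back-of-the-envelope model, not a measurement: the function names
    are made up for the example, and it assumes that a cache hit removes
    the file read entirely while the request and response messages are
    never cached.

```python
def direct_access_cost(requests, cache_hits):
    """Direct access: one file read per request, which a cache may absorb."""
    return {
        "ios": requests - cache_hits,
        "extra_context_switches": 0,
    }


def server_model_cost(requests, cache_hits):
    """Server model: send request, read data, send response.

    Only the file read is a candidate for caching; the two messages and
    the two context switches are paid on every request.
    """
    return {
        "ios": 2 * requests + (requests - cache_hits),
        "extra_context_switches": 2 * requests,
    }


# 1000 requests, 900 of which are satisfied from cache (assumed figures).
print(direct_access_cost(1000, 900))
# {'ios': 100, 'extra_context_switches': 0}
print(server_model_cost(1000, 900))
# {'ios': 2100, 'extra_context_switches': 2000}
```

    The better the cache works, the larger the relative cost of the
    messaging, since the messages are the part that cannot be cached.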
 
    In a cluster environment you must also consider how the device is
    accessed. If it's MSCP served, then it IS a single process directly
    accessing the device already, but at a lower level.
 
    The Wizard would recommend that an application be designed with a data
    access layer which presents the data to the application in whatever form
    is convenient for the application. This layer can then be implemented as
    direct RMS access, client/server, a database product, an in-memory
    database, and so on. Don't limit the application to a specific
    physical implementation by exposing too much detail in the application
    logic. This approach allows the application to be written without
    making a choice. Different implementations can be tried and compared
    without affecting the application. For the same reason, it is much
    simpler to write an application which is portable across multiple
    platforms because all the system specific code is hidden in lower
    levels.
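 
    A minimal sketch of such a data access layer, in Python for brevity.
    All names here (RecordStore, InMemoryStore, ClientServerStore,
    rename_customer) are hypothetical, and the client/server variant only
    simulates messaging by counting requests and responses. The point is
    that the application function is written once against the interface
    and runs unchanged on either implementation.

```python
from abc import ABC, abstractmethod


class RecordStore(ABC):
    """The data access layer as seen by the application."""

    @abstractmethod
    def get(self, key):
        """Return the record stored under key."""

    @abstractmethod
    def put(self, key, record):
        """Store record under key."""


class InMemoryStore(RecordStore):
    """One possible implementation: a plain in-memory table."""

    def __init__(self):
        self._records = {}

    def get(self, key):
        return self._records[key]

    def put(self, key, record):
        self._records[key] = record


class ClientServerStore(RecordStore):
    """Another implementation: forwards to a backend, counting messages."""

    def __init__(self, backend):
        self._backend = backend
        self.messages = 0

    def get(self, key):
        self.messages += 2  # one request, one response
        return self._backend.get(key)

    def put(self, key, record):
        self.messages += 2  # one request, one response
        self._backend.put(key, record)


def rename_customer(store, key, new_name):
    """Application logic: knows nothing about how records are stored."""
    record = dict(store.get(key))
    record["name"] = new_name
    store.put(key, record)
    return record


direct = InMemoryStore()
direct.put(1, {"name": "Smith"})
print(rename_customer(direct, 1, "Jones"))  # {'name': 'Jones'}

served = ClientServerStore(InMemoryStore())
served.put(1, {"name": "Smith"})
print(rename_customer(served, 1, "Jones"))  # {'name': 'Jones'}
print(served.messages)                      # 6
```

    Swapping one store for the other is a one-line change at the point
    where the store is created, so the implementations can be compared
    without touching rename_customer.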

  
     
     answer written or last revised on ( 15-MAY-2001 )