HP OpenVMS Systems: Ask the Wizard
The Question is:
In general, would it be accurate to say that the more processes
accessing an indexed RMS file, the less efficient I/O is? That is, if
you had a few server processes accessing the file rather than many
single processes directly accessing it, the same amount of I/O would be
completed more rapidly? If so, why?
Perhaps related: In a Cluster environment, would the number of nodes in
the Cluster have an effect on I/O if a device were accessible
throughout the Cluster?
The Answer is:
It depends on many things. For example, are these processes reading or
writing the file? Is the access pattern uniform across all records or
are there hot spots? Does the file have global buffers? How much
contention is there for records?
Consider one extreme: all processes are READING the same record in a
file with global buffers. The first process to access the record reads
the file, which places the bucket containing the record in a global
buffer. Subsequent requests for the same record are satisfied from the
buffer. In a cluster environment, the buffers must also be coordinated
across nodes. In the same case with a "server" process model, the
processes reading the record still need to communicate with the server
- an I/O by any other name is still an I/O! Or in this case TWO I/Os, as
we need both a request and a response. There is also the overhead of
context switching.
At another extreme, consider all the processes WRITING the same
record. The situation is similar, except that we may need to
communicate cache coherency across the cluster.
Now think about each of the processes reading random records. The
effectiveness of the cache may be reduced, but why would the different
models result in extra I/O to the file or reduce the "efficiency" of
the I/Os? (whatever that means!)
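To make that point concrete, here is a minimal sketch, assuming a toy LRU
cache of buckets. It is not a model of how RMS global buffers actually
behave, and the names hit_ratio, n_buckets and cache_size are invented
for illustration. It only shows that the number of reads reaching the
file is a function of the access pattern and the cache size; nothing in
it depends on which process issued the request.

```python
import random
from collections import OrderedDict


def hit_ratio(accesses, cache_size):
    """Fraction of accesses served from a toy LRU cache of cache_size buckets."""
    cache = OrderedDict()
    hits = 0
    for bucket in accesses:
        if bucket in cache:
            hits += 1
            cache.move_to_end(bucket)      # mark as most recently used
        else:
            cache[bucket] = True           # miss: one real read of the file
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used bucket
    return hits / len(accesses)


rng = random.Random(0)  # fixed seed so the run is repeatable
n_buckets, cache_size, n_accesses = 1000, 50, 10000  # illustrative values only

same_record = [7] * n_accesses                                   # everyone wants one bucket
uniform = [rng.randrange(n_buckets) for _ in range(n_accesses)]  # random records

hot = hit_ratio(same_record, cache_size)
cold = hit_ratio(uniform, cache_size)
print(f"same record: {hot:.4f}  uniform random: {cold:.4f}")
```

With a single hot record only the first access misses. With uniformly
random access most reads go to the file, whichever model is in use.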
There are benefits in using the server process model for accessing data
files - for example it gives more control over the data files in terms
of security. It can also lead to more flexible application designs
since the communication with the server can be implemented through any
convenient transport mechanism. The downside of the server model is
increased overall I/O and process management overhead: every request
results in a minimum of 3 I/Os (send request, read data, send
response), only one of which (the data read) is a candidate for
caching, plus two context switches.
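The arithmetic can be sketched as follows. This is a back-of-envelope
model under stated assumptions, not a measurement: it assumes every
cache hit avoids exactly one data read, that the request and response
messages are never cached, and that direct access costs one read per
request. The function names are hypothetical.

```python
def direct_ios(requests, cache_hits):
    """Direct access: one data read per request, minus reads served from cache."""
    return requests - cache_hits


def server_ios(requests, cache_hits):
    """Server model: send request + read data + send response.

    Only the data read can be satisfied from cache, so the two
    messages per request are always paid.
    """
    return 2 * requests + (requests - cache_hits)


def server_context_switches(requests):
    """Two context switches per request (client to server and back)."""
    return 2 * requests


requests, cache_hits = 1000, 900  # illustrative numbers only
print("direct:", direct_ios(requests, cache_hits))
print("server:", server_ios(requests, cache_hits))
print("server context switches:", server_context_switches(requests))
```

The better the cache performs, the more the fixed two messages per
request dominate the server model's total.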
In a cluster environment you must also consider how the device is
accessed. If it's MSCP served, then it IS a single process directly
accessing the device already, but at a lower level.
The Wizard would recommend that an application be designed with a data
access layer which presents the data to the application in whatever form
is convenient for the application. This layer can then be implemented as
direct RMS access, client/server, a database product, an in-memory
database, and so on. Don't limit the application to a specific
physical implementation by exposing too much detail in the application
logic. This approach allows the application to be written without
making a choice. Different implementations can be tried and compared
without affecting the application. For the same reason, it is much
simpler to write an application which is portable across multiple
platforms because all the system specific code is hidden in lower
levels.
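A minimal sketch of such a layer, written in Python for brevity. The
names RecordStore, InMemoryStore, SqliteStore and record_visit are
invented for this illustration, and SQLite merely stands in for "a
database product". A direct RMS or client/server implementation would
be another subclass with the same two methods.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Optional


class RecordStore(ABC):
    """Hypothetical data access layer: the application sees only get/put."""

    @abstractmethod
    def get(self, key: str) -> Optional[str]: ...

    @abstractmethod
    def put(self, key: str, value: str) -> None: ...


class InMemoryStore(RecordStore):
    """In-memory implementation backed by a dict."""

    def __init__(self):
        self._records = {}

    def get(self, key):
        return self._records.get(key)

    def put(self, key, value):
        self._records[key] = value


class SqliteStore(RecordStore):
    """Database implementation; ':memory:' keeps the example self-contained."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS records (key TEXT PRIMARY KEY, value TEXT)"
        )

    def get(self, key):
        row = self._db.execute(
            "SELECT value FROM records WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else None

    def put(self, key, value):
        self._db.execute(
            "INSERT OR REPLACE INTO records (key, value) VALUES (?, ?)",
            (key, value),
        )
        self._db.commit()


def record_visit(store: RecordStore, user: str) -> int:
    """Application logic: knows nothing about how records are stored."""
    count = int(store.get(user) or 0) + 1
    store.put(user, str(count))
    return count


# The same application code runs unchanged against either implementation.
for store in (InMemoryStore(), SqliteStore()):
    record_visit(store, "wizard")
    print(type(store).__name__, record_visit(store, "wizard"))
```

Because record_visit depends only on the interface, the implementations
can be swapped and compared without touching the application.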