
HP OpenVMS Systems Documentation


Volume Shadowing for OpenVMS



6.2 Copy Operations

The purpose of a copy operation is to duplicate data on a source disk to a target disk. At the end of a copy operation, both disks contain identical information, and the target disk becomes a complete member of the shadow set. Read and write access to the shadow set continues while a disk or disks are undergoing a copy operation.

The DCL command MOUNT initiates a copy operation when a disk is added to an existing shadow set. A copy operation is simple in nature: A source disk is read and the data is written to the target disk. This is usually done in multiple block increments referred to as LBN ranges. In an OpenVMS Cluster environment, all systems that have the shadow set mounted know about the target disk and include it as part of the shadow set. However, only one of the OpenVMS systems actually manages the copy operation.
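The paragraph above can be sketched in a few lines of Python. This is an illustrative model only, not OpenVMS code: the in-memory "disks" are plain lists, and the range size is a hypothetical value (the real increment is implementation-defined).

```python
# Sketch: a copy operation duplicates a source disk onto a target in
# multi-block increments (LBN ranges). Disks are modeled as lists with
# one entry per logical block.

RANGE_SIZE = 127  # hypothetical blocks per transfer


def unassisted_copy(source, target, total_blocks):
    """Copy total_blocks from source to target, one LBN range at a time."""
    lbn = 0
    while lbn < total_blocks:
        count = min(RANGE_SIZE, total_blocks - lbn)
        # Read one LBN range from the source, write it to the target.
        target[lbn:lbn + count] = source[lbn:lbn + count]
        lbn += count


source = [b % 251 for b in range(1000)]  # source member with some data
target = [0] * 1000                      # newly added member, initially stale
unassisted_copy(source, target, 1000)
```

After the loop completes, both disks contain identical information, which is the condition for the target to become a full shadow set member.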

Two complexities characterize the copy operation:

  • Handling user I/O requests while the copy operation is in progress
  • Dealing with writes to the area that is currently being copied without losing the new write data

Volume Shadowing for OpenVMS handles these situations differently depending on the operating system version number and the hardware configuration. For systems running software earlier than OpenVMS Version 5.5-2, the copy operation is performed by an OpenVMS node and is known as an unassisted copy operation (see Section 6.2.1).

With Version 5.5-2 and later, the copy operation includes enhancements for shadow set members that are configured on controllers that implement new copy capabilities. These enhancements enable the controllers to perform the copy operation and are referred to as assisted copies (see Section 6.2.2).

Volume Shadowing for OpenVMS supports both assisted and unassisted shadow sets in the same cluster. Whenever you create a shadow set, add members to an existing shadow set, or boot a system, the shadowing software reevaluates each device in the changed configuration to determine whether the device is capable of supporting the copy assist.

6.2.1 Unassisted Copy Operations

Unassisted copy operations are performed by an OpenVMS system. The actual transfer of data from the source member to the target is done through host node memory. Although unassisted copy operations are not CPU intensive, they are I/O intensive and consume a small amount of CPU bandwidth on the node that is managing the copy. An unassisted copy operation also consumes interconnect bandwidth.

On the system that manages the copy operation, user and copy I/Os compete evenly for the available I/O bandwidth. For other nodes in the cluster, user I/Os proceed normally and contend for resources in the controller with all the other nodes. Note that the copy operation may take longer as the user I/O load increases.

The volume shadowing software performs an unassisted copy operation when it cannot use the assisted copy feature (see Section 6.2.2). The most common reason is that the source and target disks are not on line to the same controller subsystem. Two disks can be active targets of an unassisted copy operation simultaneously if the members are added to the shadow set on the same command line. Disks participating in an unassisted copy operation may be on line to any controller anywhere in a cluster.

During any copy operation, a logical barrier is created that moves across the disk, separating the copied and uncopied LBN areas. This barrier is known as a copy fence. The node that is managing the copy operation knows the precise location of the fence and periodically notifies the other nodes in the cluster of the fence location. Thus, if the node performing the copy operation shuts down, another node can continue the operation without restarting at the beginning.

  • Read I/O requests to either side of the copy fence are serviced only from a source shadow set member.
  • Write I/O requests before or at the fence are issued in parallel to all members of the shadow set.
  • Write I/O requests after the fence are completed first to source members, then to copy target members.
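The fence-based I/O rules above can be modeled as follows. This is a hedged, single-node sketch (members are lists, "parallel" writes are sequential loops, and the class and attribute names are hypothetical); it only illustrates the routing logic, not the cluster protocol.

```python
# Sketch of copy-fence I/O routing: LBNs at or below the fence are
# already copied; LBNs above it are not yet copied to the targets.

class CopyFenceSet:
    def __init__(self, sources, targets):
        self.sources = sources  # full shadow set members
        self.targets = targets  # members still being copied
        self.fence = 0          # boundary between copied and uncopied LBNs

    def read(self, lbn, count):
        # Reads on either side of the fence are serviced from a source member.
        return self.sources[0][lbn:lbn + count]

    def write(self, lbn, data):
        count = len(data)
        if lbn <= self.fence:
            # Before/at the fence: issue to all members (in parallel on real
            # hardware; sequentially in this sketch).
            for m in self.sources + self.targets:
                m[lbn:lbn + count] = data
        else:
            # After the fence: complete to sources first, then to targets,
            # so new write data is not lost when the copy reaches this range.
            for m in self.sources:
                m[lbn:lbn + count] = data
            for m in self.targets:
                m[lbn:lbn + count] = data
```

Writing to targets even beyond the fence is what prevents the second complexity listed in Section 6.2: a write to the area currently being copied is never overwritten by stale source data.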

The time and amount of I/O required to complete an unassisted copy operation depend heavily on how similar the data on the source and target disks is. Copying a member containing dissimilar data can take at least two and a half times longer than copying a member containing similar data.

6.2.2 Assisted Copy Operations

Unlike an unassisted copy, an assisted copy does not transfer data through host node memory. Instead, the controller performs the transfer itself, by direct disk-to-disk data transfers. Thus, the assisted copy decreases the impact on the system, the I/O bandwidth consumption, and the time required for copy operations.

Shadow set members must be accessed from the same controller in order to take advantage of the assisted copy. The shadowing software controls the copy operation by using special MSCP copy commands, called disk copy data (DCD) commands, to instruct the controller to copy specific ranges of LBNs. For an assisted copy, only one disk can be an active target for a copy at a time.

For OpenVMS Cluster configurations, the node that is managing the copy operation issues an MSCP DCD command to the controller for each LBN range. The controller then performs the disk-to-disk copy, thus avoiding consumption of interconnect bandwidth.

By default, the Volume Shadowing for OpenVMS software (beginning with OpenVMS Version 5.5-2) and the controller automatically enable the copy assist if the source and target disks are accessed through the same HSC or HSJ controller.

Shadowing automatically disables the copy assist if:

  • The source and target disks are not accessed using the same controller.
    In the case of dual-ported disks, you can use the $QIO SET PREFERRED PATH feature to force both disks to be accessed via the same controller. See the PREFER program in SYS$EXAMPLES and refer to the OpenVMS I/O User's Reference Manual for more information about setting a preferred path.
  • The shadow set is mounted on a controller that does not support the copy assist.
  • The shadow set member is mounted on an HSC controller with the copy assist disabled. (HSC controllers are the only controllers on which you can disable copy assists.)
  • The number of assisted copies specified by the DCD connection limit (applicable only to HSC controllers) has been reached; additional copies are then performed unassisted.

See Section 6.4 for information about disabling and reenabling the assisted copy capability.

6.3 Merge Operations

The purpose of a merge operation is to compare data on shadow set members and to ensure that inconsistencies are resolved. A merge operation is initiated if either of the following events occurs:

  • A system failure results in the possibility of incomplete writes.
    For example, if a write request is made to a shadow set but the system fails before a completion status is returned from all the shadow set members, it is possible that:
    • All members might contain the new data.
    • All members might contain the old data.
    • Some members might contain new data and others might contain old data.

    The exact timing of the failure during the original write request defines which of these three scenarios results. When the system recovers, however, it is essential that corresponding LBNs on each shadow set member contain the same data (old or new). Thus, the issue here is not one of data availability, but rather of reconciling potential differences among shadow set members. Once the data on all disks is made identical, application data can be reconciled, if necessary, either by the user reentering the data or by database recovery and application journaling techniques.
  • If a shadow set enters mount verification with outstanding write I/O in the internal queue, and the problem is not corrected before mount verification times out, the systems on which the timeout occurred cause the shadow set to require a merge operation.
    For example, if the shadow set were mounted on eight nodes and mount verification timed out on two of them, those two nodes would each cause the shadow set to require a merge operation.

The merge operation is managed by one of the OpenVMS systems that has the shadow set mounted. The members of a shadow set are physically compared to each other, by a block-by-block comparison of the entire volume, to ensure that they contain the same data. As the merge proceeds, any blocks that differ are made the same (either all old or all new) by means of a copy operation. Because the shadowing software does not know which member contains the newer data, any full member can be the source member of the merge operation.

The shadowing software always selects one member as a logical master for any merge operation, across the OpenVMS Cluster. Any difference in data is resolved by a propagation of the information from the merge master to all the other members.

The system responsible for the merge operation on a given shadow set updates the merge fence for that shadow set after each range of LBNs is reconciled. The fence proceeds across the disk, separating the merged and unmerged portions of the shadow set.

Application read I/O requests to the merged side of the fence can be satisfied by any source member of the shadow set. Application read I/O requests to the unmerged side of the fence are also satisfied by any source member; however, any potential data differences (discovered by a data compare operation) are corrected on all members of the shadow set before the data is returned to the requesting user or application.

This method of dynamic correction of data inconsistencies during read requests allows a shadow set member to fail at any point during the merge operation without impacting data availability.
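The merge fence and the dynamic read correction described above can be sketched as follows. This is an illustrative model under simplifying assumptions (members are lists, one designated merge master, no cluster coordination); the class and attribute names are hypothetical.

```python
# Sketch of a merge in progress: reads on the unmerged side of the fence
# trigger a compare, and any difference is propagated from the merge
# master to all members before the data is returned.

class MergingSet:
    def __init__(self, members):
        self.members = members
        self.master = members[0]  # shadowing selects one member as merge master
        self.fence = 0            # LBNs below the fence are already merged

    def read(self, lbn, count):
        if lbn >= self.fence:
            # Unmerged side: reconcile this LBN range on all members first.
            data = self.master[lbn:lbn + count]
            for m in self.members:
                if m[lbn:lbn + count] != data:
                    m[lbn:lbn + count] = data  # propagate master's data
            return data
        # Merged side: any source member can satisfy the read.
        return self.members[0][lbn:lbn + count]
```

Because every read reconciles the range it touches, a member can fail at any point during the merge without compromising the data returned to applications.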

Volume Shadowing for OpenVMS supports both assisted and unassisted merge operations in the same cluster. Whenever you create a shadow set, add members to an existing shadow set, or boot a system, the shadowing software reevaluates each device in the changed configuration to determine whether it is capable of supporting the merge assist.

6.3.1 Unassisted Merge Operations

For systems running software earlier than OpenVMS Version 5.5-2, the merge operation is performed by the system and is known as an unassisted merge operation.

To ensure minimal impact on user I/O requests, volume shadowing implements a mechanism that causes the merge operation to give priority to user and application I/O requests.

The shadow server process performs merge operations as a background process, ensuring that when failures occur, they minimally impact user I/O. A side effect of this is that unassisted merge operations can often take an extended period of time to complete, depending on user I/O rates. Also, if another node fails before a merge completes, the current merge is abandoned and a new one is initiated from the beginning.

Note that data availability and integrity are fully preserved during merge operations regardless of their duration. All shadow set members contain equally valid data.

6.3.2 Assisted Merge Operations

Starting with OpenVMS Version 5.5-2, the merge operation includes enhancements for shadow set members that are configured on controllers that implement assisted merge capabilities. The assisted merge operation is also referred to as a minimerge. The minimerge feature significantly reduces the amount of time needed to perform merge operations. Usually, the minimerge completes in a few minutes.

By using information about write operations that were logged in controller memory, the minimerge is able to merge only those areas of the shadow set where write activity was known to have been in progress. This avoids the need for the total read and compare scans required by unassisted merge operations, thus reducing consumption of system I/O resources.

Controller-based write logs contain information about exactly which LBNs in the shadow set had write I/O requests outstanding (from a failed node). The node that performs the assisted merge operation uses the write logs to merge those LBNs that may be inconsistent across the shadow set. No controller-based write logs are maintained for a one-member shadow set, or when only one OpenVMS system has the shadow set mounted.
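The difference from an unassisted merge is that only the logged LBN ranges are reconciled, rather than the whole volume. A minimal sketch, assuming a hypothetical write-log format of (lbn, count) entries:

```python
# Sketch of the minimerge idea: reconcile only the LBN ranges that had
# write I/O outstanding, as recorded in controller write logs.

def minimerge(members, write_log):
    master = members[0]              # one full member acts as source of truth
    for lbn, count in write_log:     # only ranges with outstanding writes
        data = master[lbn:lbn + count]
        for m in members[1:]:
            m[lbn:lbn + count] = data  # make the range identical everywhere


m1 = [1] * 100
m2 = [1] * 100
m2[10:12] = [9, 9]                   # inconsistency left by an interrupted write
minimerge([m1, m2], [(10, 2)])
```

With only a handful of logged ranges to process, the work is proportional to the outstanding writes, not to the volume size, which is why a minimerge usually completes in minutes.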

Note

The shadowing software does not automatically enable a minimerge on a system disk because of the requirement to consolidate crash dump files on a nonsystem disk. Dump off system disk (DOSD) is supported on both OpenVMS VAX and OpenVMS Alpha, starting with OpenVMS VAX Version 6.2 and OpenVMS Alpha Version 7.1. If DOSD is enabled, the system disk can be minimerged.

The minimerge operation is enabled on nodes running OpenVMS Version 5.5-2 or later. Volume shadowing automatically enables the minimerge if the controllers involved in accessing the physical members of the shadow set support it. See the Volume Shadowing for OpenVMS Software Product Description (SPD 27.29.xx) for a list of supported controllers. Note that minimerge operations are possible even when shadow set members are connected to different controllers. This is because write log entries are maintained on a per controller basis for each shadow set member.

Volume Shadowing for OpenVMS automatically disables minimerges if:

  • The shadow set is mounted on a cluster node that is running an OpenVMS release earlier than Version 5.5-2.
  • A shadow set member is mounted on a controller running a version of firmware that does not support minimerge.
  • A shadow set member is mounted on a controller that has performance assists disabled.
  • Any node in the cluster that has the shadow set mounted is running a version of Volume Shadowing that has minimerge disabled.
  • The shadow set is mounted on a standalone system. (Minimerge operations are not enabled on standalone systems.)
  • The shadow set is mounted on only one node in the OpenVMS Cluster.

The following transient conditions can also cause a minimerge operation to be disabled:

  • An unassisted merge operation is already in progress when a node fails.
    In this situation, the shadowing software cannot interrupt the unassisted merge operation with a minimerge.
  • Not enough write log entries are available in the controllers.
    The number of available write log entries is determined by controller capacity. The shadowing software dynamically determines when there are enough entries to maintain write I/O information successfully. If the number of available write log entries is too low, shadowing temporarily disables logging for that shadow set and returns the existing entries on every node in the cluster. After some time has passed, shadowing attempts to reenable write logging on the shadow set.
    A controller retains a write log entry for each write I/O request until that entry is deleted by shadowing, or the controller is restarted.
    A multiple-unit controller shares its write log entries among multiple disks. This pool of write log entries is managed by the shadowing software. If a controller runs out of write log entries, shadowing disables minimerges and will perform an unassisted merge operation, should a node leave the cluster without first dismounting the shadow set. Note that write log exhaustion does not typically occur with disks on which the write logs are not shared.
  • The controller write logs become inaccessible, making a minimerge impossible, for one of the following reasons:
    • Controller failure causes write logs to be lost or deleted.
    • A device that is dual ported to multiple controllers fails over to its secondary controller. (If the secondary controller is capable of maintaining write logs, the minimerge operations are reestablished quickly.)
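The shared write-log pool described in these bullets can be sketched as follows. This is a hedged model of the behavior, not controller firmware: the class name, the entry format, and the fallback signaling are all hypothetical.

```python
# Sketch of a controller's shared pool of write log entries: an entry is
# taken for each write I/O and returned when shadowing deletes it. If the
# pool runs dry, write logging is disabled for the shadow set, and the
# next node failure forces a full (unassisted) merge instead of a minimerge.

class WriteLogPool:
    def __init__(self, capacity):
        self.free = capacity        # entries available, set by controller capacity
        self.logging_enabled = True

    def record_write(self, lbn, count):
        """Log an outstanding write; return the entry, or None if logging is off."""
        if not self.logging_enabled:
            return None
        if self.free == 0:
            # Pool exhausted: shadowing disables minimerge for this set.
            self.logging_enabled = False
            return None
        self.free -= 1
        return (lbn, count)

    def release(self, entry):
        """Shadowing deletes the entry once the write completes everywhere."""
        self.free += 1
```

Exhaustion is more likely when many disks share one multiple-unit controller's pool, which matches the note above that unshared write logs rarely run out.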

6.4 Controlling HSC Assisted Copy and Minimerge Operations

This section describes how to control assisted copy and minimerge operations on an HSC controller. It is not possible to control these operations on an HSJ controller.

To disable both the merge and copy performance assists on the HSC controller, follow these steps on each HSC controller for which you want to disable the assists:

  1. Press Ctrl/C to get to the HSC prompt.
  2. When the HSC> prompt appears on the terminal screen, enter the following commands:


    HSC> RUN SETSHO
    SETSHO> SET SERVER DISK/NOHOST_BASED_SHADOWING
    SETSHO-I Your settings require an IMMEDIATE reboot on exit.
    SETSHO> EXIT
    SETSHO-Q Rebooting HSC.  Press RETURN to continue, CTRL/Y to abort:
    

After you issue these commands, the HSC controller automatically reboots:


INIPIO-I Booting...

To reenable the assists, follow the same procedure on your HSC controller, but use the /HOST_BASED_SHADOWING qualifier on the SET SERVER DISK command.

Use the HSC command SHOW ALL to see whether the assists are enabled or disabled. The following example shows a portion of the SHOW ALL display that indicates the shadowing assists status:


HSC> SHOW ALL
   .
   .
   .
      5-Jun-1997 16:42:51.40 Boot:  21-Feb-1997 13:07:19.47   Up: 2490:26
Version: V860           System ID: %X000011708247     Name: HSJNOT
Front Panel:  Secure                                  HSC Type: HSC90
 .
 .
 .
Disk Server Options:
        Disk Caching: Disabled
        Host Based Shadowing Assists: Enabled
        Variant Protocol: Enabled
        Disk Drive Controller Timeout: 2 seconds
        Maximum Sectors per Track: 74 sectors
        Disk Copy Data connection limit: 4      Active: 0
   .
   .
   .

6.5 What Happens to a Shadow Set When a System Fails?

When a system, controller, or disk failure occurs, the shadowing software maintains data availability by performing the appropriate copy, merge, or minimerge operation. The following subsections describe the courses of action taken when failures occur. The course of action taken depends on the event and whether the shadow set is in a steady state or a transient state.

Transitions from Steady State

When a shadow set is in a steady state, the following transitions can occur:

  • If you mount a new disk into a steady state shadow set, the shadowing software performs a copy operation to make the new disk a full shadow set source member.
  • If a standalone system fails (the system crashes) while a steady state shadow set is mounted, the shadow set SCB reflects that the shadow set has been incorrectly dismounted. When the system is rebooted and the set is remounted, a copy operation is not necessary, but a merge operation is initiated.
  • If a failure occurs in a cluster, the shadow set is merged by a remaining node that has the shadow set mounted:
    • If performance assists are enabled, and the controller-based write logs are available, the shadowing software performs a minimerge.
    • If performance assists are not enabled, the shadowing software performs a merge operation.

Once the transition completes, the disks contain identical information and the shadow set returns to a steady state.

Transitions During Copy Operations

The following list describes the transitions that can occur to a shadow set that is undergoing a copy operation:

  • If you mount an additional disk into the shadow set that is already undergoing a copy operation, the shadowing software finishes the original copy operation before it begins another copy operation on the newly mounted disk.
  • When a shadow set on a standalone system is undergoing a copy operation and the system fails, the copy operation aborts and the shadow set is left with the original members. For a standalone system, there is no recourse except to reboot the system and reinitiate the shadow set copy operation with a MOUNT command.
  • If a shadow set is mounted on more than one node in the cluster and is undergoing a copy operation, should the node performing the copy operation dismount the virtual unit, another node in the cluster that has that shadow set mounted will continue the copy operation automatically.
  • If a shadow set is mounted on more than one node in the cluster and is undergoing a copy operation, should the node performing the copy operation fail, another node in the cluster that has that shadow set mounted will continue the copy operation automatically.

When a node failure occurs during a shadow set copy operation, merge behavior depends on whether or not the shadowing performance assists are enabled.

  • If minimerge is enabled and can be performed, the shadowing software interrupts the copy operation to perform a minimerge and then resumes the copy operation.
  • If the minimerge is not enabled, the shadowing software marks the set as needing a merge operation and finishes the copy operation before beginning the merge operation.

Transitions During Minimerge Operations

When a shadow set is undergoing a minimerge operation, the following transitions can occur:

  • If a new member is mounted into a shadow set when a minimerge operation is in progress, the minimerge is completed before the copy operation is started.
  • If another system failure occurs before a pending minimerge has completed, the action taken depends on whether or not the shadowing performance assists are enabled and if the controller-based write logs are available.
    • If performance assists are enabled and if the controller-based write logs are available for the last node failure, the shadowing software restarts the minimerge from the beginning and adds new LBNs to the write log file based on the entries taken from the nodes that failed.
    • If performance assists are disabled, the shadowing software reverts to a merge operation. The performance assists might be disabled if the controller runs out of write logs or if a failover occurs from a controller with write logs to one that does not.

Transitions During Merge Operations

The following list describes the transitions that can occur to the shadow set that is undergoing a merge operation when performance assists are not available:

  • If you add a new disk to a shadow set that is undergoing a merge operation, the shadowing software interrupts the merge operation to perform a copy operation. The merge operation resumes when the copy operation is completed.
  • If a node failure occurs when the shadow set is performing a merge operation, the shadowing software abandons the current merge operation and starts a new merge operation.

