Transaction presentation is the process of passing transactions to idle server channels for processing. While transaction presentation is active, new transactions are started on the first free server channel for the appropriate partition.
Use the /SUSPEND qualifier to the SET PARTITION command to halt the presentation of new transactions to servers on the backend where the command is entered. The command completes when the processing of all currently active transactions is complete. The optional /TIMEOUT qualifier specifies, as a number of seconds, the time that the command waits for completion. If the command times out, presentation of new transactions is suspended, but transactions remain for which servers have yet to complete processing. The operator must then decide either to reenter the command and wait a further period of time, or to resume the partition. Note that use of this command does not affect any transaction timeout value specified by RTR clients, so such transactions may encounter a timeout condition if the partition remains suspended.
The /RESUME qualifier restarts presentation of transactions to the server application channels.
The following examples show how to use the qualifiers:
    RTR> SET PARTITION/SUSPEND/TIMEOUT=5 Facility1:Partition1
    RTR>
    RTR> SET PARTITION/RESUME Facility1:Partition1
For a more complete description, see the SET PARTITION command in Chapter 6.
To suspend transaction presentation on a partition with a timeout of 30 seconds, program the set_qualifier argument of the rtr_set_info() call as follows:
    rtr_qualifier_value_t set_qualifiers[ 3 ];
    rtr_partition_state_t newState = rtr_partition_state_suspend;
    rtr_uns_32_t ulTimeoutSecs = 30;

    set_qualifiers[ 0 ].qv_qualifier = rtr_partition_state;
    set_qualifiers[ 0 ].qv_value = &newState;
    set_qualifiers[ 1 ].qv_qualifier = rtr_partition_cmd_timeout_secs;
    set_qualifiers[ 1 ].qv_value = &ulTimeoutSecs;
    set_qualifiers[ 2 ].qv_qualifier = rtr_qualifiers_end;
    set_qualifiers[ 2 ].qv_value = NULL;
Note that the timeout is an optional element. To resume transaction presentation, specify newState as rtr_partition_state_resume.
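The qualifier array above follows a simple pattern: one entry per setting, an optional timeout entry, and an end marker. The following is a minimal sketch of that pattern as a helper function. The types and constants prefixed with demo_ are local stand-ins so that the fragment compiles on its own; they are not the RTR declarations, which come from the RTR API header and may differ. The helper name is hypothetical.

```c
#include <stddef.h>

/* Local stand-ins for illustration only. The real qualifier types and
   constants are declared by the RTR API header and may differ. */
typedef enum {
    demo_partition_state,
    demo_partition_cmd_timeout_secs,
    demo_qualifiers_end
} demo_qualifier_t;

typedef enum {
    demo_state_suspend,
    demo_state_resume
} demo_partition_state_t;

typedef struct {
    demo_qualifier_t qv_qualifier;
    void *qv_value;
} demo_qualifier_value_t;

/* Fill quals (room for at least 3 entries) with a state change and,
   if timeout_secs is not NULL, a command timeout. Returns the number
   of entries written, including the end marker. */
size_t demo_build_state_qualifiers(demo_qualifier_value_t *quals,
                                   demo_partition_state_t *new_state,
                                   unsigned int *timeout_secs)
{
    size_t n = 0;

    quals[n].qv_qualifier = demo_partition_state;
    quals[n].qv_value = new_state;
    n++;

    if (timeout_secs != NULL) {            /* the timeout is optional */
        quals[n].qv_qualifier = demo_partition_cmd_timeout_secs;
        quals[n].qv_value = timeout_secs;
        n++;
    }

    quals[n].qv_qualifier = demo_qualifiers_end;   /* end marker */
    quals[n].qv_value = NULL;
    n++;

    return n;
}
```

Passing NULL for the timeout yields a two-entry array (state plus end marker), which is the shape a resume request would typically take.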
3.6.3 Controlling Recovery
The purpose of RTR automated recovery is to ensure the best possible consistency of application databases across a distributed computing environment. To achieve this, RTR relies in part on information stored in the journals of the participating backends. Should one or more of these systems be unavailable at recovery time, automated recovery may stall or fail awaiting availability of these systems and their journals. This enforces data consistency where transaction order is important, but can affect application availability.
For example, if a partition enters a wait state or fails, but has neither a local nor a remote journal, an operator can instruct RTR to skip the current step in the recovery process with the /IGNORE_RECOVERY qualifier. Because this command bypasses parts of the recovery cycle, use it with caution, and only in cases where availability is valued over consistency in application databases. For a read-only database, transaction order will be unimportant.
The recovery cycle can also be manually restarted with the /RESTART_RECOVERY qualifier. This may be useful if the operator previously aborted automated recovery. Since this command can result in recovery of transactions from previously inaccessible journals, do not use this if your applications are sensitive to the order in which transactions are processed by the servers.
The following example shows how to use the qualifiers:
    RTR> SET PARTITION/IGNORE_RECOVERY Facility1:Partition1
    RTR>
    RTR> SET PARTITION/RESTART_RECOVERY Facility1:Partition1
A complete description of the SET PARTITION command qualifiers can be found in Chapter 6.
To terminate the current recovery state, program the set_qualifier argument of rtr_set_info() as follows:
    rtr_qualifier_value_t set_qualifiers[ 2 ];
    rtr_partition_state_t newState = rtr_partition_state_exitwait;

    set_qualifiers[ 0 ].qv_qualifier = rtr_partition_state;
    set_qualifiers[ 0 ].qv_value = &newState;
    set_qualifiers[ 1 ].qv_qualifier = rtr_qualifiers_end;
    set_qualifiers[ 1 ].qv_value = NULL;
To restart recovery, specify newState as rtr_partition_state_recover.
3.6.4 Controlling the Active Site
RTR lets the system operator deploy a range of shadow and standby partitions in order to provide the desired degree of application resilience to failures. By default, RTR automatically manages the assignment of active and standby roles to the available partition instances. The operator can assign a relative priority to each backend on which a partition instance exists. Enter the priority as a list of backend node names in decreasing order of priority, highest first, as shown in the following example:
    RTR> SET PARTITION/PRIORITY_LIST=(BE1, BE2, BE3) Facility1:Partition1
Suspend transaction presentation before entering or changing the priority list.
Chapter 8 provides more information on the SET PARTITION command.
To set the partition backend priority list, program the set_qualifier argument of the rtr_set_info() call as follows:
    rtr_qualifier_value_t set_qualifiers[ 2 ];
    char *szNodeList = "your,list,of,node,names,here";

    set_qualifiers[ 0 ].qv_qualifier = rtr_partition_be_priority_list;
    set_qualifiers[ 0 ].qv_value = &szNodeList;
    set_qualifiers[ 1 ].qv_qualifier = rtr_qualifiers_end;
    set_qualifiers[ 1 ].qv_value = NULL;
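The node list in the fragment above is a single comma-separated string. When the backend names are held as separate strings, they have to be joined first. The following is a minimal sketch of such a join; the function name is hypothetical, and the format (commas, no spaces, highest priority first) is assumed from the placeholder string in the example rather than from a stated rule.

```c
#include <stddef.h>
#include <string.h>

/* Join count node names into buf as "a,b,c", preserving their order.
   Returns 0 on success, or -1 if count is 0 or buf (of bufsize bytes)
   is too small to hold the result and its terminating NUL. */
int join_node_names(const char *const names[], size_t count,
                    char *buf, size_t bufsize)
{
    size_t used = 0;
    size_t i;

    if (count == 0 || bufsize == 0)
        return -1;

    for (i = 0; i < count; i++) {
        size_t len = strlen(names[i]);
        size_t need = len + (i > 0 ? 1 : 0);   /* name plus a comma */

        if (used + need + 1 > bufsize)          /* +1 for the NUL */
            return -1;
        if (i > 0)
            buf[used++] = ',';
        memcpy(buf + used, names[i], len);
        used += len;
    }
    buf[used] = '\0';
    return 0;
}
```

Checking the return value before passing the buffer on avoids handing a truncated list to the call.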
In a system employing shadows or standbys, there is a choice to be made in case the primary site fails. The /FAILOVER_POLICY qualifier to the SET PARTITION command lets the system operator select one of the following policies that RTR should pursue in selecting the new primary site in the event of a failure:
The time required to effect a failover and the subsequent impact on client response times will influence the choice of failover policy. The time for standby takeover of a failed node's journal depends on the size of that journal, whereas failover to a shadow site is effected quickly. However, if the secondary shadow site has accumulated a backlog of transactions, they must be processed before any new transactions can be started. The choice will be determined by the characteristics of your application and configuration.
The following example shows use of the /FAILOVER_POLICY qualifier setting failover to a shadow server:
    RTR> SET PARTITION/FAILOVER_POLICY=SHADOW Facility1:Partition1
The following example shows setting failover policy to a standby server:
    RTR> SET PARTITION FacALPHA:AtoG/FAILOVER_POLICY=STAND_BY
You can view the policy that has been set with the SHOW PARTITION/FULL command.
For more information see the SET PARTITION command in Chapter 8.
To set the partition failover policy, program the set_qualifier argument of the rtr_set_info() call as follows:
    rtr_qualifier_value_t set_qualifiers[ 2 ];
    rtr_partition_failover_policy_t newPolicy;

    set_qualifiers[ 0 ].qv_qualifier = rtr_partition_failover_policy;
    set_qualifiers[ 0 ].qv_value = &newPolicy;
    set_qualifiers[ 1 ].qv_qualifier = rtr_qualifiers_end;
    set_qualifiers[ 1 ].qv_value = NULL;
Legal values for newPolicy are:
After a failure, RTR replays transactions stored in its journal as appropriate. RTR lets you control transaction replay in cases where a killer message occurs during replay and prevents recovery from continuing normally. A killer message is a message capable of causing repeated server application failure during recovery, so that server availability is lost. This is typically the result of an improperly handled condition or an application programming error within the server itself. Under such circumstances it may be desirable to sidestep a particular transaction, maintain server operation, and manually process the transaction at some later time.
The RTR solution is to establish, for a given partition with the SET PARTITION command, the maximum number of retries for any given transaction presented during recovery. Once this limit has been exceeded, the offending transaction is removed from the recovery process and is written to the journal as an exception record. Subsequent processing of this transaction requires manual intervention by someone qualified to evaluate and correct the situation in both the application and RTR. Once the application status is understood, the SET TRANSACTION command can be used to update the journal, thus ensuring that the final state of any manually processed exception is accurately reflected in future recovery operations.
The recovery retry count is partition-specific, and applies to both local and shadow recovery operations. The default is no limit on the number of retries, which permits a killer message to bring down all available servers servicing a given partition.
The recovery retry count should be set before starting (or restarting) the application servers so that the limit is established prior to the start of recovery operations.
The following example shows how to set the retry count:
    RTR> SET PARTITION/RECOVERY_RETRY_COUNT=3 Facility1:Partition1
See Chapter 8 for more information on the SET PARTITION command.
To set the partition transaction recovery limit, program the set_qualifier argument of rtr_set_info() as follows:
    rtr_qualifier_value_t set_qualifiers[ 2 ];
    rtr_uns_32_t newLimit = . . .;

    set_qualifiers[ 0 ].qv_qualifier = rtr_partition_rcvy_retry_count;
    set_qualifiers[ 0 ].qv_value = &newLimit;
    set_qualifiers[ 1 ].qv_qualifier = rtr_qualifiers_end;
    set_qualifiers[ 1 ].qv_value = NULL;
The retry limit applies to all transactions that have reached the
voting stage on a server. If a server always dies before voting on a
transaction, RTR aborts the transaction after the third try
("three-strikes and you're out!"). For more information, see
the description of /RECOVERY_RETRY_COUNT for the SET PARTITION command.
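The effect of a retry limit on replay can be illustrated with a small model. This is a sketch of the idea only, not RTR's implementation: all names are hypothetical, a limit of 0 is used here to stand for "no limit", and the limit is treated as the maximum number of presentations, which is an assumption about where the boundary falls.

```c
#include <stdbool.h>

typedef enum { REPLAY_COMMITTED, REPLAY_EXCEPTION } replay_result_t;

/* Hypothetical callback: returns true if the server processed the
   replayed transaction, false if the server failed on it. */
typedef bool (*replay_attempt_fn)(void *ctx);

/* Present one transaction repeatedly until it is processed or the
   retry limit is reached. With retry_limit == 0 (no limit) and a
   transaction that always fails, this loop never ends, which is the
   killer-message situation the limit is meant to prevent. */
replay_result_t replay_with_limit(replay_attempt_fn attempt, void *ctx,
                                  unsigned int retry_limit,
                                  unsigned int *attempts_made)
{
    unsigned int attempts = 0;

    for (;;) {
        attempts++;
        if (attempt(ctx)) {
            *attempts_made = attempts;
            return REPLAY_COMMITTED;
        }
        if (retry_limit != 0 && attempts >= retry_limit) {
            *attempts_made = attempts;
            return REPLAY_EXCEPTION;   /* set aside for manual handling */
        }
    }
}

/* Example attempt: ctx points to a count of failures still to come. */
bool fail_then_succeed(void *ctx)
{
    unsigned int *failures_left = (unsigned int *)ctx;

    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

With a limit of 3, a transaction that fails twice and then succeeds is committed on the third presentation, while one that keeps failing is set aside as an exception after the third.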
3.6.7 Partition Persistence
Partitions in RTR are designed to be persistent, remaining until explicitly removed during normal RTR processing. However, under certain conditions, relics of partitions can remain in the RTR journal. RTR automatically performs some cleanup of such records, but depends on the creation of the relevant facility to initiate this process. For example, in a test environment, many facilities are created for temporary use, with no intention of retaining those facilities. Because the creation of each facility may cause the creation of associated records for a partition in the RTR journal, creating many ad hoc facilities can fill the RTR journal. In such a case, when you try to create a new partition (or open a new server channel), the error message NOMOREPRT may appear.
To correct this problem, the journal must be purged of these ad hoc
entries. To purge the RTR journal of such unwanted transactions, use
the DUMP JOURNAL command to verify the partition name and transaction
ID of the unwanted transactions, and use the SET TRANSACTION command
with the partition name and transaction ID to set the state to DONE.
Recreate the facility with the CREATE FACILITY command.
3.7 Displaying Partition Information
Information on the definition and state of a partition is displayed with the SHOW PARTITION command, as seen in the following example. The information of interest in the context of partition management relates to the backend instance of the partition. See Chapter 8 for more information on the SHOW PARTITION command.
    RTR> show partition/backend
    Backend partitions on node BE1 at Wed Feb 24 15:07:50 1999
    Partition name                    Facility                State
    RTR$DEFAULT_PARTITION_16777217    RTR$DEFAULT_FACILITY    active
    RTR$DEFAULT_PARTITION_16777218    RTR$DEFAULT_FACILITY    active
This section describes the concepts of RTR's transaction management capability.
The RTR transaction is the center of an RTR application, and transaction state is the property that characterizes a transaction's current condition. Whenever a transaction progresses from one stage to another, the transaction state is updated to reflect the transition. Transaction states are maintained in memory and are also stored in the RTR journal for recovery purposes.
Three different states are used internally by RTR to keep track of transaction status: the Transaction Runtime State, the Transaction Journal State, and the Transaction Server State.
These three states are very closely related. The Transaction Runtime State, also known as Transaction State, describes how a transaction progresses from an RTR role (FE, TR, BE) point of view. For example, a transaction can enter a stage in which its transaction state from an RTR frontend viewpoint is different from its transaction state from the viewpoint of an RTR router.
The Transaction Journal State describes how a transaction branch running on an RTR backend progresses from the RTR journal perspective. The Transaction Journal State and the Transaction Server State belong to each separate branch (participating partition) of the transaction. When a transaction branch changes state, its corresponding Transaction Journal State is updated and the new state, along with other information pertaining to this transaction, is stored in the RTR journal. The Transaction Journal State is primarily used by RTR to perform the recovery replay of a transaction after a failure, if necessary. An RTR frontend and router will not see this state. Note that because the Transaction Runtime State is not always stored immediately in the journal, the state in the journal may not always reflect the actual state of the transaction, but is kept updated by RTR. Table 4-1 describes the Transaction Journal States.
Transaction Journal State (by Branch) | Explanation of State |
---|---|
SENDING | Initial state of the transaction branch as the client sends a call to the server application. The RTR backend has received the transaction and RTR is waiting for votes. |
VOTED | The servers have voted and the vote has been written to disk. |
COMMIT | RTR has asked the servers to commit the transaction. |
ABORT | RTR has asked the servers to roll back the transaction because of a "no" vote. |
DONE | The servers have informed RTR that the transaction has been committed to the database. It is safe to FORGET the transaction. |
PRI_DONE | The primary server has committed the transaction; the secondary may not have done so. This is the typical case of a REMEMBER transaction. |
EXCEPTION | RTR asked the server to commit the transaction, but the server failed to commit it to the database. The transaction needs manual reconciliation. |
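
The journal states in the table can be represented in a monitoring or reporting tool as an enumeration with a few helper predicates. The following is a minimal sketch; the enum and function names are hypothetical and are not RTR identifiers, the numeric values are arbitrary, and the predicates encode only what the table states.

```c
#include <stdbool.h>

/* Hypothetical names for the journal states listed in the table. */
typedef enum {
    JNL_SENDING,
    JNL_VOTED,
    JNL_COMMIT,
    JNL_ABORT,
    JNL_DONE,
    JNL_PRI_DONE,
    JNL_EXCEPTION
} jnl_state_t;

/* Return the state name as it appears in the table. */
const char *jnl_state_name(jnl_state_t s)
{
    switch (s) {
    case JNL_SENDING:   return "SENDING";
    case JNL_VOTED:     return "VOTED";
    case JNL_COMMIT:    return "COMMIT";
    case JNL_ABORT:     return "ABORT";
    case JNL_DONE:      return "DONE";
    case JNL_PRI_DONE:  return "PRI_DONE";
    case JNL_EXCEPTION: return "EXCEPTION";
    }
    return "UNKNOWN";
}

/* Per the table, only DONE is described as safe to forget. */
bool jnl_safe_to_forget(jnl_state_t s)
{
    return s == JNL_DONE;
}

/* Per the table, EXCEPTION transactions need manual reconciliation. */
bool jnl_needs_manual_reconciliation(jnl_state_t s)
{
    return s == JNL_EXCEPTION;
}
```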
The Transaction Server State describes transaction state as seen by a specific server, serving that branch of the transaction. RTR uses this state to determine if a server is available to process a new transaction or if a server has voted on a particular transaction. As with the Transaction Journal State, the Transaction Server State is only relevant at the backend.
RTR provides a set of comprehensive management utilities to help users closely monitor the flow of a transaction and all three types of states associated with that transaction. These utilities help users understand how a transaction migrates from one stage to another and help diagnose problems.
Use the SHOW TRANSACTION command to examine a transaction's up-to-date status on frontend, router or backend roles. With this command, users can see all three types of transaction states of a particular transaction and also understand how the RTR journal and server applications perceive this transaction. When a transaction commits or aborts, all status associated with this transaction is removed from memory and can no longer be monitored by the command.
The DUMP JOURNAL command can be used to trace and review the flow of a transaction. The RTR journal saves all of the information about a transaction. This includes its transaction journal state and the transaction messages (records) received from the RTR client and sent to the server. The information is kept until a transaction is committed or aborted and all participants have been notified.
Use the SET TRANSACTION command to change the current state of a transaction to a new state. This command can be used to work around an unexpected situation. For example, in a configuration with shadow servers, the system administrator might decide not to replay (recover) all remembered transactions in an RTR journal after a failure. The SET TRANSACTION command could set specified transactions in a PRI_DONE or REMEMBER state to a DONE state, avoiding the delay of replaying remembered transactions from the journal and so speeding recovery. The SET TRANSACTION command should be used only by experienced RTR system administrators, because it introduces the risk of corrupting or losing transactions if used incorrectly. It can be used on the backend only, and the RTR log file must be turned on for this command.
Log file entries are made for all transaction state changes for
debugging and auditing purposes.
4.2 Exception Transactions
When a server votes on a transaction, RTR expects the server to commit the transaction to the database when RTR makes the request. If for some reason the server cannot do so, the server has two choices:
EXCEPTION transactions can be inspected with the DUMP JOURNAL command.
The final state of the transaction should say EXCEPTION.
4.2.1 Dealing with EXCEPTION Transactions
The system administrator must decide what to do with transactions that are marked EXCEPTION. There are two choices: