hp Reliable Transaction Router
System Manager's Manual



5.7 Application Considerations

Although applications need not be directly concerned with shadowing, certain points must be considered when implementing performance-boosting optimizations:

For more information on designing applications, see the Tolerating Site Disaster section in the Reliable Transaction Router Application Design Guide.

5.8 Server States

The current state of a server can be examined using the SHOW SERVER/FULL command. For example,


RTR> show server/full 
Servers: 
 
Process-id:                    13340     Facility:       RTR$DEFAULT_FACILITY 
Channel:                      131073     Flags:                           SRV 
State:                        active(1)     Low Bound: 
High Bound:                   87  13     rcpnam:        "RTR$DEFAULT_CHANNEL" 
User Events:                       0     RTR Events:                        0 
Partition-Id:               16777216 
 
 
Process-id:                    13340     Facility:       RTR$DEFAULT_FACILITY 
Channel:                      196610     Flags:                           SRV 
State:                        active     Low Bound:                    88  13 
High Bound:                      0f'     rcpnam:                      "CHAN2" 
User Events:                       0     RTR Events:                        0 
Partition-Id:               16777217 
 

  1. Server state

Figure 5-1 shows the backend server states that can occur and that appear in the State: field.

Figure 5-1 Backend Server States


5.9 Client States

The current state of a client process can be examined using the SHOW CLIENT/FULL command. For example,


RTR> show client/full 
Clients: 
 
Process-id:                    13340     Facility:       RTR$DEFAULT_FACILITY 
Channel:                      458755     Flags:                           CLI 
State:                      declared(1)     rcpnam:                      "CHAN3" 
User Events:                     255     RTR Events:                        0 

  1. Client state

Figure 5-2 shows the client states that can occur and that appear in the State: field.

Figure 5-2 Frontend Client States


5.10 Partition States

The current state of a partition can be examined using the SHOW PARTITION/FULL command on the routers and the backends. Using the /ROUTER qualifier shows the states as seen from the routers, and using the /BACKEND qualifier shows the states as seen from the backends.

Router partitions:


RTR> show partition/router/full 
 
Facility:       RTR$DEFAULT_FACILITY     State:                        ACTIVE(1)
Low Bound:                         0     High Bound:               4294967295 
Failover policy:                                              fail_to_standby 
Backends:                                                              node10 
States:                                                                active(2)
Primary Main:                  node10    Shadow Main: 
 
 

  1. Router state
  2. Backend state

Backend partitions:


RTR> show partition/backend/full 
 
Partition name:                                RTR$DEFAULT_PARTITION_16777217 
Facility:       RTR$DEFAULT_FACILITY     State:                        active(1)
Low Bound:                    "aaaa"     High Bound:                   "mmmm"(2)
Active Servers:                    0     Free Servers:                      1(3)
Transaction presentation:     active     Last Rcvy BE: 
Txns Active:                       0     Txns Rcvrd:                        0 
Failover policy:     fail_to_standby     Key range ID:               16777217(4)
 
 
Partition name:                                RTR$DEFAULT_PARTITION_16777218 
Facility:       RTR$DEFAULT_FACILITY     State:                        active 
Low Bound:                    "nnnn"     High Bound:                   "zzzz" 
Active Servers:                    0     Free Servers:                      1 
Transaction presentation:     active     Last Rcvy BE: 
Txns Active:                       0     Txns Rcvrd:                        0 
Failover policy:     fail_to_standby     Key range ID:               16777218 

  1. Backend server state
  2. Key range for partition
  3. Server application channels that are available
  4. Key range or partition identification
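If the backend display is captured as text for scripted checks, its two-column "label: value" layout can be split into fields. The following is a minimal sketch, assuming the layout matches the example above exactly: labels end in a colon and columns are separated by at least two spaces. The function name parse_fields is hypothetical and is not part of RTR.

```python
import re

# One partition entry, copied from the SHOW PARTITION/BACKEND/FULL example.
SAMPLE = """\
Partition name:                                RTR$DEFAULT_PARTITION_16777218
Facility:       RTR$DEFAULT_FACILITY     State:                        active
Low Bound:                    "nnnn"     High Bound:                   "zzzz"
Active Servers:                    0     Free Servers:                      1
Transaction presentation:     active     Last Rcvy BE:
Txns Active:                       0     Txns Rcvrd:                        0
Failover policy:     fail_to_standby     Key range ID:               16777218
"""


def parse_fields(text):
    """Split a two-column 'label: value' display into a dict of strings.

    Assumption: labels end with ':' and columns are separated by two or
    more spaces, as in the sample. A label with no value maps to ''.
    """
    fields = {}
    for line in text.splitlines():
        pending = None  # label still waiting for its value
        for token in re.split(r"\s{2,}", line.strip()):
            if token.endswith(":"):
                if pending is not None:
                    fields[pending] = ""
                pending = token[:-1]
            elif pending is not None:
                fields[pending] = token
                pending = None
        if pending is not None:
            fields[pending] = ""
    return fields


if __name__ == "__main__":
    info = parse_fields(SAMPLE)
    print(info["State"], info["Free Servers"], info["Key range ID"])
```

A check such as info["Free Servers"] == "0" could then flag a partition that has no free server channels at the moment the display was taken.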

Figure 5-3 shows the partition states that can occur and that appear in the State: field.

Figure 5-3 Router Partition States



Chapter 6
Troubleshooting RTR Applications

This chapter contains information useful for analyzing performance aspects of RTR, especially in large configurations.

To manage remote nodes, you must have either proxy accounts or rsh access to them. Use RTR remote commands to manage remote nodes.

You should also add and grant operator privileges to the accounts used to manage the RTR network.

6.1 RTR Monitor Pictures

RTR supplies many monitor pictures to help you troubleshoot your application. To display a monitor picture, use the following command at the RTR prompt:


RTR> MONITOR picture-name

The following table provides suggested monitor pictures to display when you encounter problems:

For this type of failure:        Use these monitor pictures:
Most common problems             SYSTEM
Connection failures              ACCFAIL, CONNECTS, FRONTEND, LINK, NETSTAT, STALLS
Transaction sequence problems    CALLS
Channel problems                 CALLS, CHANNEL, PARTIT
Quorum problems                  QUORUM, ROLEQUOR
V2 interface API                 V2CALLS
Journal problems                 JCALLS, JOURNAL
API problems                     APP2ACP, ACP2APP, REJECTS, REJHIST, ROUTERS
XA interface problems            XA
Application problems             APP2ACP, ACP2APP, CALLS, CHANNEL, PARTIT, REJECTS, REJHIST, ROUTERS

See Chapter 7 for descriptions and examples of the monitor pictures, and Chapter 8 for the full syntax of the MONITOR command.
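The table of suggested pictures can also be kept as a small lookup for operator scripts. The sketch below copies the failure types and picture names from the table and builds the corresponding MONITOR commands. The function name monitor_commands is hypothetical, and falling back to the SYSTEM picture for an unlisted failure type is an assumption, not documented RTR behavior.

```python
# Failure types and monitor pictures, copied from the table in Section 6.1.
PICTURES = {
    "most common problems": ["SYSTEM"],
    "connection failures": ["ACCFAIL", "CONNECTS", "FRONTEND", "LINK",
                            "NETSTAT", "STALLS"],
    "transaction sequence problems": ["CALLS"],
    "channel problems": ["CALLS", "CHANNEL", "PARTIT"],
    "quorum problems": ["QUORUM", "ROLEQUOR"],
    "v2 interface api": ["V2CALLS"],
    "journal problems": ["JCALLS", "JOURNAL"],
    "api problems": ["APP2ACP", "ACP2APP", "REJECTS", "REJHIST", "ROUTERS"],
    "xa interface problems": ["XA"],
    "application problems": ["APP2ACP", "ACP2APP", "CALLS", "CHANNEL",
                             "PARTIT", "REJECTS", "REJHIST", "ROUTERS"],
}


def monitor_commands(failure_type):
    """Return the MONITOR commands suggested for a failure type.

    Assumption: an unlisted failure type falls back to the SYSTEM picture.
    """
    pictures = PICTURES.get(failure_type.strip().lower(), ["SYSTEM"])
    return ["MONITOR " + name for name in pictures]


if __name__ == "__main__":
    for command in monitor_commands("Quorum problems"):
        print("RTR>", command)
```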

6.2 Enabling RTR Logging

Many problems can be better analyzed when RTR logging has been enabled.

RTR logging output can be directed to a file, for example at RTR startup:


    $ RTR SET LOG /FILE=logfile.dat 

You should monitor the size of the log file; archive and purge as necessary.
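A scheduled job can perform the size check. The sketch below only reports whether a log file has reached a size limit; it does not archive or purge anything. The function name log_needs_archiving and the 50 MB default limit are assumptions chosen for illustration, not RTR recommendations.

```python
import os


def log_needs_archiving(path, max_bytes=50 * 1024 * 1024):
    """Return True when the file at `path` is at least `max_bytes` long.

    Assumption: the 50 MB default is an arbitrary illustrative limit.
    """
    return os.path.getsize(path) >= max_bytes


if __name__ == "__main__":
    log_file = "logfile.dat"  # the file named in the SET LOG example
    if os.path.exists(log_file) and log_needs_archiving(log_file):
        print(log_file, "has reached the size limit; archive and purge it")
```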

6.3 Starting a Facility

When a facility is started or restarted and servers are declared, RTR recovery requires a search of the journal files of the backend nodes in the facility. This allows recovery of any incomplete transactions that were in flight when the facility last existed. However, if some of the facility's recovery information exists on a backend that is not available at startup, RTR waits for access to the journal on that backend and thus appears to hang.

This situation can be detected by using MONITOR RECOVERY; backend nodes will be waiting for access to recovery journals. If this is the case, you may follow one of these procedures to continue the startup:

6.4 Analyzing RTR Application Performance

This section provides guidance for System Managers who are analyzing an RTR application that is not functioning correctly.

If an application using RTR hangs, use the following checklist to analyze the situation.

  1. Is there a system-level problem on the node concerned, such as a full disk?
  2. Has RTR been started? Is RTR running correctly?


        $ RTR SHOW RTR 
     
        RTR running on node MYNODE in SYSTEM mode 
     
    

  3. Are the application programs running? RTR lists the processes using RTR with the following command:


        $ RTR SHOW PROCESS 
    

    The user application processes should be in this list.

  4. Has the application stopped?
    Use MONITOR SYSTEM to check for problems. If it indicates a problem with a subsystem, you can get additional information by monitoring that subsystem.
    Network partitioning can also be a problem; this can happen if half or fewer of the configured backends and routers are reachable. To recognize network partitioning, use the MONITOR QUORUM picture. If the number of retries keeps increasing without a corresponding increase in the reason counters (CNF/RCH/QRT), you have a partitioned network.
    To check the individual links, use the MONITOR CONNECTS picture. This picture displays the link protocol for connected links, and the reason for a failed connection on any links.
  5. Are the application programs running correctly? Use MONITOR CALLS to examine the state of the participating application processes.
  6. Is a server waiting for an rtr_mt_accepted or rtr_mt_rejected message (in other words, has it voted, but not yet received confirmation of the outcome of the transaction)? This is most likely a problem with the application logic. Also check the database for a possible deadlock situation.
  7. Is a client channel declaration not completing? Client channels need connectivity through a router node to at least one server channel before they receive an rtr_mt_opened message. If the server is up and running, use MONITOR QUORUM and MONITOR CONNECTS to check connectivity.
  8. Has a client channel called rtr_receive_message to wait for an rtr_mt_accepted or rtr_reply_to_client message, and not received it within a reasonable time? Check the application logic and the database for a deadlock.
  9. Has a client channel called rtr_receive_message expecting an rtr_mt_accepted or rtr_mt_rejected message that is not forthcoming? If yes, RTR is awaiting the necessary resources for message transmission to the backend servers. Reasons could be:
  10. Use MONITOR TPS to check the transaction processing rate of each process on a system. A system's capacity is generally expressed as the throughput of the servers. If the rates are low or sporadic, contention may be the cause. For systems with throughput less than 10 tps, the MONITOR TPSLO display provides greater granularity in the associated bar graph.
    Adding server instances can often decrease application throughput if transactions all access common data elements. Partitioning data so that server instances do not interfere with each other is one way to resolve database contention.
  11. Use the command SHOW PARTITION/FULL to display the backlog of transactions on a server pool (partition). If the number of free servers is continually zero, the arrival rate of transactions is greater than the processing capacity of the existing server pool.
    The MONITOR QUEUES picture also shows backlogs, with queuing displayed by partition. If the service time and arrival rate of transactions are large, there are not enough servers to process the load. The remedy is to start additional server instances or to decrease the processing time of each transaction. A large number of queued transactions or messages can also be caused by contention that limits the efficiency of the servers.
  12. Check the state of links with:


        $ RTR SHOW FACILITY /LINK 
    

  13. Check if there are sufficient concurrent application server channels to handle the transaction load; messages may have to be queued for long periods before being processed.
    Use MONITOR QUEUES to check the number of outstanding messages for each partition.
  14. Check for congestion by examining the network links with the longest delays by using MONITOR TRAFFIC.
    Use the command MONITOR STALLS to determine if the network needs tuning.
    If there is no congestion, use MONITOR FLOW to discover if a link has credits for data traffic, or if the application requires more bandwidth than is available.
  15. If the RTRACP dies when adding a facility (which has a backend role on the node), suspect journal file difficulties. Ensure that the journal file is not corrupted, or incompatible with the running RTR version. In the event of journal file corruption, please contact your HP support office.
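Items 11 and 13 of the checklist compare the arrival rate of transactions with the capacity of the server pool. The relationship can be sketched with standard queueing arithmetic: the offered load is the arrival rate multiplied by the average service time, and the pool keeps up only if it has more servers than that load. This is a rough sizing aid under the assumption of steady average rates, not a description of how RTR schedules work, and all function names are hypothetical.

```python
import math


def offered_load(arrival_tps, service_seconds):
    """Average number of servers kept busy: arrival rate times service time."""
    return arrival_tps * service_seconds


def is_overloaded(arrival_tps, service_seconds, servers):
    """True when the offered load meets or exceeds the number of servers."""
    return offered_load(arrival_tps, service_seconds) >= servers


def min_servers(arrival_tps, service_seconds):
    """Smallest server count that strictly exceeds the offered load."""
    return math.floor(offered_load(arrival_tps, service_seconds)) + 1


if __name__ == "__main__":
    # Illustrative figures: 40 transactions per second, 0.25 s per transaction.
    print("offered load:", offered_load(40, 0.25))
    print("overloaded with 8 servers:", is_overloaded(40, 0.25, 8))
    print("minimum servers:", min_servers(40, 0.25))
```

With these illustrative figures the backlog grows with 8 server instances. As the checklist notes, the remedy is either more server instances or a shorter processing time per transaction, provided that contention on common data is not the real bottleneck.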

6.5 Server Crashes

Before you restart a server, analyze why it crashed. Cascading failures can be a problem, but note that restarting the server prevents failover.

6.6 Link Connect Failures

The following table explains the meaning of link connect failure codes:

Code           Text                   Implications
NOTRECOGNISED  Node not recognized    Remote node that received the connection request does not have the local node in its RTR configuration.
REFUSED        Connection refused     Indicates one of the following conditions on the remote system: either RTR is not running, or a requested network protocol is not installed.
FACNOTDEC      Facility not declared  The requested facility is not configured on the remote node.
NODENOTCFG     Node not configured    The remote node has the local node in its configuration, though not as part of the requested facility.
ROLESMISMATCH  Roles mismatch         The remote node has the local node configured in the requested facility, but in a role other than the one requested.

Any of the above errors can occur as the result of the connection request arriving at the wrong node for any of the following reasons:

6.7 Rejected Transactions

The following table explains the meaning of rejected transaction codes:

Code      Text                                        Implications
NODSTFND  No destination found                        Primary and all alternate servers for a partition cannot be reached by the client application. This situation can be caused by network problems or services which have not been started or have crashed.
JNLFULL   Journal full                                May occur when the RTR journal is full. Note that RTR reserves a percentage of the journal to ensure that in-progress transactions can be completed. The JNLFULL error is most likely to be seen with shadow servers running in remember mode, but can also be caused by many transactions being queued to an unresponsive server.
DLKTXRES  Deadlock detected, transaction rescheduled  May occur during the commit cycle for multi-participant transactions or in extreme failover situations when the order of transactions must be corrected. This reject reason indicates that two transactions were interfering with each other. RTR rejects one branch of the offending transactions to clear the deadlock. Since this transaction branch is subsequently rescheduled by RTR, this reject can be considered informational.
TIMEOUT   Time out                                    Occurs if the rtr_send_to_server timeout provided by the client application expires. This reject indicates poor responsiveness by the service.
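
A script that summarizes rejected transactions can use the codes above to separate informational rejects from those that need attention. The sketch below is a lookup built only from this table. The names REJECT_CODES, describe, and needs_attention are hypothetical, and treating an unknown code as needing attention is an assumption.

```python
# Reject codes, texts, and a short hint, condensed from the table above.
REJECT_CODES = {
    "NODSTFND": ("No destination found",
                 "check the network and that the servers are started"),
    "JNLFULL": ("Journal full",
                "check for shadow servers in remember mode or an "
                "unresponsive server"),
    "DLKTXRES": ("Deadlock detected, transaction rescheduled",
                 "informational; RTR reschedules the transaction branch"),
    "TIMEOUT": ("Time out",
                "the rtr_send_to_server timeout expired; check service "
                "responsiveness"),
}

# Per the table, only DLKTXRES can be considered informational.
INFORMATIONAL = {"DLKTXRES"}


def describe(code):
    """Return 'CODE: text (hint)' for a reject code, or mark it unknown."""
    key = code.strip().upper()
    if key not in REJECT_CODES:
        return key + ": unknown reject code"
    text, hint = REJECT_CODES[key]
    return "{}: {} ({})".format(key, text, hint)


def needs_attention(code):
    """True unless the table describes the reject as informational.

    Assumption: unknown codes are treated as needing attention.
    """
    return code.strip().upper() not in INFORMATIONAL


if __name__ == "__main__":
    for code in ("DLKTXRES", "JNLFULL"):
        print(describe(code), "| needs attention:", needs_attention(code))
```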

6.8 Using the Snapshot Procedure

Certain difficulties can be more easily investigated if a snapshot of the problem node is made. Make a snapshot if the application hangs, causes delays, or seems to be causing other problems.

OpenVMS

On OpenVMS systems, a snapshot is made by executing a command file:


$  @SYS$MANAGER:RTR$SNAPSHOT.COM 

The output is a file named nodename_RTR_DIAGS.TMP.

Information in this file can help to determine the possible causes of a fault (OpenVMS, DECnet, RTR, environment, database, application, and so on). The information includes numerous RTR monitor pictures, executable image versions, process states, and so on.

UNIX

On UNIX systems, make a snapshot by entering the following command on the problem node:


# rtr_snapshot.sh 

The information displayed on the screen includes many RTR monitor pictures, executable image versions, process states, and other information.

Windows

To take a snapshot on Windows, click the Snapshot icon on the RTR menu. A DOS-style window with the title "Snapshot" appears as the snapshot is taken. The file rtr_snapshot.log is created in the directory where RTR runs, for example, C:\Program Files\HP\RTR. You can read this file with an editor such as Notepad.

Note

If you use Microsoft Windows Scripting Host, RTR requires version 5.6 or later. With an earlier version of the Scripting Host, RTR snapshot runs with reduced functionality.

To obtain the latest Scripting Host software, use the Microsoft download center at http://msdn.microsoft.com/downloads.

Sun

To take a snapshot on Sun, use fssnap. This copies the original filesystem blocks into a file as they are changed, with some performance degradation.

6.9 Generating a Process Dump

OpenVMS Systems

Certain potential difficulties can be more easily investigated by RTR Support if a dump of the RTR ACP is available. It shows diagnostics generated if unhandled exceptions occur.

The file SYS$MANAGER:RTR$STARTUP.COM can be altered to include the definition of the logical name RTR$DUMP_DIRECTORY which specifies the device and directory where the dump is to be generated.

Since an RTR dump file typically uses about 5000 blocks, enough space should be available on the chosen disk. For a very large node installation, or a large number of links, the dump file may be up to 20,000 blocks.
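
The block counts above can be converted to bytes when checking free space on the chosen disk. The sketch below assumes the standard OpenVMS disk block size of 512 bytes. The function names are hypothetical, and the free-space figure would come from the system, for example from a SHOW DEVICE display.

```python
VMS_BLOCK_BYTES = 512  # size of one OpenVMS disk block, in bytes


def blocks_to_mib(blocks):
    """Convert a count of 512-byte blocks to mebibytes."""
    return blocks * VMS_BLOCK_BYTES / (1024 * 1024)


def has_room_for_dump(free_blocks, dump_blocks=5000):
    """True when the free block count covers the expected dump size."""
    return free_blocks >= dump_blocks


if __name__ == "__main__":
    print("typical dump:", blocks_to_mib(5000), "MiB")
    print("large dump:", blocks_to_mib(20000), "MiB")
    print("room for a large dump in 15000 free blocks:",
          has_room_for_dump(15000, dump_blocks=20000))
```

For a very large installation, pass dump_blocks=20000 to check against the upper figure.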

To prepare for dump creation, make sure that:

An RTR ACP dump can be created as follows:


    $ RTR 
    RTR> SET MODE /UNSUPPORTED 
    RTR> DEBUG ACP 
    ^G 
    RTR> SET MODE /NOUNSUPPORTED 
    ^Z 
    $ 

Unsupported commands should be used with care.

UNIX Systems

UNIX core files are generated with no special configuration, but their name and location may vary depending on operating system settings and how RTR is started up. The file rtr_error*.log is usually created in /rtr.

Windows Systems

On Windows systems, a process dump file can be generated by enabling the Dr. Watson post-mortem crash analyzer. This is done by entering the MS-DOS command:


(%WINDIR%\drwtsn32 -i) 

The files created are %WINDIR%\DRWTSN32.LOG and %WINDIR%\USER.DMP.

These files should be included with any problem report submitted to RTR Engineering in the event of an RTR crash, along with the RTR dump file (RTR_<n>.DMP) and the RTR log file. The file rtr_error*.log is also created. Send in *.log files when reporting an error, if running RTR with logging in use.

