Reliable Transaction Router
Release Notes

June, 1999

Reliable Transaction Router (RTR) is an open client/server middleware for continuous computing. RTR Engineering is pleased to announce that RTR Version 3.2 is now available.

Operating System and Version:
    Windows NT Version 4.0
    Windows 95, Windows 98
    Compaq Tru64 UNIX (formerly DIGITAL UNIX) Version 4.0D, 4.0E, 4.0F
    Sun Solaris Version 2.5, 2.5.1, 2.6, 7
    IBM AIX Version 4.2, 4.3
    Hewlett-Packard HP-UX Version 10.20
    OpenVMS Version 6.2, 7.1, 7.2

Software Version: Reliable Transaction Router Version 3.2

Compaq Computer Corporation
Houston, Texas

COMPAQ COMPUTER CORPORATION SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL. THIS INFORMATION IS PROVIDED "AS IS" AND COMPAQ COMPUTER CORPORATION DISCLAIMS ANY WARRANTIES, EXPRESS, IMPLIED OR STATUTORY AND EXPRESSLY DISCLAIMS THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR PARTICULAR PURPOSE, GOOD TITLE AND AGAINST INFRINGEMENT.

This publication contains information protected by copyright. No part of this publication may be photocopied or reproduced in any form without prior written consent from Compaq Computer Corporation.

© Digital Equipment Corporation 1999. All rights reserved.

The software described in this guide is furnished under a license agreement or nondisclosure agreement. The software may be used or copied only in accordance with the terms of the agreement.

Compaq and the Compaq logo are registered in the United States Patent and Trademark Office.
The following are trademarks of Compaq Computer Corporation: AlphaGeneration, AlphaServer, AlphaStation, Compaq Internet Personal Tunnel, DEC, DECconnect, DECdtm, DECnet, DIGITAL, OpenVMS, PATHWORKS, POLYCENTER, Reliable Transaction Router, TruCluster, Tru64 UNIX, VAX, and VMScluster.

The following are third-party trademarks:

AIX and IBM are registered trademarks of International Business Machines Corporation.

Encina is a registered trademark of Transarc Corporation.

Hewlett-Packard and HP-UX are registered trademarks of Hewlett-Packard Company.

Intel is a trademark of Intel Corporation.

Microsoft, Microsoft Access, Microsoft SQL Server, Internet Explorer, MS-DOS, Visual Basic, Visual C++, Windows, Windows 95, Windows 98, and Windows NT are trademarks or registered trademarks of Microsoft Corporation.

Netscape, Netscape Communicator, and Netscape Navigator are registered trademarks of Netscape Communications Corporation.

Oracle, ORACLE7, PL/SQL, SQL*Net, and SQL*Plus are trademarks or registered trademarks of Oracle Corporation.

Solaris, SPARCstation, SUN, SunOS, and Sunlink are trademarks or registered trademarks of Sun Microsystems, Inc.

UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company, Ltd.

This document was prepared using VAX DOCUMENT Version 2.1.

Contents

Preface

1 General Information Applicable to all Platforms
    1.1 New Features
    1.2 Corrections
    1.3 Known Problems with Workarounds
    1.4 Restrictions
    1.5 Documentation Changes
    1.6 Limitations
    1.7 Known Problems
    1.8 Problem Reporting

2 Compaq Tru64 UNIX Specific Information
    2.1 New Features
    2.2 Known Problems Corrected Since Version 3.1D
    2.3 Known Problems with Workarounds
    2.4 Restrictions

3 OpenVMS Specific Information
    3.1 New Features
    3.2 Known Problems Corrected Since Version 3.1D
    3.3 Known Problems with Workarounds
    3.4 Restrictions

4 AIX Specific Information
    4.1 New Features
    4.2 Known Problems Corrected Since Version 3.1D
    4.3 Known Problems with Workarounds
    4.4 Restrictions

5 Sun Solaris Specific Information
    5.1 New Features
    5.2 Known Problems Corrected Since Version 3.1D
    5.3 Known Problems with Workarounds
    5.4 Restrictions

6 HP-UX Specific Information
    6.1 New Features
    6.2 Known Problems Corrected Since Version 3.1D
    6.3 Known Problems with Workarounds
    6.4 Restrictions

7 Windows NT Specific Information
    7.1 New Features
    7.2 Known Problems Corrected Since Version 3.1D
    7.3 Known Problems with Workarounds
    7.4 Restrictions

8 Windows 95 and Windows 98 Specific Information
    8.1 New Features
    8.2 Known Problems Corrected Since Version 3.1D
    8.3 Known Problems with Workarounds
    8.4 Restrictions

Preface

Purpose of these Release Notes

This document provides information for Reliable Transaction Router, V3.2.

Intended Audience

These Release Notes are intended for all users of Reliable Transaction Router. Please read all of this document before using the product.

Document Structure

o Section 1 provides information applicable to all platforms.
o Section 2 provides information for Compaq Tru64 UNIX systems.
o Section 3 provides information for OpenVMS systems.
o Section 4 provides information for AIX systems.
o Section 5 provides information for Sun Solaris systems.
o Section 6 provides information for HP-UX systems.
o Section 7 provides information for Windows NT systems.
o Section 8 provides information for Windows 95 and Windows 98 systems.

Related Documentation

o Reliable Transaction Router Installation Guide
o Reliable Transaction Router Application Programmer's Reference Manual
o Reliable Transaction Router System Manager's Manual
o Reliable Transaction Router Migration Guide
o Reliable Transaction Router Application Design Guide

Conventions

In this manual, every use of Alpha VMS means the OpenVMS Alpha operating system, every use of VAX VMS means the OpenVMS VAX operating system, and every use of VMS means both the OpenVMS Alpha operating system and the OpenVMS VAX operating system.

The following conventions are used to identify information specific to OpenVMS Alpha or to OpenVMS VAX:

The following conventions are also used in this manual:

Convention        Meaning
--------------    -------
boldface text     Boldface text represents the introduction of a new term or the name of an argument, an attribute, or a reason. Boldface text is also used to show user input in online versions of the manual.
italic text       Italic text emphasizes important information, indicates variables, and indicates complete titles of manuals. Italic text also represents information that can vary in system messages (for example, Internal error number), command lines (for example, /PRODUCER=name), and command parameters in text.
UPPERCASE TEXT    Uppercase text indicates a command, the name of a routine, the name of a file, or the abbreviation for a system privilege.
user input        This bold typeface is used in interactive examples to indicate typed user input.
system output     This typeface is used in interactive and code examples to indicate system output.

1 General Information Applicable to all Platforms

This chapter describes the new features, corrections, workarounds, and restrictions in Reliable Transaction Router, Version 3.2. Refer to later chapters for platform-specific information.

________________________ Note ________________________

All items in these Release Notes are prefixed with a Problem Number, in order to improve problem tracking and reporting.

______________________________________________________

1.1 New Features

o 14-5-6 Create system management interface for RTR using BMC Patrol

An RTR Knowledge Module has been developed for those environments that use the BMC PATROL Application Manager suite of software products. The RTR Knowledge Module can be installed on systems with RTR and Patrol Agent installed, and can be used to monitor RTR and perform system management operations. For more information on the Knowledge Module functionality, refer to the Insight Innovations web site or contact them directly. Insight Innovations can be reached as follows:

    Insight Innovations
    Level 13
    121 Walker Street
    North Sydney, NSW 2060
    Australia
    Telephone: +61 2 9460 3022
    FAX: +61 2 9923 1551
    Internet: http://www.inin.com.au/

o 14-5-43 Improved diagnostics

The heading in the output from most RTR SHOW commands now includes the nodename, group and date.
Diagnostic counters have been added to the RTR IPC layers. These can be viewed with the monitor pictures ipc.mon and ipcrate.mon. The former displays raw counts, while the latter displays rate information.

o 14-5-45 Named partition support

Starting with RTR V3.2, backend partitions have a name attribute. Changes to the rtr_open_channel() API allow the caller to specify a partition name and to either supply a name for a partition to be created, or specify the name of an existing partition to which the application wishes to attach a server channel. RTR provides default names for existing applications. Partition names are the subjects of the partition management commands, which are also new in RTR V3.2.

o 14-5-46 Independent transaction flags added

RTR normally assumes that each transaction processed by a given server depends on the transactions that particular server has previously accepted. To keep the shadowed database identical to that on the primary, RTR controls the order in which the secondary executes transactions. The secondary is constrained to execute transactions in the same order as the primary. Under some circumstances, this can lead to the secondary sitting idle, waiting to be given a transaction to execute.

This release introduces a performance enhancement which may help some applications reduce idle time on the secondary, decreasing the corresponding backlog. If the application knows that particular transactions are independent of all transactions previously received, then the application can set one of two new flags:

    o RTR_F_ACC_INDEPENDENT
      Set on an rtr_accept_tx call to indicate this transaction is independent.

    o RTR_F_REP_INDEPENDENT
      Set on an rtr_reply_to_client call along with RTR_F_REP_ACCEPT to indicate this transaction is independent.

A transaction accepted with one of these flags can be started on the secondary while other transactions are still running.
All transactions flagged with one of these flags must truly be independent of transactions that have previously executed. They will execute in an arbitrary sequence on the secondary. They may not contend with each other, nor with previous transactions, for record locks. They may not use data that has been updated by previous transactions, or by each other.

o 14-5-47 SET TRANSACTION command

A new utility, RTR SET TRANSACTION, is introduced in the RTR V3.2 release. This utility allows an RTR system administrator to change a transaction's journal state at runtime. It is useful for resolving some abnormal situations. Refer to the RTR System Manager's Manual for details.

o 14-5-48 DUMP JOURNAL enhancements

The DUMP JOURNAL command provides an easy way to observe the contents of a node's journal file at any point in time. Various qualifiers allow for the selection of transactions that meet specific criteria, and a full or brief accounting of both transaction state and message data can be displayed to the screen or output to a file.

o 14-5-49 SET PARTITION/FAILOVER_POLICY command

RTR V3.2 contains a new command, SET PARTITION/FAILOVER_POLICY, that allows the system operator to determine RTR behavior in selecting a site to make active in the event of the loss of the current primary site. Options allow the operator to choose between failover to a standby, or to a remote shadow site.

The command may be invoked from the command line, or programmatically through the rtr_set_info() API. Additional information can be found in the appropriate sections of the Programmer's Reference and System Manager's Manuals.

o 14-5-51 SET PARTITION/SUSPEND/RESUME

RTR V3.2 contains new commands that allow the system operator to control the presentation of transactions to server applications. SET PARTITION/SUSPEND stops the presentation of transactions on a given backend partition. SET PARTITION/RESUME resumes transaction presentation to servers.
The commands may be invoked from the command line, or programmatically through the rtr_set_info() API. Additional information can be found in the appropriate sections of the Programmer's Reference and System Manager's Manuals.

o 14-5-52 SET PARTITION/[NO]SHADOW

RTR V3.2 contains new commands that allow the system operator to control the state of shadowing for a partition. SET PARTITION/SHADOW enables shadowing for a partition. SET PARTITION/NOSHADOW disables shadowing for a partition.

The commands may be invoked from the command line, or programmatically through the rtr_set_info() API. Additional information can be found in the appropriate sections of the Programmer's Reference and System Manager's Manuals.

o 14-5-54 rtr_set_info()

This release of RTR contains a limited implementation of the rtr_set_info() verb. See the RTR Application Programmers Guide for further information.

RTR V3.2 allows programmed SET PARTITION and SET TRANSACTION commands through the rtr_set_info() call. Details on the usage of this call can be found in the RTR Application Programmer's Reference Manual.

o 14-5-58 Create/delete partition

V3.2 of RTR contains commands for the creation and deletion of named key range partitions. See the V3.2 System Manager's Manual for further information.

o 14-5-62 Create flag RTR_F_CLO_IMMEDIATE to close channel w/o acknowledging in-progress transaction

A new flag, RTR_F_CLO_IMMEDIATE, has been added to the rtr_close_channel() call. Setting this flag causes RTR to recover an accepted transaction (if any) on this channel to an alternate server channel. Without this flag set, the rtr_close_channel() call implicitly acknowledges the successful completion of any post rtr_accept_tx() processing, such as any database commit processing, and the transaction is not recovered.

o 14-5-70 Add C++ example to kit

A sample C++ application for RTR may now be found in the "examples" directory.
o 14-5-79 rtr_get_tid API enhanced to support XA and DECdtm Transaction Management

The rtr_get_tid API call has been enhanced to return transaction identifiers associated with XA and DECdtm managed transactions. No change is required for applications which currently use rtr_get_tid to return native RTR transaction identifiers.

The changed function prototype is as follows:

    rtr_status_t rtr_get_tid (
        rtr_channel_t channel,
        rtr_tid_flag_t flags,
        void *ptid
    )

The flags and ptid arguments now accept the following options:

    flag argument     ptid data type    Returns
    -------------     --------------    ---------------------
    RTR_NO_FLAGS      rtr_tid_t         RTR transaction id
    RTR_F_TID_RTR     rtr_tid_t         RTR transaction id
    RTR_F_TID_XA      rtr_xid_t         XA transaction id
    RTR_F_TID_DDTM    rtr_ddtmid_t      DECdtm transaction id

o 14-5-80 Specify ordered lists of backends

RTR V3.2 contains a command, SET PARTITION/PRIORITY_LIST, that allows the system operator to specify to RTR an ordered list of the backends in the system. RTR will take this ordering into account when determining which of the eligible backend partitions should become the active member.

The command may be invoked from the command line, or programmatically through the rtr_set_info() API. Additional information can be found in the appropriate sections of the Programmer's Reference and System Manager's Manuals.

o 14-5-84 Changes to monitor files

The following monitor files have been changed to display prepare-related information:

    - acp2app.mon
    - app2acp.mon
    - calls.mon

Do not screen-scrape RTR monitor pictures to get information about RTR. The API call rtr_request_info() should be used instead.

o 14-5-89 Recovery retry count

A maximum retry count can now be assigned to active partitions, helping to maintain server availability in the presence of a transaction that repeatedly causes termination of one or more servers during recovery operations.
When the retry count is exceeded, the errant transaction is written in the journal as an exception record so that processing may continue. This feature is documented in the RTR System Manager's Manual for the SET PARTITION command.

o 14-5-97 SHOW TRANSACTION enhancement

In RTR V3.2, the SHOW TRANSACTION command is enhanced with new qualifiers that allow the system operator to filter the display of transactions further than was previously possible. The new qualifiers are:

    /state
    /user
    /partition_name
    /since
    /before

Additional information on these qualifiers can be found in the appropriate sections of the RTR Application Programmer's Reference and RTR System Manager's Manuals.

o 14-5-99 The crm_tx_kr_jnl_state_t enum is exposed to the application programmer with rtr_request_info() calls.

    - In rtr.h the enum rtr_transaction_state_t has been replaced by rtr_tx_jnl_state_t.
    - In rtr.h the enum identifier verb_set has been replaced by rtr_verb_set.

Details can be found in the System Manager's Manual.

o 14-5-104 DUMP JOURNAL command enhancement

The DUMP JOURNAL command has been enhanced with the following features:

    - The command DUMP JOURNAL also shows how many prepare records have been processed.
    - The command DUMP JOURNAL /FULL also shows the prepare records if there are any on this node.
    - The command DUMP JOURNAL /RECORD_CLASS=PREPARE explicitly selects the prepare records in the journal for display.

o 14-5-109 rtr_rqif is an unsupported utility distributed with the RTR kit

rtr_rqif is an unsupported utility distributed with the RTR kit. It is used as an interface to the rtr_request_info API by some third-party vendors providing RTR add-ons.

o 14-8-155 New environment variables for adjusting connection timeout parameters

Two new environment variables have been created to give operators greater discretion in determining how long to wait before retrying a network connection attempt.
The RTR_TIMEOUT_CONNECT variable controls how long a connecting node will wait for a response from the connectee to its link initiation request. This value defaults to 60 seconds.

If the RTR_TIMEOUT_CONNECT period expires without a response from the connectee, RTR will wait an additional period determined by the RTR_TIMEOUT_CONNECT_RELAX variable. This variable defaults to a value of 90 seconds. The purpose of the "relax" period is to allow the connector to accept a connection request from the connectee node, if any are forthcoming. It is important not to set this value too low on Backends and Routers, as such machines are likely to be receiving connection requests from many other machines. On machines configured to use only the Frontend role, however, you can safely set RTR_TIMEOUT_CONNECT_RELAX to just a few seconds so that the node can be free to attempt to connect to another router as quickly as possible.

The minimum value for RTR_TIMEOUT_CONNECT is 5 seconds and the minimum for RTR_TIMEOUT_CONNECT_RELAX is 1 second.

1.2 Corrections

This section addresses known problems corrected since V3.1D.

o 14-1-39 Declaring exit handlers in RTR applications

If an exit handler contains calls to RTR, then the exit handler must be declared after the first call to RTR. Using the RTR V2 or V3 API, if the exit handler is declared before the first call to RTR, then any call to RTR made within the exit handler will return an error. Under the V3 API, the error status returned is RTR_STS_INVCHANNEL. Under the V2 API, the error status returned is RTR$_INVALCH.

o 14-1-99 Read-only strings and compiler optimization

RTR performance has improved generally on some platforms as a result of new compiler versions and compiler option tuning. Memory required for each RTR and application process has been reduced by locating constant data and strings in shared read-only memory.
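The defaults and minimums given for RTR_TIMEOUT_CONNECT and RTR_TIMEOUT_CONNECT_RELAX in item 14-8-155 can be illustrated with a small C sketch. It is not RTR code. The helper names are hypothetical, and two behaviors are assumptions that these notes do not state: that a value below the minimum is raised to the minimum, and that the time before a node is free to try another connection is the sum of the two periods.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Defaults and minimums quoted in item 14-8-155, in seconds. */
#define CONNECT_DEFAULT        60L
#define CONNECT_MINIMUM         5L
#define CONNECT_RELAX_DEFAULT  90L
#define CONNECT_RELAX_MINIMUM   1L

/*
 * Hypothetical helper, not part of RTR: convert the text of an
 * environment variable into a timeout in seconds.
 *   - unset, empty, or non-numeric text gives the default
 *   - a value below the minimum gives the minimum
 *     (ASSUMPTION: RTR's real handling of such values is not documented here)
 */
static long timeout_from_text(const char *text, long default_s, long minimum_s)
{
    char *end = NULL;
    long value;

    assert(minimum_s > 0 && default_s >= minimum_s);
    if (text == NULL || *text == '\0')
        return default_s;

    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return default_s;

    return value < minimum_s ? minimum_s : value;
}

/*
 * Seconds spent on one unanswered connection attempt: the connect wait
 * followed by the relax period (ASSUMPTION: the two periods simply add).
 */
static long retry_interval_s(const char *connect_text, const char *relax_text)
{
    return timeout_from_text(connect_text, CONNECT_DEFAULT, CONNECT_MINIMUM)
         + timeout_from_text(relax_text, CONNECT_RELAX_DEFAULT, CONNECT_RELAX_MINIMUM);
}
```

A caller would pass getenv("RTR_TIMEOUT_CONNECT") and getenv("RTR_TIMEOUT_CONNECT_RELAX"). With neither variable set the sketch yields 150 seconds. A frontend-only node with the values 10 and 3 would wait 13 seconds.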
o 14-1-183 FE not detecting a non-responsive router

Diagnostic information to monitor the operation of the inactivity timers on a link has been added as additional link counters. The values of these counters can be displayed with the command:

    RTR> sho link/counter=*_lw_*

o 14-1-267 Node /isolate and link /suspect implementation faulty

The SET NODE qualifier /ISOLATE and SET LINK qualifier /SUSPECT have been superseded by /AUTOISOLATE.

Any RTR node may disconnect a remote node if it finds that node to be unresponsive or congested. The normal behavior following such action is automatic network link reconnection and recovery. When node autoisolation is enabled on a node, it allows the node to disconnect a congested remote node in such a way that when the congested node attempts to reconnect, it receives an instruction to close all its network links and cease connection attempts. When it is in this state, the node is termed isolated.

Remote node autoisolation may be enabled at the node level, where it applies to all links, or for specific links only with the 'set link/autoisolate' command.

An isolated node will remain in that state until the system manager performs the following actions:

    - enable the link to the isolated node at all nodes that have isolated it

          set link /enable

    - exit the isolated state at the isolated node

          set node/noisolate

o 14-1-276 Quorum lab exercise in SYS MGR training gives unpredictable results

In previous versions of RTR, MONITOR QUORUM would sometimes display inconsistent or confusing output when a quorum problem existed. This has now been fixed. MONITOR QUORUM now displays the state (quorate, bad_cfg, uncertain, or not_cncted) as well as the name of the node causing the problem. The erroneous display of "CFG" as the quorum state when a role does not exist on a node has also been corrected.
In addition, a new RTR command, SHOW QUORUM, has been implemented which lists detailed information about the expected quorum view a node should have, and any discrepancies between the actual and expected state.

o 14-1-347 Transactions played out-of-order after server death

After the death of a concurrent server, it could happen that some transactions that were in send state were rescheduled before other transactions on the same partition that were in voted or committed states. This has now been fixed, so that any voted or committed transactions are always rescheduled before any transactions in send state.

o 14-1-375 Server stuck in lcl_rec_fail, quorum and tid problems if DECnet Phase IV only nodes in facility

Network connections between nodes where one or both of the partners is employing PATHWORKS DECnet as a transport on a Microsoft Windows operating system may occasionally fail to connect with the reason "invalid msglen argument". RTR will automatically retry the connection. No user intervention is required.

o 14-1-416 RTR-V2 CLI interface returns VOTE_TX completion status on DEQ_TX call

In prior versions of RTR V3, an RTR V2 call made from the RTR command line could incorrectly report the completion status for some other prior call issued with the /NOWAIT qualifier. This has been corrected.

o 14-1-462 MODIFY JOURNAL does nothing if a new disk is specified

MODIFY JOURNAL no longer reports JOURNALMOD for disks with no journal file. Previously, it reported JOURNALMOD for each specified disk device, even for disks with no journal file, provided that no change actually failed for any other reason. This has been corrected. You will now see NOCHANGES or DSKNOTSET.

o 14-1-501 DUMP JOURNAL not counting pri_forget records; output misaligned

Some field test releases of RTR V3.2 would incorrectly display the number of primary-forget records in the journal as zero. This has been corrected.
A minor formatting error in the statistics section of DUMP JOURNAL output has also been corrected.

o 14-1-542 V2 router rejecting a V3 FE due to facility name case difference

When V3 frontends try to connect to V2 routers, the frontends should send the facility name in uppercase. This is required because V2 routers store the facility name in uppercase, so a lowercase facility name received from a frontend causes a facility name mismatch when the names are compared.

o 14-1-640 Too many network addresses can confuse RTR

RTR processes could behave unpredictably following a reference to a system where the storage required to hold all configured network address information for the system exceeds the space provided by RTR. This has been corrected. A log file entry is written to warn the operator that this situation has been encountered. If it is encountered, check for and remove any unnecessary protocol and adapter combinations for the system concerned.

o 14-3-82 Requester hangs in SYS$COMMIT_TXW with RTR V3.1D - null txn

If rtr_start_tx() was called by a client followed immediately by rtr_accept_tx(), then the application would hang (unless rtr_start_tx() was called with a timeout). This has been corrected. The status returned in the rtr_mt_accepted data in such cases is RTR_STS_SYNCHCOMM (transaction committed synchronously).

This also corrects the equivalent problem with the RTR V2 API. Also, the status returned in the TXSB for such transactions using the V2 API is RTR$_SYNCHCOMM.

o 14-3-93 MONITOR ACTIVE displayed wrong counts for client starts

The number of started transactions (used for display purposes only) was calculated incorrectly. In particular, transactions explicitly started without a transaction timeout (i.e. using rtr_start_tx() with a zero value for timoutms) were being counted twice. This caused monitor displays, e.g. MONITOR ACTIVE, to display incorrect results. This problem has been corrected.
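The case mismatch described in item 14-1-542 can be illustrated with a short C sketch. It only demonstrates why a lowercase name fails a case-sensitive comparison against an uppercase stored name. The helper name is hypothetical and not part of the RTR API, ASCII facility names are assumed, and these notes do not say whether RTR or the application is expected to perform the conversion.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the RTR API: copy a facility name
 * into `out`, converting it to uppercase (ASSUMPTION: ASCII names).
 * Returns 0 on success, or -1 if the name plus its terminator does not
 * fit in `out_size` bytes.
 */
static int facility_name_to_upper(const char *name, char *out, size_t out_size)
{
    size_t len, i;

    assert(name != NULL && out != NULL);
    len = strlen(name);
    if (len + 1 > out_size)
        return -1;

    /* toupper() needs an unsigned char value to be well defined. */
    for (i = 0; i < len; i++)
        out[i] = (char)toupper((unsigned char)name[i]);
    out[len] = '\0';
    return 0;
}
```

For example, strcmp("miked", "MIKED") is nonzero, which is the kind of mismatch the item describes. After conversion with the helper, the two names compare equal.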
o 14-3-97 ENQFLAGS, V2 server $enq flag should ignore READONLY

Any flags supplied with a $ENQ_TX call on a server channel are now ignored rather than causing an error to be returned. This maintains compatibility with prior (V2) versions.

o 14-3-101 RTR$COMMIT_TX(W) returns V3 status

In previous RTR V3.X releases, the txsb status returned after a successful call to sys$commit_tx() was incorrectly being set to RTR$_COMMIT rather than SS$_NORMAL (as in version 2). This has been corrected.

o 14-3-102 MODIFY JOURNAL not supported

The earlier restriction that a MODIFY JOURNAL command could not be issued after RTR was started has now been lifted. However, it is now required that RTR be started for a MODIFY JOURNAL command to be executed.

o 14-3-115 Set fac /BROAD=MIN=n not supported

The command RTR SET FACILITY /BROADCAST=MINIMUM_RATE=n was not supported in Version 3. This problem has been corrected.

o 14-3-118 Transactions on pri_act not being played on sec_act

If RTR is configured with servers on backends that are running DECnet Phase V, then under certain conditions, local recovery from the remote node's journal would not be performed. For example, local and shadow recovery would appear to work correctly in a shadow server configuration after the primary shadow went down, but in actual fact any transactions in the remote node's journal would not be recovered. This can only occur if the backends are all using DECnet Phase V as the primary RTR transport, and if the DECnet addresses of the nodes concerned match a particular pattern. Note that this is a static DECnet configuration issue. If recovery works in your particular configuration, then it will always work so long as the DECnet network configuration is not changed.

This has now been fixed. As a workaround for previous releases of RTR, use TCP/IP transport only, or ensure that at least one of the backends in each shadow pair uses TCP/IP as the primary protocol.
o 14-3-119 No broadcasts received if evtmsk not specified

In previous versions of RTR 3.X, sys$dcl_tx_prc() failed to correctly check that an evtast had been supplied if evtmsk or evtnam were specified. This has now been corrected, and the original V2 behavior of returning RTR$_INVEVTAST in this situation has been restored. In addition, the V2 behavior where a null evtmsk would default to rtr$m_broadcast when an evtast is specified has also been restored.

o 14-3-125 Standby server stuck in lcl_rec_fail

In previous versions of RTR V3, a standby server which was attempting to take over after failure of the node which contained the previously-active server could become permanently stuck in state lcl_rec_fail. This would happen if two conditions were present: the node which had failed had not been in the same cluster as the node containing the standby, and the failed node had also been quorum-master router for the standby backend. This problem has now been fixed.

o 14-3-134 Certain v2:bm counters are not present as v3:brm counters

Various counters connected with the delivery of broadcast events have been added: facility counters fdb_cn_bm_transit_brd_lost and fdb_cn_bm_transit_brd_delivered, link counters ndb_cn_bm_transit_lost and ndb_cn_bm_transit_delivered, and process counters bm_brd_lost and bm_brd_delivered.

o 14-3-150 RTR applications hang on trying to continue after ACP restarted

If the application tried to open a channel again after seeing the status RTR_STS_ACPNOTVIA, it could hang on the subsequent rtr_receive_message call. This problem has been corrected for threaded UNIX platforms. It is no longer necessary to restart any RTR application for UNIX after restarting RTR.

o 14-3-157 V2 Response Matching Feature Enabled

The V2 reply consistency check for replayed messages is enabled. RTR can enable, disable and display this feature.
Usage:

    To turn on:    rtr> set facility FACILITY_NAME/REPLY_CHECKSUM
    To turn off:   rtr> set facility FACILITY_NAME/NOREPLY_CHECKSUM
    To view flag:  rtr> show facility FACILITY_NAME/CONFIG
                        show facility FACILITY_NAME/FULL

Note the Reply Checksum: label in the following example. A yes value indicates that the "response matching feature" is enabled.

    RTR> set facility miked/REPLY_CHECKSUM
    RTR> show facility miked/config

    Facilities:

    Facility:            miked
    Configuration:-
    Frontend:            yes    Router:              yes
    Backend:             yes    Reply Checksum:      yes
    Router call-out:     no     Backend call-out:    no
    Load balance:        no     Quorum-check off:    no

o 14-3-161 MONITOR CALLS/ID=n where n is not a valid id - monitors all ids

Use of the monitor command with any of the qualifiers /link, /process, /facility or /partition would generate an empty display if the requested entity did not exist. This was unlike V2 behavior, and was considered by some to be misleading. V2 behavior has been restored.

o 14-3-164 RTR_F_REP_INDEPENDENT flag needs to be specified even when superfluous

If the server channel has been opened with RTR_F_OPE_EXPLICIT_ACCEPT, then the RTR_F_REP_INDEPENDENT flag can only be used together with RTR_F_REP_ACCEPT. If the server channel has been opened with implicit accept, then the use of RTR_F_REP_INDEPENDENT implies the use of RTR_F_REP_ACCEPT.

o 14-3-170 Terminal output is no longer unbuffered

On some platforms the output stream was unbuffered by default when bound to a terminal device (the normal case), and incurred a large number of buffered I/O operations. This was noticeably inefficient when using the RTR command interface over certain kinds of packet-based network link. This problem has been resolved.

o 14-3-203 ACP crash when other nodes shutdown

Configurations where more than 100 frontends were connected to any particular router could experience an ACP failure while managing quorum loss. This has been corrected.
Automatic router failback has been restored for RTR V2 frontends connecting to RTR V3 routers.

o 14-3-205 Inconsistent TR TX timeout if no link to FE
Using previous versions of RTR, if a router lost a connection to a frontend that had a transaction active in enqueuing state, then the router would abort the transaction after a period of about one minute if the frontend link was not reestablished. This happened even if the client had specified a transaction timeout much less than this when starting the transaction. This is now fixed: if the router loses its connection to the frontend, a transaction in enqueuing state on the router is aborted after the interval specified by the client (if it is less than one minute).

o 14-3-210 START RTR qualifiers from V2
Attempts to use obsolete V2 qualifiers to the START RTR command cause a warning to be issued. Qualifiers affected are partitions, cache_pages, and relations. Warnings are also generated if an OpenVMS qualifier is used on a non-OpenVMS platform.

o 14-3-213 Facility names can be up to 30 characters
Although the FACNAMLON message states that the facility name can only have 30 characters, prior versions of RTR V3 would allow the system manager to create facilities with names as long as 31 characters. It was, however, not possible to open channels to such facilities. The documented maximum length of a facility name string is 30 characters. This limit is also now enforced by the 'create facility' command. The application interface symbol RTR_MAX_FACNAM_LEN found in the RTR header file still indicates a maximum length of 31 characters, not including the string terminator, at compile time, even though the system management interface in this release does not create such a facility. Applications should normally use rtr_open_channel() to check whether a facility name has valid length and characters, and has been created.
This ensures that applications do not need to be recompiled should 31-character facility names be supported at runtime in a future release.

o 14-3-215 Rows dropped in monitor display with large # of rows
RTR can now display over 1000 rows, provided the values for the /ROWS qualifiers in the relevant *.mon file are edited. This is most easily verified by redirecting the output to a file or pipe. If the output goes to a terminal, then you can use the SCROLL commands, which are bound to various numeric keypad keys, to scroll all except the last line of monitor output.

o 14-3-219 Unthreaded applications received repeated wakeups before next RTR API call
RTR now suppresses additional signal-based wakeups after the first until the next RTR API call. The repeated wakeups caused a problem for an application that issued a write() to a pipe in the wakeup handler without first checking a flag to see if a read() had occurred since the last wakeup. The pipe could fill while the application was too busy to select on it and read it until empty, at which point the next write() would block and hang the application in the signal-based wakeup handler. This has been corrected.

o 14-3-224 ACP crash in 1 node config
RTR was aborting if it detected a length mismatch in the message passed to it. RTR has now been changed so that if this condition is detected, diagnostic information is written to the operator log and the link is disconnected. RTR will not abort.

o 14-3-226 ACP crash, ncf_validate_fdbptr
After a facility is deleted, the RTR ACP can receive a message from an application that references the deleted facility. Verification that the facility had been deleted failed (rarely), causing the RTR ACP to abort. This has been corrected.

o 14-3-229 tx replay after reboot
If there is an error deleting records from the RTR journal, an error is logged. Previously, RTR would continue without logging the error.
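The pipe hazard described under 14-3-219 can also be avoided in application code. The following is a minimal sketch in plain POSIX C, not RTR code: the names wakeup_init, wakeup_handler, wakeup_pending, and drain_wakeups are hypothetical, and registration of the handler with rtr_set_wakeup() is deliberately left out. The handler writes to a non-blocking pipe only when no wakeup is already pending, so the pipe cannot fill and the handler cannot block.

```c
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

/* Hypothetical names; nothing in this sketch is part of the RTR API. */
static int wakeup_pipe[2] = { -1, -1 };
static volatile sig_atomic_t wakeup_pending = 0;

/* Create the pipe and make both ends non-blocking. Returns 0 on success. */
int wakeup_init(void)
{
    if (pipe(wakeup_pipe) != 0)
        return -1;
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(wakeup_pipe[i], F_GETFL);
        if (flags == -1 ||
            fcntl(wakeup_pipe[i], F_SETFL, flags | O_NONBLOCK) == -1)
            return -1;
    }
    return 0;
}

/* Wakeup handler: uses only write() and a sig_atomic_t flag, both of
 * which are safe in signal context. At most one byte is ever queued
 * between two drains, however many wakeups arrive. */
void wakeup_handler(void)
{
    int saved_errno = errno;
    if (!wakeup_pending) {
        char byte = 1;
        wakeup_pending = 1;
        ssize_t rc = write(wakeup_pipe[1], &byte, 1);
        (void)rc;               /* a failed write is ignored in this sketch */
    }
    errno = saved_errno;
}

/* Mainline: call when select() or poll() reports the read end readable.
 * Clears the flag first so that a wakeup arriving during the drain is
 * not lost. Returns the number of bytes drained. */
int drain_wakeups(void)
{
    char buf[64];
    int total = 0;
    ssize_t n;

    wakeup_pending = 0;
    while ((n = read(wakeup_pipe[0], buf, sizeof buf)) > 0)
        total += (int)n;
    return total;
}
```

After draining, the mainline would normally poll RTR for messages with a zero timeout and be prepared to find none.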
o 14-3-239 Virtual address space full
RTR tries to extend the virtual address space of the ACP if there is insufficient space to allocate data structures when a client or server application is started. If the ACP failed to do this, it would crash. This has now been corrected. Any such failure will simply prevent the new application from starting rather than crashing the ACP.

o 14-3-241 Application crash trying to send large messages to unresponsive ACP
An application that is unable to send to the ACP due to resource shortage, for example if the ACP is alive but no longer receiving for whatever reason, now keeps trying indefinitely, and appears to hang rather than crash.

o 14-3-250 Flow control has negative credit
Applications with multiple channels engaged on more than one facility could experience flow control difficulties causing indefinite delays in transaction completion. This has been corrected.

o 14-3-258 Stop inquorate standby from going active
When there is a network segmentation in an active/standby configuration, the segment in the minority would become active. This behavior resulted in two active servers for the same partition. RTR now puts the inquorate or minority server in wt_quorum state and the majority server in active state.

o 14-3-261 Assertion in knl_net_compare_ids() from ncf_accept_ast2()
Corruption of network messages passed during the link connect phase could cause failure of the receiving RTR ACP process. This has been corrected.

o 14-3-265 Successive ACP crashes
Reception of a corrupt network message could cause a failed assertion and demise of the RTR ACP process. The behavior has been changed to yield a log file entry (BADNETMSG), followed by a reset of the link concerned. If such log file entries persist for a particular pair of nodes, it may mean that a network problem exists, and you should consider checking the network hardware for correct operation.
The RTR KNL subsystem log entry has also been improved to better identify the link on which it reports errors.

o 14-3-266 ACP crash apparently caused by word shear of a packet.
Reception of an illegal or unrecognizable broadcast now causes a log file entry (BMHDRVSN) rather than demise of the ACP process. If such entries persist, you may wish to consider checking the network for correct operation.

o 14-3-282 Dual-ported TCP router not establishing facility links
Problems can arise if nodes in your configuration have multiple network adapters and the IP name server is not configured to return all the configured IP addresses for such nodes. This causes such nodes to reply to connection requests with an ID that is different from that determined by the initiator of the connection. This can cause refused connections, or only the first connecting facility gaining a current router. This version of RTR has been changed to operate correctly in this partially configured environment. It is also now possible to provide RTR with full configuration information about hosts with multiple adapters through alias entries in the hosts database. Provide alias entries corresponding to the alternate interface addresses, and refer to these aliases when defining the primary entry. If using a hosts file on UNIX, the alias entries should be defined prior to any references to them.

o 14-3-289 TR failback imperfect
The implementation of frontend router failback has been improved. Frontend nodes are now more likely to maintain connections with their preferred routers. On systems with multiple similarly configured facilities, this will reduce the number of network links required, and consequently resource consumption will be lowered.

o 14-5-69 Journal record version control
V3.2 of RTR implements some changes in the format of records in the journal.
Rolling upgrades from earlier versions of RTR V3 are handled automatically, except for clustered nodes operating standby servers - these configurations must be upgraded simultaneously. Records written by RTR V3.2 cannot be used by earlier versions of RTR, so when installing an older version of RTR over RTR V3.2 you must create a new journal.

o 14-5-82 Update SHOW NODE to display inactivity timer
The format of the SHOW NODE command has been changed to include a display of the current inactivity timer setting for the node.

o 14-5-129 Improvements to netstat and connects monitor pictures
The Monitor Connects and Monitor Netstat pictures now include information in their summary sections indicating whether all required links are connected.

o 14-7-751 Error handling for RTR_PREF_PROT violations
Additional checks have been implemented at facility creation time to check for and report on the absence of any specified transports, required or optional name-to-address lookups on specified or available transports. Errors or warnings are issued to the terminal session, and also recorded in the RTR log file.

o 14-8-131 Failure to come up in remember mode
When a node in remember mode fails during recovery, it will return to remember mode. Previously a node in remember mode would undergo local shadow recovery, then shadow recovery failure, when it could not access the journal of the secondary node. RTR now knows that it was in remember mode during its recovery process and if the secondary is not available, it will return to remember mode.

o 14-8-144 RTR crash when ASYNC cable disconnected
Disconnecting a cable that was being used by an asynchronous DECnet link to a remote machine could cause an ACP failure when the transport marked the sockets as invalid. RTR has been changed to handle this error by temporarily suspending all network activity on the affected node. Network activity will resume as soon as the network is found to be usable again.
o 14-8-162 RTRACP (V3.1C) backend crash
Transaction recovery as a result of server failover could result in server applications getting hung in 'local recovery' state if it also happened that more than 10 client channels had simultaneously caused new transactions to be presented to the backend node. This has been fixed both by increasing the limit to 50 and by adding a check to make sure that recovery is complete before enforcing the limit, which is designed to keep a backend node from getting overwhelmed when transactions are coming in at a rate faster than it can handle.

o 14-8-181 RTR ACP Crashes
RTR on a frontend could select a router as its current router immediately after that router had been trimmed from the facility. This could potentially leave the frontend in a 'connecting' state. This has been corrected.

o 14-8-199 Large monitor screen last line overwritten
In previous versions, RTR generated lines which were off the bottom of the physical screen. Most screens moved as far as they could to the last line when told to move off the bottom, so that you saw the last line and all subsequent lines superimposed on the bottom line. Usually the last line in the monitor file covered the others completely because none of the other lines were longer. This problem has been corrected.

o 14-8-204 Inquorate router could cause backend ACP failure
In previous versions of RTR, an inquorate router could send messages to backends informing them to change state at the same time that one or more quorate routers were asking them to change to or remain in a different state. In certain situations when network links are unreliable, this could sometimes cause a buildup of messages in the RTR ACP process on the backend node that caused it to fail due to lack of memory. This problem has been corrected by not allowing an inquorate router to send state update messages to backends.
o 14-8-207 Concurrent timer cancellation could cause data corruption
Internal RTR data could get corrupted when timers for the same events were scheduled concurrently (within the same 1 second time slice). This should have occurred only under unusual circumstances or when RTR is consistently denied access to resources due to privilege constraints. RTR has been corrected to avoid this occurrence even under extreme circumstances.

o 14-8-214 Potential distributed loop
It was possible for a distributed loop to occur between a backend and two or more routers. The problem occurred when one router suggested the backend go into active mode, but another suggested standby. When the backend accepted the standby suggestion, it entered standby mode and broadcast its decision to the routers. This occurred regardless of whether the backend was already in standby mode (and therefore had already broadcast its status to the routers) or not. The routers would then respond to the backend with their suggestions. RTR now does not broadcast when a standby-to-standby transition occurs, as the routers will have already been informed of the backend's status.

1.3 Known Problems with Workarounds

o 14-1-293 Incorrect values for /BLOCKS and /MAXIMUM_BLOCKS
RTR does not reject incorrect values for /BLOCKS and /MAXIMUM_BLOCKS of the CREATE JOURNAL or MODIFY JOURNAL commands. The workaround is to use values less than 524280 for these qualifiers.

o 14-1-419 SPUJOUFIL advice to CREATE JOURNAL/SUPERSEDE is dangerous
If the operator copies journal files or copies disks containing journal files without first remounting the source disk read-only, then these are SPURIOUS because RTR sees duplicates that it did not create.
RTR then displays the SPUJOUFIL message, which advises the operator to use CREATE JOURNAL/SUPERSEDE to destroy the original and all copies of the journal files, and all the transactions contained in them on that node, and then submit an SPR for something that is not in fact an RTR problem. This is not the correct action in situations like this. The operator should examine the log file, which shows the duplicate filenames, and then move any unwanted duplicate copies of journal files to anywhere *other* than a rtrjnl directory at the top level of a writable disk file system visible to RTR, and then try again. Only if SPUJOUFIL is caused by circumstances other than operator intervention should the operator consider making backup copies of the journal files, and only then abandoning the existing journal files and any transactions contained in them by using DELETE JOURNAL and CREATE JOURNAL, or the equivalent CREATE JOURNAL/SUPERSEDE.

o 14-1-455 Last line of batch procedure sometimes ignored.
The last line of a batch procedure or command file must explicitly end with a line terminator, added by pressing the Enter/Return key when creating the procedure. Without the explicit line terminator, RTR ignores the line. The workaround is to add a comment to the end of the file or to explicitly add a line terminator to the end of the last line of the batch procedure.

o 14-1-462 MODIFY JOURNAL with list of devices does not give individual error messages
MODIFY JOURNAL now lists only the devices that were *successfully* modified. If some disk devices cannot be modified because they do not contain a journal file at all, then nothing at all is reported for those devices. Workaround: identify the omitted devices by comparing the command parameters and the messages, or modify them one device at a time. Verify the modification with SHOW JOURNAL /FILES /FULL.
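The condition behind 14-1-455 can be checked before a command file is handed to RTR. The following is a minimal sketch in C, assuming that the missing terminator is a newline character; the function name last_line_is_terminated is hypothetical and is not part of RTR.

```c
#include <stdio.h>

/* Hypothetical helper, not part of RTR.
 * Returns 1 if the last byte of the file is '\n',
 *         0 if it is not (or the file is empty),
 *        -1 if the file cannot be opened or read. */
int last_line_is_terminated(const char *path)
{
    FILE *fp = fopen(path, "rb");
    int result = 0;

    if (fp == NULL)
        return -1;

    /* Position on the last byte; this fails for an empty file,
     * in which case result stays 0. */
    if (fseek(fp, -1L, SEEK_END) == 0) {
        int c = fgetc(fp);
        if (c == EOF)
            result = ferror(fp) ? -1 : 0;
        else
            result = (c == '\n') ? 1 : 0;
    }

    fclose(fp);
    return result;
}
```

A procedure for which this returns 0 needs a trailing comment line or an explicit terminator, as described in the workaround above.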
o 14-1-471 Refused network connect attempts on Windows
Network connection attempts over DECnet that get explicitly refused are not handled on Windows platforms until RTR times them out. This may make failover operations slower than required for some applications. If this is the case, the timeout period can be reduced by specifying revised values using the following environment variables:

  RTR_TIMEOUT_CONNECT        (default 60 s, minimum 5 s)
  RTR_TIMEOUT_CONNECT_RELAX  (default 90 s, minimum 1 s)

Failover processing occurs after the combined value of these timers has elapsed.

o 14-1-641 Servers stuck in "local_recovery_fail" state
In RTR, because of the way a node shares a common journal file for all facilities on that node, if a node is configured as a backend for a given facility but is not started on that node, then servers on other nodes can get stuck in "local_recovery_fail" state. The workaround is to issue a SET PARTITION xxxx/IGNORE_RECOVERY command to get the server going.

o 14-3-207 Client application in questionable flow control when RTR journal fills
There is a known issue with flow control when the journal starts filling up. There is a race condition where, if the client can send more data than can be placed in the journal before flow control takes effect, then the transaction is aborted with the correct error notification. However, if flow control takes effect first, then a deadlock occurs where the journal space never frees up and hence RTR does not allow the client to proceed with the transaction. There are two workarounds: either specify a timeout with the transaction, or increase the size of the journal.

1.4 Restrictions

o 14-1-103 Using rtr_set_wakeup() in a threaded program
After calling rtr_set_wakeup() in a threaded program, you should also call rtr_set_wakeup(NULL) wherever your program can exit.
This will prevent any wakeups in other threads while the main thread is already running the RTR exit handler, which could lead to a server core dump when trying to stop the server.

o 14-1-263 Non-English character sets are not supported for identifiers
The supported character set for RTR identifiers such as facility names is ASCII, with lowercase and uppercase letters equivalent. Eight-bit characters are not supported because the name might not interoperate with RTR processes using a different locale or running another RTR version.

o 14-3-33 Partition per facility now 500
Previous releases supported only up to 100 partitions per facility. The current release raises this limit to 500, but it cannot be extended beyond that.

o 14-3-74 RTR use with multi-homed hosts
When using the current version of RTR on multi-homed hosts (hosts with more than one network interface), care must be taken that the gethostbyname function returns a list of all the possible network addresses for the host. Otherwise RTR may not be able to recognise the host and will reject connections. Using a properly configured DNS will return the address list. Using the /etc/hosts file will not return the list of addresses. If a tunnel separates the FEs from the TRs, then the FEs need to be configured on the TRs with the names corresponding to the pseudoadapter addresses assigned by the tunnel. If these are unpredictable, wildcards can be used on the routers. If the tunnel separates the TR and BE nodes, configure each with respect to the other with the name prefix "tunnel."

o 14-3-253 Restrictions on the RTR wakeup handler
The use of rtr_reply_to_client, rtr_send_to_server, or rtr_broadcast_event in an RTR wakeup handler is not recommended. They may block when they need transaction ids or flow control. This will cause undesired behavior.
Functions permitted in an rtr_set_wakeup() handler:

- In an RTR wakeup handler in an AST in an unthreaded OpenVMS application, the use of rtr_reply_to_client(), rtr_send_to_server(), rtr_broadcast_event(), or rtr_receive_message() with a non-zero timeout is not recommended. They may block when they need transaction ids or flow control, which will cause the whole application to hang until the wakeup completes.

- The same rules apply in an RTR wakeup handler in a threaded application. Note that wakeups are unnecessary in a threaded paradigm, but they may be used in common code in applications that also need to run on OpenVMS. Please note that your mainline code continues to run while your wakeup is executing, so extra synchronization may be required. Also note that if the wakeup does block then it does not generally hang the whole application.

- In an RTR wakeup handler in a signal in an unthreaded UNIX application, no RTR API functions and only the very few async-safe system and library functions may be called, because the wakeup is performed in a signal handler context. An application can write to a pipe or access a volatile sig_atomic_t variable, but using malloc() and printf(), for example, will cause unexpected failures. Alternatively, on most UNIX platforms, you can compile and link the application as a threaded application with the reentrant RTR shared library -lrtr_r.

For maximum portability, the wakeup handler should do the minimum necessary to wake up the mainline event loop. You should assume that mainline code and other threads might continue to run in parallel with the wakeup, especially on machines with more than one CPU.

o 14-5-86 RTR gives no warning when started with logging disabled
Compaq recommends that you use RTR SET LOG to enable logging before starting RTR. Important messages are only written to the log file and are lost forever if logging is not enabled. As with any log file, the RTR log will grow with time.
Log files should be located on a disk or file system where they will not interfere with operation. Should the log file become too big, you can start a new log file, and archive, compress or delete the old one.

o 14-7-24 Transaction size limits
The number of bytes in any application message (that is, a message sent with the rtr_send_to_server(), rtr_reply_to_client() or rtr_broadcast_event() routines) is currently restricted to 64000. The number of messages sent (that is, using rtr_send_to_server()) in any single transaction is limited to 65534. There is no fixed limit on the number of replies (that is, sent with rtr_reply_to_client()) in any single transaction.

o 14-7-112 Restrictions on MODIFY JOURNAL command
The MODIFY JOURNAL command now works (with some care). It is no longer necessary to use DISCONNECT SERVER after MODIFY JOURNAL to commit the change, but use the SHOW JOURNAL /FULL /FILE command to verify that journal sizes have been modified as intended. The MODIFY JOURNAL command incorrectly reports success even if there is no journal on the specified device(s). RTR-S-JOURNALMOD should be taken to mean that any journals that have been found on the specified device(s) have been modified. Note that the Disk name shown by SHOW JOURNAL is not updated when journals are moved physically, or if they are indirectly located using symbolic links, virtual drives or mounts.

o 14-8-149 Scrolling MONITOR and SHOW output
MONITOR operates in page mode, and can be scrolled. See RTR HELP SCROLL and RTR SHOW KEY/ALL for details. There are currently no visual clues that lines are omitted or scrolled. The number of MONITOR rows is in fact limited by the /ROWS=50 qualifier in most of the *.mon files, and a hard-coded upper limit of 100 rows. The last row remains visible and is not scrolled with the rest. Unfortunately there is no workaround, because the 100 row page buffer applies even when the monitoring is redirected to a file with /OUTPUT or a pipe.
These restrictions will be lifted in a forthcoming release. SHOW produces text output which can be displayed in any suitable editor or scrolling window, or piped into a paging command such as 'more'. No scrolling or paging functionality for SHOW is planned within RTR.

1.5 Documentation Changes

There are no corrections to existing documentation in this release.

1.6 Limitations

Please see the Software Product Description.

1.7 Known Problems

o If an RTR ACP process dies by any means other than the RTR STOP RTR command, Compaq strongly recommends that you immediately issue the RTR STOP RTR command to update RTR's shared memory tables. Similarly with the RTR Command Server, type RTR DISCONNECT SERVER whenever a Command Server dies in an unplanned manner. Failure to do so may cause RTR to try to connect to processes that no longer exist; this may have undesirable results.

o 14-1-520 Remote commands fail with ERRACCNOD when DECnet/TCP preference mismatched
Remote commands may not work if there is a mismatch between the RTR_PREF_PROT network protocol preference environment variable on local and remote nodes. Although the name of the remote node can be prefixed with tcp. or dna. to select a protocol with which the local node contacts the remote node, this does not influence the protocol used for the return leg. If the remote node attempts to connect back to the local node using the wrong protocol, then the remote command attempt can fail with ERRACCNOD, without a more detailed entry in the log. [A more normal cause of ERRACCNOD is a lack of authorization: try simple non-RTR remote commands like rsh host date, or TYPE host::"0=procedure".] The default for the environment variable RTR_PREF_PROT is RTR_DNA_FIRST for OpenVMS nodes with DECnet, but RTR_TCP_FIRST for other platforms. Other possible values are RTR_DNA_ONLY and RTR_TCP_ONLY.
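When diagnosing the mismatch described under 14-1-520, it can help to print the effective protocol preference on each node. The following is a minimal sketch in C: the helpers pref_prot_resolve and pref_prot_effective and the is_openvms_with_decnet parameter are hypothetical; only the variable name RTR_PREF_PROT, its four values, and the two defaults come from the text above. Treating an unrecognized value as if the variable were unset is an assumption of this sketch, not documented RTR behavior.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of RTR. Returns `value` normalized to one
 * of the four documented settings, or the documented platform default
 * when `value` is NULL or unrecognized (the latter is an assumption). */
const char *pref_prot_resolve(const char *value, int is_openvms_with_decnet)
{
    static const char *const known[] = {
        "RTR_DNA_FIRST", "RTR_TCP_FIRST", "RTR_DNA_ONLY", "RTR_TCP_ONLY"
    };

    if (value != NULL) {
        for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
            if (strcmp(value, known[i]) == 0)
                return known[i];
        }
    }
    return is_openvms_with_decnet ? "RTR_DNA_FIRST" : "RTR_TCP_FIRST";
}

/* Reads RTR_PREF_PROT from the environment of the current process. */
const char *pref_prot_effective(int is_openvms_with_decnet)
{
    return pref_prot_resolve(getenv("RTR_PREF_PROT"), is_openvms_with_decnet);
}
```

Comparing the result on the local and remote nodes shows whether the return leg is likely to use a different protocol from the outbound one.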
1.8 Problem Reporting

For problem reporting:

o Send mail to your Compaq Service Representative requesting that it be forwarded to the RTR Quality Manager.

o If you have any RTR log files or pertinent output from monitor pictures or RTR SHOW commands, send it to us via E-mail.

o Send us as much other information as possible about the conditions which caused the failure, pointers to application programs which caused the problem, command sequences, etc.

2 Compaq Tru64 UNIX Specific Information

This chapter gives platform-specific information for the Compaq Tru64 UNIX implementation of Reliable Transaction Router, Version 3.2.

2.1 New Features

o RTR supports XA; however, problems have been found when testing with Oracle 7.34 and 8.04. Contact Oracle support for details.

o 14-5-44 New script rtr_snapshot.sh for gathering RTR diagnostic data
The new command rtr_snapshot.sh calls various SHOW and MONITOR commands to output a snapshot of the state of RTR on a node. This information may be of use for monitoring, tuning, troubleshooting, and reporting problems.

2.2 Known Problems Corrected Since Version 3.1D

o 14-1-643 Assertion when restarting timed out command server at RTR> prompt
When an idle command server started by the same RTR> prompt process times out after RTR_COMSERV_TIMEOUT seconds (default 300) and is restarted for a new command, the RTR> prompt process could raise an assertion. This problem has been corrected.

o 14-3-190 Signal handling by RTR shared library in RTR applications
The first RTR API call no longer replaces any existing signal handlers that were installed by the application main program for the three usual termination signals SIGINT, SIGHUP, and SIGTERM. If no existing termination signal handlers are found (SIG_DFL), RTR installs a simple handler which will cause RTR to call exit() at the next convenient opportunity during an RTR API call, or in the RTR polling thread in a threaded application. RTR installs an exit() handler with atexit().
This handler is not essential, but is intended to perform a more controlled shutdown of RTR in an application than when the process is terminated abruptly, for example with _exit(), which does not call exit handlers. The application may choose to leave the RTR termination signal handler in place, or to install its own handlers at any time. The application handlers should notify the mainline program in an async-safe manner that it should call exit() when convenient, and may even be constructed to also call the RTR handler they replaced so that the application can exit in an RTR API call too. Consult the operating system documentation for the usual restrictions on exactly what is permitted in an async-safe signal handler. If the application does not install its own signal handlers for the usual termination signals and does not continue to make regular RTR API calls, then the application will appear to ignore them. RTR still installs an empty handler to catch the SIGPIPE signal to avoid the default action of program termination. In unthreaded applications RTR may still install the RTR SIGIO handler, which also executes any previous SIGIO handler installed by the main program.

2.3 Known Problems with Workarounds

o 14-3-217 Unthreaded UNIX applications using rtr_set_wakeup can fail, e.g., in malloc
When an unthreaded UNIX RTR application calls rtr_set_wakeup, the non-reentrant RTR shared library -lrtr with which it is linked installs a signal handler. This signal handler called functions internal to RTR which could occasionally call runtime library functions such as malloc() that are not async-safe, according to the relevant standards. See man (4) signal. In practice this may appear to work most of the time, but break for no apparent reason when the signal happens to occur while background code is also in a runtime library call such as malloc. The problem in RTR has been corrected.
The small penalty for this is that RTR no longer makes any attempt to ensure that messages available are not just housekeeping. Applications must always be prepared for a timeout return status on calling rtr_receive_message with a zero timeout, even after a wakeup suggests that a message ought to be available. Application writers are reminded that their RTR wakeup handlers are subject to the same restrictions: routines like printf, malloc, and the entire RTR API may not be used directly or indirectly from within a signal handler. A workaround for applications with unsafe wakeup handlers can be to link with the reentrant version of the library -lrtr_r because different rules apply for wakeups in a thread: applications should not call anything that is not thread-safe, or anything that might block indefinitely, such as rtr_send_to_server, rtr_reply_to_client, rtr_broadcast_event, or rtr_receive_message with a non-zero timeout.

o 14-7-952 Treating dumb unknown terminal like a VT100
If you try to run RTR on a terminal or window with unknown (zero) dimensions, RTR exits immediately with a BADROWCOL message. A workaround is to enter the following UNIX command:

  stty rows 24 cols 80

RTR expects the terminal or window to be at least capable of emulating a VT100 terminal. Otherwise, a few control characters are displayed at the beginning of each line, and the output from the MONITOR command contains so many control sequences that it is unreadable. A workaround is to redirect both standard input and output to files:

  rtr monitor calls < /dev/null > monitored_calls.lis

2.4 Restrictions

o 14-1-420 RTR's use of the TruCluster Distributed Lock Manager
RTR uses the Distributed Lock Manager that comes with TruCluster PS to manage access to certain system resources. Among other uses, the primary reason for locks is to coordinate access to RTR's journal.
To support standby servers in a TruCluster, the RTR journal for each node must be accessible by RTR on any node in the TruCluster in case of failure of any other cluster member nodes. As part of TruCluster support, the ownership of the NFS service may fail over from one node to another. RTR exploits this feature when it finds it necessary to recover transactions from another node's journal. Before RTR opens a journal, it will verify that the local node has assumed ownership of the shared disk service (as determined by the Distributed Lock Manager). This can work only if each RTR journal in a TruCluster is located on its own distinct shared disk service.

o 14-3-50 Maximum number of application processes limit
An ACP crash that occurred when starting the last of a great many applications has been corrected. When the process open file limit is reached, the application will now generally report ACPNOTVIA, "RTR ACP is no longer a viable entity, restart RTR". In actual fact the ACP continues to operate with all previously connected processes, and only the new rejected process thinks that the RTR ACP is not alive. This message should be interpreted as "ACPINSRES, The RTR ACP has insufficient resources." Please ensure that your system is configured with sufficient default per-process resources, or that the ACP process is started with increased resource limits. Allow at least one open file for each additional application process, and at least one for each link.

3 OpenVMS Specific Information

This chapter gives platform-specific information for the OpenVMS implementation of Reliable Transaction Router, Version 3.2.

3.1 New Features

There are no new features in this release that are specific to this platform.

3.2 Known Problems Corrected Since Version 3.1D

o 14-1-170 rtr_api_wakeup_entries/exits not maintained on OpenVMS
The process counters rtr_api_wakeup_entries/exits were not incremented on OpenVMS.
This gave an incorrect indication of the number of wakeup calls on the "monitor calls" picture. This behavior has been corrected.

o 14-1-260 Display key range bounds completely and in appropriate format

Quadword signed and unsigned key ranges are supported on all RTR platforms including OpenVMS Alpha and VAX.

o 14-1-544 Non-portable VMS journals across VAX/ALPHA

The incompatibility between the VAX and Alpha journal files has been corrected. Customers must issue the following command when they install V3.2:

    rtr> CREATE JOURNAL/SUPERSEDE

o 14-3-53 Sys$start_txw sometimes returns 0 instead of 1 upon success

ASTlm resource limitations may result in applications receiving an erroneous indication that the ACP is not available. Raising the process ASTlm quota corrected this problem.

o 14-3-89 V2 field ASTPRM not in RTR$_EVT

The RTR$_EVT structure, part of the V2 compatibility layer, now contains the field RTR$L_EVT_ASTPRM (as with RTR V2). The value of RTR$K_EVTAST_ARGNO has been altered accordingly (from 6 to 7).

o 14-3-131 $DCL_TX_PRC crashes when the user is underprivileged

Running a V2 application from an account that does not have RTR info privilege no longer causes the application to crash.

o 14-3-135 RTR V3 does not select all nodes in a VMScluster when using the SET ENVIRONMENT command

SET ENVIRONMENT/CLUSTER now works on OpenVMS and Windows NT. Previously, all nodes in the cluster had to be listed in a SET ENVIRONMENT /NODE=(...) command in order to issue subsequent commands to all of them. SET ENVIRONMENT /CLUSTER is now available on OpenVMS and Windows NT clusters, as well as on Compaq Tru64 UNIX TruCluster.

o 14-3-169 Application not notified if ACP dies

Upon the death of the ACP process, RTR V3 would incorrectly terminate any outstanding calls to the V2 API wrapper with the status ACPNOTVIA. V2 behaviour has been restored, and such calls now terminate with NOACP.
o 14-3-196 An application calling $START_TX at AST level while the ACP died would crash inside LIBRTR. This has been corrected, and SYS$START_TX now simply returns to the caller a message indicating that the ACP is not available.

o 14-3-197 ACPNOTVIA error returned if RTR command $DCL_TX_PRC issued

The RTR command $DCL_TX_PRC issued for a non-existent facility caused an ACPNOTVIA error return. This did not happen the first time, only on subsequent attempts if RTR was stopped in between.

API verbs called from the RTR command line interpreter would fail with the status ACPNOTVIA if RTR was stopped and restarted without restarting the command server. This has been corrected. The problem can be avoided on earlier versions of RTR by issuing the command 'disconnect server' after stopping RTR.

o 14-3-285 OpenVMS process quotas artificially constrained

Prior versions of RTR limited the maximum values that could be specified for the ACP process quotas to 64K. This restriction has been removed. Warning messages are generated if the requested (or default) memory quotas conflict with the system-wide WSMAX parameter, or if the calculated or specified page file quota is greater than the remaining free page file space.

o 14-3-286 Synchronous call to accept DECnet connect causes links to get isolated

Stalling of the ACP due to synchronous (sys$qiow) calls has been fixed by changing to asynchronous calls (sys$qio), which prevents the link from being disconnected. A completion event is called at the end of a successful asynchronous DECnet accept connection. Similarly, DECnet connection reject has also been fixed by changing to asynchronous calls instead of synchronous.
o 14-7-640 "exceeded byte count quota" message received if process quota bytlm is less than the specified value

On starting RTR on OpenVMS, if the process quota BYTLM is less than the specified value (currently 100000), RTR returns the OpenVMS error message "exceeded byte count quota" and does not start. Users should change the BYTLM setting to the specified value or higher to eliminate the error message and start RTR. Application users whose process quota BYTLM is less than the specified value will receive the RTR error code RTR_STS_BYTLMNSUFF on starting their application.

o 14-8-130 ACCVIO and omitted parameters using Inter-Operability Services

This version of the RTR Inter-Operability Services now checks the number of parameters passed. If the consumer of the API omits the trailing optional parameter(s), RTR will detect it and supply the necessary value. It is better practice to supply a "0" for the optional arguments.

3.3 Known Problems with Workarounds

There are no known problems with workarounds in this release that are specific to this platform.

3.4 Restrictions

o 14-1-279 RTR V2 compatibility interface is not yet thread-safe

The RTR V2 compatibility interface may only be called from one program thread.

o 14-3-139 RTR V3 only allows up to 30 bytes for the EVTNAM parameter

The RTR V2 compatibility layer only allows up to 30 bytes for the EVTNAM parameter to $DCL_TX_PRC(W), whereas RTR V2 allows up to 32 bytes.

o 14-7-625 RTR V3 cannot be run in system-mode on a machine on which RTR V2 is already running

RTR V3 cannot be run in system-mode on a machine on which RTR V2 is already running. If this is attempted, the RTR V3 ACP process will fail. Please make sure RTR V2 has been stopped before attempting to install and run RTR V3.

o 14-7-1026 Increased AST Process Quota Usage

It may be necessary to increase process ASTLM quotas after upgrading from RTR V2 to V3.
If your application receives a large number of messages in a relatively short time period, and you find that RTR calls are failing to complete, raise the ASTLM substantially. For example, if your process receives several hundred broadcasts in a few seconds, raise ASTLM by several hundred.

4 AIX Specific Information

This chapter gives platform-specific information for the AIX implementation of Reliable Transaction Router, Version 3.2.

4.1 New Features

o 14-5-44 New script rtr_snapshot.sh for gathering RTR diagnostic data

The new command rtr_snapshot.sh calls various SHOW and MONITOR commands to output a snapshot of the state of RTR on a node. This information may be of use for monitoring, tuning, troubleshooting, and reporting problems.

4.2 Known Problems Corrected Since Version 3.1D

o 14-1-643 Assertion when restarting timed out command server at RTR> prompt

When an idle command server started by the same RTR> prompt process times out after RTR_COMSERV_TIMEOUT seconds (default 300) and is restarted for a new command, the RTR> prompt process could raise an assertion. This problem has been corrected.

o 14-3-190 Signal handling by RTR shared library in RTR applications

The first RTR API call no longer replaces any existing signal handlers that were installed by the application main program for the three usual termination signals SIGINT, SIGHUP, and SIGTERM.

If no existing termination signal handlers are found (SIG_DFL), RTR installs a simple handler which will cause RTR to call exit() at the next convenient opportunity during an RTR API call, or in the RTR polling thread in a threaded application.

RTR installs an exit() handler with atexit(). This handler is not essential, but is intended to perform a more controlled shutdown of RTR in an application than when the process is terminated abruptly, for example with _exit(), which does not call exit handlers.
The application may choose to leave the RTR termination signal handler in place, or to install its own handlers at any time. The application handlers should notify the mainline program in an async-safe manner that it should call exit() when convenient, and may even be constructed to also call the RTR handler they replaced so that the application can exit in an RTR API call too. Consult the operating system documentation for the usual restrictions on exactly what is permitted in an async-safe signal handler.

If the application does not install its own signal handlers for the usual termination signals and does not continue to make regular RTR API calls, then the application will appear to ignore those signals.

RTR still installs an empty handler to catch the SIGPIPE signal to avoid the default action of program termination. In unthreaded applications RTR may still install the RTR SIGIO handler which also executes any previous SIGIO handler installed by the main program.

o 14-3-275 aio not available makes RTR fail with unresolved errors for kaio_rdrw, etc.

RTR for AIX exploits Asynchronous I/O (aio) for increased journal performance. By default, aio is only defined (that is, disabled) rather than available. It can be configured with the system management tool:

    # smit aio

The RTR installation procedure post_i script now makes aio available, and ensures that aio will also be available after a restart.

4.3 Known Problems with Workarounds

o 14-3-217 Unthreaded UNIX applications using rtr_set_wakeup can fail, e.g., in malloc

When an unthreaded UNIX RTR application calls rtr_set_wakeup, the non-reentrant RTR shared library -lrtr with which it is linked installs a signal handler. This signal handler called functions internal to RTR which could occasionally call runtime library functions such as malloc() that are not async-safe, according to the relevant standards. See man (4) signal.
In practice this may appear to work most of the time, but break for no apparent reason when the signal happens to occur while background code is also in a runtime library call such as malloc.

The problem in RTR has been corrected. The small penalty for this is that RTR no longer attempts to ensure that the messages available are not just housekeeping. Applications must always be prepared for a timeout return status when calling rtr_receive_message with a zero timeout, even after a wakeup suggests that a message ought to be available.

Application writers are reminded that their RTR wakeup handlers are subject to the same restrictions: routines like printf, malloc, and the entire RTR API may not be used directly or indirectly from within a signal handler.

A workaround for applications with unsafe wakeup handlers is to link with the reentrant version of the library, -lrtr_r, because different rules apply for wakeups in a thread: applications should not call anything that is not thread-safe, or anything that might block indefinitely, such as rtr_send_to_server, rtr_reply_to_client, rtr_broadcast_event, or rtr_receive_message with a non-zero timeout.

o 14-7-952 Do not treat dumb unknown terminal like a VT100

If you try to run RTR on a terminal or window with unknown (zero) dimensions, RTR exits immediately with a BADROWCOL message.

A workaround is to enter the following UNIX command:

    stty rows 24 cols 80

RTR expects the terminal or window to be at least capable of emulating a VT100 terminal. Otherwise, a few control characters are displayed at the beginning of each line, and the output from the MONITOR command contains so many control sequences that it is unreadable. A workaround is to redirect both standard input and output to files:

    rtr monitor calls < /dev/null > monitored_calls.lis

4.4 Restrictions

o 14-3-50 Maximum number of application processes limit

An ACP crash that occurred when starting the last of a great many applications has been corrected.
When the process open file limit is reached, the application will now generally report ACPNOTVIA, "RTR ACP is no longer a viable entity, restart RTR". In fact, the ACP continues to operate with all previously connected processes, and only the newly rejected process thinks that the RTR ACP is not alive. This message should be interpreted as "ACPINSRES, The RTR ACP has insufficient resources."

Please ensure that your system is configured with sufficient default per-process resources, or that the ACP process is started with increased resource limits. Allow at least one open file for each additional application process, and at least one for each link.

5 Sun Solaris Specific Information

This chapter gives platform-specific information for the Sun Solaris implementation of Reliable Transaction Router, Version 3.2.

5.1 New Features

o 14-5-44 New script rtr_snapshot.sh for gathering RTR diagnostic data

The new command rtr_snapshot.sh calls various SHOW and MONITOR commands to output a snapshot of the state of RTR on a node. This information may be of use for monitoring, tuning, troubleshooting, and reporting problems.

5.2 Known Problems Corrected Since Version 3.1D

o 14-1-643 Assertion when restarting timed out command server at RTR> prompt

When an idle command server started by the same RTR> prompt process times out after RTR_COMSERV_TIMEOUT seconds (default 300) and is restarted for a new command, the RTR> prompt process could raise an assertion. This problem has been corrected.

o 14-3-190 Signal handling by RTR shared library in RTR applications

The first RTR API call no longer replaces any existing signal handlers that were installed by the application main program for the three usual termination signals SIGINT, SIGHUP, and SIGTERM.
If no existing termination signal handlers are found (SIG_DFL), RTR installs a simple handler which will cause RTR to call exit() at the next convenient opportunity during an RTR API call, or in the RTR polling thread in a threaded application.

RTR installs an exit() handler with atexit(). This handler is not essential, but is intended to perform a more controlled shutdown of RTR in an application than when the process is terminated abruptly, for example with _exit(), which does not call exit handlers.

The application may choose to leave the RTR termination signal handler in place, or to install its own handlers at any time. The application handlers should notify the mainline program in an async-safe manner that it should call exit() when convenient, and may even be constructed to also call the RTR handler they replaced so that the application can exit in an RTR API call too. Consult the operating system documentation for the usual restrictions on exactly what is permitted in an async-safe signal handler.

If the application does not install its own signal handlers for the usual termination signals and does not continue to make regular RTR API calls, then the application will appear to ignore those signals.

RTR still installs an empty handler to catch the SIGPIPE signal to avoid the default action of program termination. In unthreaded applications RTR may still install the RTR SIGIO handler which also executes any previous SIGIO handler installed by the main program.

o 14-3-193 Link loss after Sun Solaris 2.5.1 send (34: Result too large)

Sun has confirmed that the sendmsg() system call on Sun Solaris 2.5.1 can return with an undocumented error number ERANGE "Result too large". RTR now works around this and no longer closes the link.
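The guidance under 14-3-190 (notify the mainline in an async-safe manner, and optionally call the handler that was replaced) can be illustrated with a minimal C sketch. It is not taken from the RTR kit: the names install_termination_handlers, termination_requested, on_termination, and chain are hypothetical, and the sketch assumes that any earlier handler was installed in the plain one-argument form and is itself async-safe.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* expose sigaction() under strict compiler modes */
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Set by the handler, polled by the mainline; sig_atomic_t stores are async-safe. */
static volatile sig_atomic_t exit_requested = 0;

/* The handlers this sketch replaced (possibly RTR's), one slot per signal. */
static struct {
    int signo;
    void (*previous)(int);
} chain[] = {
    { SIGINT, SIG_DFL }, { SIGHUP, SIG_DFL }, { SIGTERM, SIG_DFL }
};
static const size_t chain_len = sizeof chain / sizeof chain[0];

/* Records the request, then calls the handler it replaced, if there was one. */
static void on_termination(int signo)
{
    size_t i;
    exit_requested = 1;
    for (i = 0; i < chain_len; i++) {
        if (chain[i].signo == signo &&
            chain[i].previous != SIG_DFL && chain[i].previous != SIG_IGN) {
            chain[i].previous(signo);
        }
    }
}

/* Installs on_termination for SIGINT, SIGHUP, and SIGTERM.
 * Returns 0 on success, -1 if sigaction() fails. */
int install_termination_handlers(void)
{
    size_t i;
    for (i = 0; i < chain_len; i++) {
        struct sigaction sa, old;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_termination;
        sigemptyset(&sa.sa_mask);
        if (sigaction(chain[i].signo, &sa, &old) != 0)
            return -1;
        /* Assumption: the earlier handler used the plain sa_handler form.
         * Skip our own handler so a repeated install cannot chain to itself. */
        if (old.sa_handler != on_termination)
            chain[i].previous = old.sa_handler;
    }
    return 0;
}

/* Mainline check: nonzero once a termination signal has arrived. */
int termination_requested(void)
{
    return exit_requested != 0;
}
```

A mainline loop would call termination_requested() between units of work and call exit() itself when it returns nonzero, so that the exit handlers run outside signal context.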
5.3 Known Problems with Workarounds

o 14-3-217 Unthreaded UNIX applications using rtr_set_wakeup can fail, e.g., in malloc

When an unthreaded UNIX RTR application calls rtr_set_wakeup, the non-reentrant RTR shared library -lrtr with which it is linked installs a signal handler. This signal handler called functions internal to RTR which could occasionally call runtime library functions such as malloc() that are not async-safe, according to the relevant standards. See man (4) signal.

In practice this may appear to work most of the time, but break for no apparent reason when the signal happens to occur while background code is also in a runtime library call such as malloc.

The problem in RTR has been corrected. The small penalty for this is that RTR no longer attempts to ensure that the messages available are not just housekeeping. Applications must always be prepared for a timeout return status when calling rtr_receive_message with a zero timeout, even after a wakeup suggests that a message ought to be available.

Application writers are reminded that their RTR wakeup handlers are subject to the same restrictions: routines like printf, malloc, and the entire RTR API may not be used directly or indirectly from within a signal handler.

A workaround for applications with unsafe wakeup handlers is to link with the reentrant version of the library, -lrtr_r, because different rules apply for wakeups in a thread: applications should not call anything that is not thread-safe, or anything that might block indefinitely, such as rtr_send_to_server, rtr_reply_to_client, rtr_broadcast_event, or rtr_receive_message with a non-zero timeout.

o 14-7-952 Do not treat dumb unknown terminal like a VT100

If you try to run RTR on a terminal or window with unknown (zero) dimensions, RTR exits immediately with a BADROWCOL message.
A workaround is to enter the following UNIX command:

    stty rows 24 cols 80

RTR expects the terminal or window to be at least capable of emulating a VT100 terminal. Otherwise, a few control characters are displayed at the beginning of each line, and the output from the MONITOR command contains so many control sequences that it is unreadable. A workaround is to redirect both standard input and output to files:

    rtr monitor calls < /dev/null > monitored_calls.lis

5.4 Restrictions

o 14-1-6 Network Connection status codes incorrect with SUNLink DNI

RTR gives a reason code when explicitly rejecting a network connection from another node. The reason text is displayed in the "monitor connects" screen, and is useful in diagnosing connectivity and configuration problems.

The reject reason is not available when attempting connections using DECnet (SUNLink DNI) as a network transport. As a result, this platform incorrectly reports an explicit rejection by a remote node as having been refused. Use the "monitor accfail" screen on the target of the connection to obtain a correct indication of the reason for the rejection.

o 14-3-50 Maximum number of application processes limit

An ACP crash that occurred when starting the last of a great many applications has been corrected.

When the process open file limit is reached, the application will now generally report ACPNOTVIA, "RTR ACP is no longer a viable entity, restart RTR". In fact, the ACP continues to operate with all previously connected processes, and only the newly rejected process thinks that the RTR ACP is not alive. This message should be interpreted as "ACPINSRES, The RTR ACP has insufficient resources."

Please ensure that your system is configured with sufficient default per-process resources, or that the ACP process is started with increased resource limits. Allow at least one open file for each additional application process, and at least one for each link.
o 14-8-43 Sun Solaris 256 File Descriptor restriction

Sun Solaris versions up to and including 2.5.1 cannot use file descriptor numbers larger than 255 for standard I/O. (Sun Solaris 2.6 is believed to address this problem.)

RTR no longer leaks file descriptors when attempting to use DECnet to reach a non-DECnet node while it is not currently reachable by TCP/IP, for example because RTR is stopped on that node. RTR now conserves low file descriptor numbers, so that if the per-process limit is configured to be as much as 1024, they can all be used for links and application processes. Crashes caused by this leak of a scarce resource should no longer occur.

6 HP-UX Specific Information

This chapter gives platform-specific information for the HP-UX implementation of Reliable Transaction Router, Version 3.2.

6.1 New Features

o 14-5-44 New script rtr_snapshot.sh for gathering RTR diagnostic data

The new command rtr_snapshot.sh calls various SHOW and MONITOR commands to output a snapshot of the state of RTR on a node. This information may be of use for monitoring, tuning, troubleshooting, and reporting problems.

6.2 Known Problems Corrected Since Version 3.1D

o 14-1-643 Assertion when restarting timed out command server at RTR> prompt

When an idle command server started by the same RTR> prompt process times out after RTR_COMSERV_TIMEOUT seconds (default 300) and is restarted for a new command, the RTR> prompt process could raise an assertion. This problem has been corrected.

o 14-3-190 Signal handling by RTR shared library in RTR applications

The first RTR API call no longer replaces any existing signal handlers that were installed by the application main program for the three usual termination signals SIGINT, SIGHUP, and SIGTERM.
If no existing termination signal handlers are found (SIG_DFL), RTR installs a simple handler which will cause RTR to call exit() at the next convenient opportunity during an RTR API call, or in the RTR polling thread in a threaded application.

RTR installs an exit() handler with atexit(). This handler is not essential, but is intended to perform a more controlled shutdown of RTR in an application than when the process is terminated abruptly, for example with _exit(), which does not call exit handlers.

The application may choose to leave the RTR termination signal handler in place, or to install its own handlers at any time. The application handlers should notify the mainline program in an async-safe manner that it should call exit() when convenient, and may even be constructed to also call the RTR handler they replaced so that the application can exit in an RTR API call too. Consult the operating system documentation for the usual restrictions on exactly what is permitted in an async-safe signal handler.

If the application does not install its own signal handlers for the usual termination signals and does not continue to make regular RTR API calls, then the application will appear to ignore those signals.

RTR still installs an empty handler to catch the SIGPIPE signal to avoid the default action of program termination. In unthreaded applications RTR may still install the RTR SIGIO handler which also executes any previous SIGIO handler installed by the main program.

o 14-7-386 Better formatting for non-VT100 compatible windows and terminals

Termcap entries are now parsed more carefully: :nd= is now respected.
If you wish to run RTR in an hpterm instead of an xterm window, then try:

    TERMCAP=hp:al=\EL:am:bs:cd=\EJ:ce=\EK:ch=\E&a%dC:cl=\EH\EJ:co#80:
    da:db:dc=\EP:dl=\EM:do=\EB:ei=\ER:kb=^H:kd=\EB:kh=\Eh:kl=\ED:kr=\EC:
    ku=\EA:ke=\E&s0A:ks=\E&s1A:li#24:mi:nd=\EC:pt:se=\E&d@:so=\E&dB:
    up=\EA:xs:cm=\E&a%dy%dC:cv=\E&a%dY:im=\EQ:ml=\El:mu=\Em:ue=\E&d@:
    us=\E&dD:bt=\Ei:
    TERM=hp

6.3 Known Problems with Workarounds

o 14-3-217 Unthreaded UNIX applications using rtr_set_wakeup can fail, e.g., in malloc

When an unthreaded UNIX RTR application calls rtr_set_wakeup, the non-reentrant RTR shared library -lrtr with which it is linked installs a signal handler. This signal handler called functions internal to RTR which could occasionally call runtime library functions such as malloc() that are not async-safe, according to the relevant standards. See man (4) signal.

In practice this may appear to work most of the time, but break for no apparent reason when the signal happens to occur while background code is also in a runtime library call such as malloc.

The problem in RTR has been corrected. The small penalty for this is that RTR no longer attempts to ensure that the messages available are not just housekeeping. Applications must always be prepared for a timeout return status when calling rtr_receive_message with a zero timeout, even after a wakeup suggests that a message ought to be available.

Application writers are reminded that their RTR wakeup handlers are subject to the same restrictions: routines like printf, malloc, and the entire RTR API may not be used directly or indirectly from within a signal handler.
A workaround for applications with unsafe wakeup handlers is to link with the reentrant version of the library, -lrtr_r, because different rules apply for wakeups in a thread: applications should not call anything that is not thread-safe, or anything that might block indefinitely, such as rtr_send_to_server, rtr_reply_to_client, rtr_broadcast_event, or rtr_receive_message with a non-zero timeout.

o 14-7-952 Do not treat dumb unknown terminal like a VT100

If you try to run RTR on a terminal or window with unknown (zero) dimensions, RTR exits immediately with a BADROWCOL message.

A workaround is to enter the following UNIX command:

    stty rows 24 cols 80

RTR expects the terminal or window to be at least capable of emulating a VT100 terminal. Otherwise, a few control characters are displayed at the beginning of each line, and the output from the MONITOR command contains so many control sequences that it is unreadable. A workaround is to redirect both standard input and output to files:

    rtr monitor calls < /dev/null > monitored_calls.lis

6.4 Restrictions

o 14-3-50 Maximum number of application processes limit

An ACP crash that occurred when starting the last of a great many applications has been corrected.

When the process open file limit is reached, the application will now generally report ACPNOTVIA, "RTR ACP is no longer a viable entity, restart RTR". In fact, the ACP continues to operate with all previously connected processes, and only the newly rejected process thinks that the RTR ACP is not alive. This message should be interpreted as "ACPINSRES, The RTR ACP has insufficient resources."

Please ensure that your system is configured with sufficient default per-process resources, or that the ACP process is started with increased resource limits. Allow at least one open file for each additional application process, and at least one for each link.
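The sizing rule in 14-3-50 (at least one open file per additional application process and one per link) can be checked before RTR is started. The following C sketch is an illustration only, not part of RTR: descriptors_needed and ensure_descriptor_limit are hypothetical helper names, and the baseline argument is an assumed allowance for descriptors that are open for other reasons. It relies on the standard getrlimit() and setrlimit() calls, whose limits are inherited by child processes.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* expose getrlimit()/setrlimit() under strict modes */
#endif
#include <sys/resource.h>

/* One descriptor per application process, one per link, plus an assumed
 * baseline for descriptors already in use (journal, log, standard I/O). */
long descriptors_needed(long app_processes, long links, long baseline)
{
    return app_processes + links + baseline;
}

/* Returns 1 if the soft open-file limit covers `needed` (raising it toward
 * the hard limit when required), 0 if the hard limit is too low, and -1 if
 * a system call fails. */
int ensure_descriptor_limit(long needed)
{
    struct rlimit rl;

    if (needed <= 0)
        return 1;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= (rlim_t)needed)
        return 1;                      /* already sufficient */
    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_max < (rlim_t)needed)
        return 0;                      /* needs a system-level change */
    rl.rlim_cur = (rlim_t)needed;      /* raise the soft limit only */
    return setrlimit(RLIMIT_NOFILE, &rl) == 0 ? 1 : -1;
}
```

A launcher process could call ensure_descriptor_limit(descriptors_needed(apps, links, baseline)) before starting RTR; a return value of 0 indicates that the system-wide per-process limit has to be raised first.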
7 Windows NT Specific Information

This chapter gives platform-specific information for Reliable Transaction Router, Version 3.2 for Windows NT.

7.1 New Features

o RTR supports XA; however, problems have been found when testing with Oracle 7.34 and 8.04. Contact Oracle support for details.

o 14-1-236 New RTR demo included with kit

The latest RTR demo and associated application sources are now shipped on the RTR CD-ROM. The demo gives an overview of RTR functionality. The applications are written using MS Visual C++ and are provided on an unsupported basis for the benefit of the developers of applications using RTR.

o 14-5-91 Windows NT Service for RTR

Included with this version of RTR is the RTR\NT Service program. Installation of the software is described in the RTR Installation Guide. Operation of the software is described in the RTR System Manager's Manual.

o 14-7-789 JAM locking in WNT clusters

RTR configurations are supported in Windows NT cluster environments. The cluster platforms that are currently supported are Digital Clusters for Windows NT (V1.0 SP2 on Windows NT V3.51, or V1.1 on Windows NT V4.0), and Microsoft Cluster Server configurations (formerly known as Wolfpack). Only two-node NT cluster configurations are supported for this version of RTR.

RTR supports the use of standby configurations in this environment. In terms of NT clusters, RTR is an application and the RTR journals are the database resource which is failed over between the NT cluster servers.

The following requirements must be observed:

   o The RTR journal for both NT servers must be located on the same disk on the SCSI bus that is shared between the two NT cluster servers. The RTR registry entry for the journal must be set to the same value on both server nodes. Furthermore, the registry entry should specify the journal disk using the path qualified by the cluster name.
For example, if the cluster name is ALPHACLUSTER, and the journal disk has the cluster share name DISK1, then the RTR journal registry entry should be entered as:

    \\ALPHACLUSTER\DISK1

This can be modified using the Registry Editor. The registry key for the journal is found under:

    \HKEY_LOCAL_MACHINE\SOFTWARE\DigitalEquipmentCorporation\RTR\Journal

The key name is the default (none), and the value should be in the format given above.

   o If the journal file is specified as above on a shared SCSI disk, then RTR can operate with standby server functionality. If the journal is not located on a shared disk in a Windows NT cluster configuration, then RTR behaves as a standalone RTR node and no use is made of cluster functionality.

   o RTR must be configured with both the backend and router roles on the Windows NT cluster server nodes if the journal file is located on a shared SCSI disk.

   o In a Windows NT cluster configuration, the RTR directory must not be located on a shared SCSI disk.

   o The failover group containing the disk share on which the journal files are located must have no failback policy enabled. That is, if the failover group fails over to the secondary cluster node due to a primary server outage, then the group must not fail back to the primary node once the primary node is available again.

   o While RTR facilities are defined in a cluster configuration, the failover group with the journal device must not be manually failed over to the other cluster server (by the cluster administrator). Failover should occur only at the discretion of the cluster failover manager software.

   o RTR creates lock files in the RTR directory and the journal directory during normal operation. These are of the form N*.LCK or N*.BLK, and C*.LCK or C*.BLK. These files may be left in these directories after RTR has been stopped, but they will be reused once RTR is started again. There is no real need for a daemon to purge these files at system boot time.
7.2 Known Problems Corrected Since Version 3.1D

o 14-1-514 Simultaneous CONNECT/EXCEPTION event generation causes W32 ACP crash

Unexpected Winsock 1.1 behavior on Windows 95 when a TCP connect attempt failed could result in an RTR failure. Although this looks like a discrepancy against the documented Winsock behaviour, RTR has been modified to handle the condition and continue running. The node counters knlnet_tcp1_spurious and knlnet_tcp2_spurious track the number of times this condition is detected.

o 14-3-135 RTR V3 does not select all nodes in a VMScluster when using the SET ENVIRONMENT command

SET ENVIRONMENT/CLUSTER now works on OpenVMS and Windows NT. Previously, all nodes in the cluster had to be listed in a SET ENVIRONMENT /NODE=(...) command in order to issue subsequent commands to all of them. SET ENVIRONMENT /CLUSTER is now available on OpenVMS and Windows NT clusters, as well as on Compaq Tru64 UNIX TruCluster.

o 14-3-218 Microsoft Visual C compiler options /Gz (stdcall) and /Gr (fastcall) supported

The RTR API functions in are now declared with the __cdecl attribute so they can be used in applications compiled with calling conventions other than the /Gd (cdecl) default.

o 14-3-255 Multiple broadcast or data received on wrong channel

When running Windows 95 or Windows NT with Pathworks installed, RTR would not detect that the client had closed its channel when the client application was aborted by closing the window. RTR now detects when the client has aborted the channel and closes the channel.

o 14-5-43 Exception handler report file names changed

For consistency with other supported platforms, the name of the file used to hold the exception handler report has been changed to rtr_error.log. Any prior versions of the file are renamed to rtr_error.log, where n cycles through the range 0 - 9.
7.3 Known Problems with Workarounds

o 14-3-62 Systems configured with Pathworks32 without DECnet

In certain cases, after installing Pathworks32 without DECnet support (for example, LAT protocol only, or if you want to use PowerTerm or eXcursion where DECnet is not strictly needed), DECnet may happen to be registered in the Winsock2 protocol stack even though no DECnet drivers are loaded. (This may come about through some errors in the configuration procedure while, for example, removing DECnet after it has been installed.)

If DECnet is registered in the Winsock2 protocol stack, but no DECnet drivers are loaded, then there is no problem with running RTR if Pathworks32 V7.0A is the version installed. If Pathworks32 V7.0 is installed, then RTR will not start correctly.

If Pathworks32 V7.0 is installed, and RTR cannot start, then this condition can easily be verified by running WSAENUM.EXE, found in the SDK directory tree of the Pathworks32 installation kit. If the address family DECnet is displayed, then DECnet has been registered in the Winsock2 protocol stack. If, at the same time, there is no DECnet protocol listed under Protocols in the Control Panel network applet, then this is the problem.

To allow RTR to start correctly in such a configuration, use one of the following workarounds:

   - Run PWS2DNST.EXE (Winsock2 de-register DECnet utility), found in the Pathworks32 installation kit, to de-register DECnet in the Winsock2 protocol stack.

   - Set RTR_PREF_PROT=RTR_TCP_ONLY

   - Upgrade to Pathworks32 V7.0A

7.4 Restrictions

o 14-1-155 Confusion between host and node names

To fully support protocol failover between TCP/IP and DECnet on Windows NT and Windows 95 systems, the unqualified IP host name of the machine should be the same as the DECnet node name.
o 14-1-160 RTR for Windows requires TCP/IP

  RTR for Windows NT, Windows 95, and Windows 98 requires the TCP/IP protocol to be operational, because TCP/IP is used for inter-process communication between the RTR ACP and the application. If TCP/IP is removed using the Network applet in the Control Panel (for example, if DECnet has been installed), then trying to start RTR will result in a Winsock error.

o 14-1-253 V3 WIN32 FE Can't connect to V2 TR

  There is a known deficiency for Windows clients running Pathworks DECnet that try to connect to a V2 router node. The connection does not always come up reliably at the first attempt. If this is a problem in your environment, please report it to Compaq.

o 14-1-471 Incorrect handling of failed DECnet connect attempts on NT

  Network connection attempts over DECnet that are explicitly refused are not handled on Windows platforms until RTR times them out. This may make failover operations slower than some applications require. If this is the case, the timeout period can be reduced by specifying revised values using the following environment variables:

  RTR_TIMEOUT_CONNECT (default 60 s, minimum 5 s)
  RTR_TIMEOUT_CONNECT_RELAX (default 90 s, minimum 1 s)

  Failover processing occurs after the combined value of these times has elapsed.

o 14-1-516 Excessive occurrences of ERROR WSASYSNOTREADY 10091 in rtr-log file

  Entries of the following type may be written to the RTR log file as applications exit:

  %KNL-W-SYSTEM, ioctlsocket FIONBIO blocking, (10091: *No message text found for 10091 (317)*), knl_net.c:1780
  %KNL-W-SYSTEM, ioctlsocket FIONBIO nonblocking, (10091: *No message text found for 10091 (317)*), knl_net.c:1708

  These are caused by RTR trying to send data from an exit handler installed by RTR in the application. This currently does not work on Windows because the network is unavailable from an exit handler. The problem can be avoided by ensuring that the application closes all its RTR channels prior to exiting.

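  The timeout arithmetic described under 14-1-471 above can be sketched as follows. This Python sketch is illustrative only and does not reproduce RTR's implementation. It assumes that the variables hold whole numbers of seconds, that a value below the documented minimum is raised to that minimum, that a value that cannot be parsed falls back to the default, and that the "combined" time is the sum of the two values.

```python
import os

# Defaults and minimums as documented for restriction 14-1-471 (seconds).
DEFAULTS = {"RTR_TIMEOUT_CONNECT": 60, "RTR_TIMEOUT_CONNECT_RELAX": 90}
MINIMUMS = {"RTR_TIMEOUT_CONNECT": 5, "RTR_TIMEOUT_CONNECT_RELAX": 1}


def effective_timeout(name, env):
    """Return the timeout in seconds that would apply for one variable.

    Assumptions (not confirmed by the release notes): values are whole
    seconds, values below the minimum are raised to the minimum, and
    unparsable values fall back to the default.
    """
    raw = env.get(name)
    if raw is None:
        return DEFAULTS[name]
    try:
        value = int(raw)
    except ValueError:
        return DEFAULTS[name]
    return max(value, MINIMUMS[name])


def failover_delay(env):
    """Sum of both timeouts, reading "combined" as addition (assumption)."""
    return sum(effective_timeout(name, env) for name in DEFAULTS)


if __name__ == "__main__":
    print("Failover delay in seconds:", failover_delay(os.environ))
```

  Under these assumptions, leaving both variables unset gives 60 + 90 = 150 seconds before failover processing begins, and setting RTR_TIMEOUT_CONNECT=10 and RTR_TIMEOUT_CONNECT_RELAX=5 gives 15 seconds.
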
o 14-1-594 Incomplete cleanup by COMSERV after CLI exits

  When commands are executed from the RTR command line (CLI), if the process that opens a channel goes away, the PID associated with that channel also goes away, because of the way Windows NT and Windows 95 identify the requester. This can cause invalid channel arguments, but is normal behavior in a test or experimental environment using the CLI.

o 14-5-10 Enable process dump file creation on Windows NT using Dr Watson

  If a problem with RTR causes it to crash, a process dump file can be generated by enabling the Dr Watson post mortem crash analyser. This is done by entering the MS-DOS command:

  %WINDIR%\drwtsn32 -i

  The files that are created are %WINDIR%\DRWTSN32.LOG and %WINDIR%\USER.DMP.

  These files should be included with any problem report submitted to RTR Engineering in the event of an RTR crash, along with the RTR dump file (RTR_.DMP) and the RTR log file.

8 Windows 95 and Windows 98 Specific Information

This chapter gives platform-specific information for Reliable Transaction Router, Version 3.2 for Windows 95 and Windows 98.

8.1 New Features

o 14-1-236 New RTR demo included with kit

  The latest RTR demo and associated application sources are now shipped on the RTR CD-ROM. The demo gives an overview of RTR functionality. The applications are written using Microsoft Visual C++ and are provided on an unsupported basis for the benefit of developers of applications using RTR.

8.2 Known Problems Corrected Since Version 3.1D

o 14-3-218 Microsoft Visual C compiler options /Gz (stdcall) and /Gr (fastcall) supported

  The RTR API functions are now declared with the __cdecl attribute, so they can be used in applications compiled with a calling convention other than the /Gd (cdecl) default.

o 14-3-255 Multiple broadcast or data received on wrong channel

  On Windows 95 and Windows NT systems with Pathworks installed, RTR did not detect that the client had closed its channel when the client application was aborted by closing its window. RTR now detects that the client has aborted and closes the channel.

o 14-5-43 Exception handler report file names changed

  For consistency with other supported platforms, the name of the file used to hold the exception handler report has been changed to rtr_error.log. Any prior versions of the file are renamed to rtr_error.log, where n cycles through the range 0 - 9.

8.3 Known Problems with Workarounds

There are no known problems with workarounds in this release that are specific to this platform.

8.4 Restrictions

o 14-1-155 Confusion between host and node names

  To fully support protocol failover between TCP/IP and DECnet on Windows NT and Windows 95 systems, the unqualified IP host name of the machine should be the same as the DECnet node name.

o 14-1-160 RTR for Windows requires TCP/IP

  RTR for Windows NT, Windows 95, and Windows 98 requires the TCP/IP protocol to be operational, because TCP/IP is used for inter-process communication between the RTR ACP and the application. If TCP/IP is removed using the Network applet in the Control Panel (for example, if DECnet has been installed), then trying to start RTR will result in a Winsock error.

o 14-1-253 V3 WIN32 FE Can't connect to V2 TR

  There is a known deficiency for Windows clients running Pathworks DECnet that try to connect to a V2 router node. The connection does not always come up reliably at the first attempt. If this is a problem in your environment, please report it to Compaq.

o 14-1-516 Excessive occurrences of ERROR WSASYSNOTREADY 10091 in rtr-log file

  Entries of the following type may be written to the RTR log file as applications exit:

  %KNL-W-SYSTEM, ioctlsocket FIONBIO blocking, (10091: *No message text found for 10091 (317)*), knl_net.c:1780
  %KNL-W-SYSTEM, ioctlsocket FIONBIO nonblocking, (10091: *No message text found for 10091 (317)*), knl_net.c:1708

  These are caused by RTR trying to send data from an exit handler installed by RTR in the application. This currently does not work on Windows because the network is unavailable from an exit handler. The problem can be avoided by ensuring that the application closes all its RTR channels prior to exiting.

o 14-1-594 Incomplete cleanup by COMSERV after CLI exits

  When commands are executed from the RTR command line (CLI), if the process that opens a channel goes away, the PID associated with that channel also goes away, because of the way Windows NT and Windows 95 identify the requester. This can cause invalid channel arguments, but is normal behavior in a test or experimental environment using the CLI.