HP OpenVMS Systems Documentation

OpenVMS RTL Library (LIB$) Manual

Assemble or compile this module as you would any other program module.

2.2 Defining the Argument Block
After you have set up the state tables, you need to declare the LIB$T[ABLE_]PARSE argument block in such a way that both your program and LIB$T[ABLE_]PARSE can use it. This means the data must be defined in an area common to the calling program and the program module containing the state table definitions.

In most programming languages you will use a combination of EXTERNAL statements and common data definitions to create and access a separate data PSECT. If you do not know what mechanisms the language you are using provides, consult the documentation for that language.

The following example shows the LIB$T[ABLE_]PARSE argument block defined for use in a BASIC program.

        !LIB$T[ABLE_]PARSE requires that TPA$K_COUNT0 be eight.

        DECLARE INTEGER CONSTANT TPA$K_COUNT0 = 8,      &
                        BTPA$L_COUNT = 0,               &
                        BTPA$L_OPTIONS=1,               &
                        BTPA$L_STRINGCNT=2,             &
                        BTPA$L_STRINGPTR=3,             &
                        BTPA$L_TOKENCNT=4,              &
                        BTPA$L_TOKENPTR=5,              &
                        BTPA$B_CHAR=6,                  &
                        BTPA$L_NUMBER=7,                &
                        BTPA$L_PARAM=8
        !+
        ! The LIB$T[ABLE_]PARSE argument block.
        !-

        MAP (TPARSE_BLOCK) LONG TPARSE_ARRAY (TPA$K_COUNT0)

        !+
        ! Redefining the map allows you to use the standard
        ! LIB$T[ABLE_]PARSE symbolic names. TPA$L_STRINGCNT,
        ! for example, references the same storage location
        ! as TPARSE_ARRAY(2) and TPARSE_ARRAY(BTPA$L_STRINGCNT).
        !-
        MAP (TPARSE_BLOCK) LONG                 &
                        TPA$L_COUNT ,           &
                        TPA$L_OPTIONS,          &
                        TPA$L_STRINGCNT,        &
                        TPA$L_STRINGPTR,        &
                        TPA$L_TOKENCNT,         &
                        TPA$L_TOKENPTR,         &
                        TPA$B_CHAR,             &
                        TPA$L_NUMBER,           &
                        TPA$L_PARAM

Before your program can call LIB$T[ABLE_]PARSE, it must place the necessary information in the argument block.

The example utility does not need to set any flags because it uses the LIB$T[ABLE_]PARSE defaults for options such as blanks processing and abbreviations. However, it must put the length and address of the string to be parsed into the TPA$L_STRINGCNT and TPA$L_STRINGPTR fields, respectively.

The address and the length of the string to be parsed are available in the descriptor of the input string (called COMMAND_LINE in the following program). However, BASIC, like most high-level languages, does not allow you to look at the descriptors of your strings. Instead, you can use LIB$ANALYZE_SDESC or LIB$ANALYZE_SDESC_64 to read the length and address from the string descriptor and place them in the argument block.

2.3 Coding the Call to LIB$T[ABLE_]PARSE
The following example demonstrates calling LIB$T[ABLE_]PARSE from a high-level language (BASIC). This program uses the BLISS state table described in Section 2.1.2.

5 %TITLE "BASIC Program to Call LIB$T[ABLE_]PARSE"


        OPTION TYPE=EXPLICIT

        !+
        ! COMMAND_LINE is the string to receive the input
        !   command from the terminal.
        ! ERROR_MSG_TEXT is the system error message
        !  returned from LIB$SYS_GETMSG
        !  (used in the error handling routine)
        !-
        DECLARE STRING COMMAND_LINE, ERROR_MSG_TEXT

        !+
        ! RET_STATUS receives the status from the system calls.
        ! SAVE_STATUS is used when an error occurs
        !   and the error handling routine calls
        !   LIB$SYS_GETMSG to obtain the error text.
        !-
        DECLARE LONG RET_STATUS, SAVE_STATUS

        !+
        ! UFD_STATE is the address of the state table.
        ! UFD_KEY is the address of the key table.
        ! Both addresses are set up by the macros in module
        !    SIMPLE_STATETABLE32.
        !-

        EXTERNAL LONG UFD_STATE, UFD_KEY

        !+
        ! To allow us to compare returned statuses more easily.
        !-

        EXTERNAL INTEGER CONSTANT SS$_NORMAL,   &
                LIB$_SYNTAXERR,                 &
                LIB$_INVTYPE

        !+
        ! This program calls the following Run-Time Library
        !  routines:
        !
        ! LIB$T[ABLE_]PARSE to parse the input string
        !
        ! LIB$ANALYZE_SDESC to get the length and starting
        !    address of the command string and place them
        !    in the LIB$T[ABLE_]PARSE argument block.
        !
        ! LIB$SYS_GETMSG to find the facility, severity, and text
        !    of any system errors that occur
        !    during program execution.
        !-

        EXTERNAL LONG FUNCTION LIB$TABLE_PARSE,     &
                               LIB$ANALYZE_SDESC,   &
                               LIB$SYS_GETMSG

        !+
20      ! This file defines the argument block that is passed
        !   to LIB$T[ABLE_]PARSE. It also defines subscripts that
        !    make it easier to access the array.
        !
        ! Keeping the argument block definitions in a separate
        !   file makes them easier to modify and lets other
        !   programs use the same definitions.
        !-

        %INCLUDE "SIMPLE_TPARSE_BLOCK"


50      ON ERROR GOTO ERROR_HANDLER


60      !+
        !  LIB$T[ABLE_]PARSE requires that TPA$L_COUNT, the
        !  first field in the argument block, have a value
        !  of TPA$K_COUNT0, whose value is 8.
        !-

        TPA$L_COUNT = TPA$K_COUNT0


75      !+
        ! Prompt at the terminal for the user's action.
        ! A real utility should provide a friendlier,
        ! clearer interface.
        !-

        GET_INPUT:      PRINT "Your options are: " , " READ report "
                        PRINT , " FILE report "
                        PRINT , " PRINT report "
                        PRINT , " CREATE report "
                        PRINT
                        INPUT "What would you like to do"; COMMAND_LINE
        !+
        ! Get the length and starting address of the command line
        ! and place them in the LIB$T[ABLE_]PARSE argument block. Note
        ! that LIB$ANALYZE_SDESC stores the length as a word.
        !-

        RET_STATUS = LIB$ANALYZE_SDESC (COMMAND_LINE BY DESC, &
                   TPARSE_ARRAY (BTPA$L_STRINGCNT) BY REF,    &
                   TPARSE_ARRAY (BTPA$L_STRINGPTR) BY REF)

        IF RET_STATUS <> SS$_NORMAL THEN
                        GOTO ERROR_HANDLER
        END IF


100     !+
        ! Call LIB$T[ABLE_]PARSE to process the input string.
        !
        ! Note that LIB$T[ABLE_]PARSE expects to receive its arguments
        ! by reference, while BASIC's default for arrays and
        ! strings is by descriptor. Therefore the BY REF
        ! clauses are required. Without them, LIB$T[ABLE_]PARSE
        ! cannot find the input string
        ! and the parse will always fail.
        !-

        RET_STATUS = LIB$TABLE_PARSE (TPARSE_ARRAY () BY REF, &
                     UFD_STATE BY REF,                   &
                     UFD_KEY BY REF )
        !+
        ! This simple program provides no information except that
        ! a valid command was entered. The next section discusses
        ! techniques for gathering more information.
        !-

        IF RET_STATUS = SS$_NORMAL

        !+
        ! For now, exit on success.
        !-

                THEN PRINT "Parse successful"
                        GOTO 9999
        !+
        ! If the parse failed, give the user a chance to try again.
        !-

                ELSE IF RET_STATUS = LIB$_SYNTAXERR THEN
                        PRINT   "You did not enter a valid command."
                        PRINT "Please try again."
                        GOTO GET_INPUT

        !+
        ! If a more serious error occurred, inform the user
        ! and exit.
        !-

                ELSE
                        GOTO ERROR_HANDLER
                END IF
        END IF

500     ERROR_HANDLER: SAVE_STATUS = RET_STATUS

        RET_STATUS = LIB$SYS_GETMSG (SAVE_STATUS,,ERROR_MSG_TEXT)
        PRINT "Something went wrong."
        PRINT ERL, ERROR_MSG_TEXT
        RESUME 9999

9999    END

Compile this program as you would any other BASIC program.

When both the state tables and the main program have been compiled, link them together to form a single executable image, as follows:

$ LINK SIMPLANG,SIMPLANG_STATETABLE

3 Using Advanced LIB$T[ABLE_]PARSE Features
The LIB$T[ABLE_]PARSE call in the previous program tells you that the command the user entered was valid, but nothing else, not even which command was entered. A program usually needs more information than this.

The following sections describe some of the more complicated ways to process input strings or to gather extra information for your program, including:

Action routines (see Section 3.1)
Blanks in the input string (see Section 3.2)
Special characters in the input string (see Section 3.3)
Abbreviated keywords (see Section 3.4)
Subexpressions (see Section 3.5)
Modular use of LIB$T[ABLE_]PARSE (see Section 3.6)

3.1 Using Action Routines
After LIB$T[ABLE_]PARSE finds a match between a transition and the leading portion of the input string, it determines if the transition that made the match specified an action routine. If it did, LIB$T[ABLE_]PARSE stores the value of the transition's argument longword, if any, in the argument block PARAM field and calls the action routine.

If the action routine returns success, LIB$T[ABLE_]PARSE processes the mask or msk-adr arguments, if any, and continues to execute the transition as it would if there was no action routine.
If the action routine returns failure, LIB$T[ABLE_]PARSE does not execute the transition and continues attempting to match successive transitions.

3.1.1 Passing Data to an Action Routine
An action routine has only one argument, the argument block. You can pass additional data to the action routine using:

The transition's optional argument longword (the argument parameter of the transition)
Fields you add to the end of the argument block

LIB$TABLE_PARSE and LIB$TPARSE use different linkages for passing the argument block to the action routine:

LIB$TABLE_PARSE uses the standard calling mechanism and passes the argument block, by reference, as the only argument to the action routine.
Therefore, for OpenVMS systems, action routines are written as:
ROUTINE TEST( TPARSE_ARGUMENT_BLOCK : REF BLOCK[ , BYTE ] ) =
    BEGIN
    TPARSE_ARGUMENT_BLOCK[ TPA$V_ABBREV ] = 1
    END;
On VAX systems, LIB$TPARSE uses a nonstandard linkage that establishes the address of the argument block as the routine's actual argument pointer. Therefore an action routine can reference fields in the argument block by their symbolic offsets relative to the AP (argument pointer) register.
For example:
ROUTINE TEST =
    BEGIN
    BUILTIN AP;
    BIND TPARSE_ARGUMENT_BLOCK = AP : REF BLOCK[ , BYTE ];
    TPARSE_ARGUMENT_BLOCK[ TPA$V_ABBREV ] = 1
    END;

3.1.2 Action Routine Return Values
The action routine returns a value to LIB$T[ABLE_]PARSE in R0 that controls execution of the current state transition. If the action routine returns success (low bit set in R0) then LIB$T[ABLE_]PARSE proceeds with the execution of the state transition. If the action routine returns failure (low bit clear in R0), LIB$T[ABLE_]PARSE rejects the transition that was being processed and acts as if the symbol type of that transition had not matched. It proceeds to evaluate other transitions in that state for eligibility.

Note

Prior to calling an action routine, LIB$T[ABLE_]PARSE sets the low bit of R0 to make it easier for the action routine to return success.

If an action routine returns a nonzero failure status to LIB$T[ABLE_]PARSE and no subsequent transitions in that state match, LIB$T[ABLE_]PARSE will return the status of the action routine, rather than the status LIB$_SYNTAXERR. In longword-valued functions in high-level languages, this value is returned in R0.
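The return-value rules above can be modeled in a few lines. The following Python sketch is not the LIB$T[ABLE_]PARSE implementation; it only illustrates the low-bit convention and the way a nonzero failure status from an action routine replaces the generic syntax error. The names evaluate_state, SYNTAX_ERROR, and RANGE_ERROR are hypothetical, and the numeric values of the two failure statuses are placeholders, not the real condition values. Which status is kept when several action routines fail in the same state is also an assumption of this sketch.

```python
SS_NORMAL = 1            # SS$_NORMAL; low bit set means success
SYNTAX_ERROR = 0x1000    # hypothetical stand-in for LIB$_SYNTAXERR (low bit clear)
RANGE_ERROR = 0x2000     # hypothetical application failure status (low bit clear)


def is_success(status):
    """VMS convention: a status with the low bit set is a success."""
    return (status & 1) == 1


def evaluate_state(transitions, token):
    """Try each (matcher, action) pair in order and return a status.

    matcher(token) -> bool models the symbol-type comparison.
    action(token)  -> int models the value the action routine leaves in R0.
    """
    last_failure = 0
    for matcher, action in transitions:
        if not matcher(token):
            continue
        status = SS_NORMAL if action is None else action(token)
        if is_success(status):
            return SS_NORMAL          # the transition is executed
        if status != 0:
            last_failure = status     # remember a nonzero failure status
    # No transition was taken: report the action routine's status if it
    # supplied one, otherwise the generic syntax error.
    return last_failure or SYNTAX_ERROR
```

In this model, an action routine that returns zero leaves the caller with the generic syntax error, while one that returns a nonzero even value makes that value the result of the failed parse.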

3.1.3 Using an Action Routine to Reject a Transition
An action routine can intentionally return a failure status to force LIB$T[ABLE_]PARSE to reject a transition. This allows you to implement symbol types specific to particular applications. To recognize a specialized symbol type, code a state transition using a LIB$T[ABLE_]PARSE symbol type that describes a superset of the desired set of possible tokens. The associated action routine then performs the additional discrimination necessary and returns success or failure to LIB$T[ABLE_]PARSE, which then accordingly executes or fails to execute the transition.

A pure finite-state machine, for instance, has difficulty recognizing strings that are shorter than some maximum length or accepting numeric values confined to some particular range.
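As an illustration of the range case, the following Python sketch models a transition whose symbol type accepts any decimal number and whose action routine narrows the match to a range. It is a behavioral model under stated assumptions, not code that runs against LIB$T[ABLE_]PARSE: match_decimal stands in for a decimal symbol type, and in_range_action, accept_day_of_month, and the 1 to 31 range are hypothetical names and values chosen for the example.

```python
import re


def match_decimal(text):
    """Model of a superset symbol type: a leading run of decimal digits.

    Returns (value, remaining_text), or None when no digits are present.
    """
    m = re.match(r"[0-9]+", text)
    if m is None:
        return None
    return int(m.group()), text[m.end():]


def in_range_action(value, low=1, high=31):
    """Model of an action routine: an odd status accepts, an even one rejects."""
    return 1 if low <= value <= high else 0


def accept_day_of_month(text):
    """Accept a decimal token only if the action routine approves it."""
    matched = match_decimal(text)
    if matched is None:
        return None
    value, rest = matched
    if in_range_action(value) & 1:
        return value, rest
    return None   # transition rejected; the caller would try the next one
```

The symbol type does the coarse recognition and the action routine does the fine discrimination, which keeps the state table itself small.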

3.2 Blanks in the Input String
The default mode of operation in LIB$T[ABLE_]PARSE is to treat blanks as separators. That is, they can appear between any two tokens in the string being parsed without being called for by transitions in the state table. Because blanks are significant in some situations, LIB$T[ABLE_]PARSE processes blanks if you have set the bit TPA$V_BLANKS in the options longword of the argument block. The following input string shows the difference in operation:

ABC  DEF

LIB$T[ABLE_]PARSE recognizes the string by the following sequences of state transitions, depending on the state of the blanks control flag. The following examples illustrate processing with and without TPA$V_BLANKS set:

TPA$V_BLANKS set:

$STATE
$TRAN TPA$_STRING

$STATE
$TRAN   TPA$_BLANK

$STATE
$TRAN   TPA$_STRING

TPA$V_BLANKS clear:

$STATE
$TRAN   TPA$_STRING

$STATE
$TRAN   TPA$_STRING

Your action routines can set or clear TPA$V_BLANKS as LIB$T[ABLE_]PARSE enters or leaves sections of the state table in which blanks are significant. LIB$T[ABLE_]PARSE always checks the blanks control flag as it enters a state. If the flag is clear, it removes any space or tab characters present at the front of the input string before it proceeds to evaluate transitions. Note that when the TPA$V_BLANKS flag is clear, the TPA$_BLANK symbol type will never match. If TPA$V_BLANKS is set, you must explicitly process blanks.
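The effect of the blanks control flag on the two state sequences above can be sketched with a small Python model. This is an illustration only: the regular expressions are rough stand-ins for TPA$_STRING and TPA$_BLANK, each state is assumed to hold a single transition, and the final check that all input was consumed is an assumption of the sketch rather than documented behavior.

```python
import re

STRING = re.compile(r"[A-Za-z0-9]+")   # rough model of TPA$_STRING
BLANK = re.compile(r"[ \t]+")          # rough model of TPA$_BLANK


def parse(text, states, blanks_flag):
    """Run one transition per state; return True if every state matches.

    states is a list of "STRING" or "BLANK" names, one transition per state.
    blanks_flag models TPA$V_BLANKS.
    """
    pos = 0
    for symbol in states:
        if not blanks_flag:
            # Flag clear: strip leading spaces and tabs on state entry.
            skipped = BLANK.match(text, pos)
            if skipped:
                pos = skipped.end()
            if symbol == "BLANK":
                return False          # TPA$_BLANK never matches here
        pattern = STRING if symbol == "STRING" else BLANK
        m = pattern.match(text, pos)
        if m is None:
            return False
        pos = m.end()
    return pos == len(text)           # sketch assumption: all input consumed
```

With the flag set, the three-state table accepts "ABC  DEF" and the two-state table does not; with the flag clear, the situation is reversed.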

3.3 Special Characters in the Input String
Not all members of the ASCII character set can be entered directly in the state table definitions. Examples include the single quotation mark and all control characters.

In MACRO state tables, such characters can be specified as the symbol type with any assembler expression that is equivalent to the ASCII code of the desired character, not including the single quotes. For example, you could code a transition to match a backspace character as follows:

BACKSPACE = 8
   .
   .
   .
$TRAN BACKSPACE, ...

MACRO places extra restrictions on the use of a comma in arguments to macros; often they must be surrounded by one or more angle brackets. Using a symbolic name for the comma will avoid such difficulties.

To build a transition matching such a single character in a BLISS state table, you can use the %CHAR lexical function as follows:

LITERAL BACKSPACE = 8;
   .
   .
   .
$STATE (label,
       (%CHAR (BACKSPACE), ... )
        );

3.4 Abbreviating Keywords
The default mode of LIB$T[ABLE_]PARSE is exact match. All keywords in the input string must exactly match their spelling, length and case in the state table. However, many languages (command languages in particular) allow you to abbreviate keywords. For this reason, LIB$T[ABLE_]PARSE has three abbreviation facilities to permit the recognition of abbreviated keywords when the state table lists only the full spellings. All three are controlled by flags and options defined in the argument block OPTIONS field. Table lib-11 describes these flags.

**Table lib-11 Keyword Abbreviation Flags**
Flag	Description
TPA$B_MCOUNT TPA64$B_MCOUNT	By setting a value in the MCOUNT argument block field, the calling program or action routine specifies a minimum number of characters from the abbreviated keyword that must be present for a match to occur. For example, setting the byte to the value 4 would allow the keyword DEASSIGN to appear in an input string as DEAS, DEASS, DEASSI, DEASSIG, or DEASSIGN. LIB$T[ABLE_]PARSE checks all the characters of the keyword string. Incorrect spellings beyond the minimum abbreviation are not permitted.
TPA$V_ABBRFM TPA64$V_ABBRFM	If you set the ABBRFM flag in the argument block OPTIONS field, LIB$T[ABLE_]PARSE recognizes any leftmost substring of a keyword as a match for that keyword. LIB$T[ABLE_]PARSE does not check for ambiguity; it matches the first keyword listed in the state table of which the input token is a leftmost substring. For proper recognition of ambiguous keywords, the keywords in each state must be arranged in alphabetical order by the ASCII collating sequence, as follows: dollar sign ($), numerics, uppercase alphabetics, underscore (_), lowercase alphabetics.
TPA$V_ABBREV TPA64$V_ABBREV	If you set the ABBREV flag in the argument block OPTIONS field, LIB$T[ABLE_]PARSE recognizes any abbreviation of a keyword as long as it is unambiguous among the keywords in that state. If LIB$T[ABLE_]PARSE finds that the front of the input string contains an ambiguous keyword string, it sets the AMBIG flag in the OPTIONS field and refuses to recognize any keyword transitions in that state. (It still accepts other symbol types.) The AMBIG flag can be checked by an action routine that is called when coming out of that state, or by the calling program if LIB$T[ABLE_]PARSE returns with a syntax error status. LIB$T[ABLE_]PARSE clears the flag when it enters the next state.
If both the ABBRFM and ABBREV flags are set, ABBRFM takes precedence.

Note

Using a keyword abbreviation option can permit short abbreviations or ambiguity, which restricts the extensibility of a language. Adding a new keyword can make a formerly valid abbreviation ambiguous.
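The three abbreviation facilities can be compared side by side with a small Python model. This sketch is not the library's algorithm. The function name match_keyword and its parameters are hypothetical; the treatment of an exact spelling under ABBREV, of keywords shorter than the minimum count, and of MCOUNT combined with either flag are assumptions made so that the example is complete.

```python
def match_keyword(token, keywords, mcount=0, abbrfm=False, abbrev=False):
    """Return (keyword, ambiguous) for a nonempty input token.

    keywords are the full spellings listed in one state, in table order.
    """
    if abbrfm:
        # First match: any leading substring selects the first keyword
        # in table order that starts with it; ambiguity is not checked.
        for kw in keywords:
            if kw.startswith(token):
                return kw, False
        return None, False
    if abbrev:
        hits = [kw for kw in keywords if kw.startswith(token)]
        if token in hits:
            return token, False       # sketch assumption: exact spelling wins
        if len(hits) == 1:
            return hits[0], False
        return None, len(hits) > 1    # second value models the AMBIG flag
    if mcount > 0:
        # Minimum length: at least mcount characters, all of them correct.
        for kw in keywords:
            if len(token) >= min(mcount, len(kw)) and kw.startswith(token):
                return kw, False
        return None, False
    # Default mode: exact match only.
    return (token, False) if token in keywords else (None, False)
```

For the keyword list DEALLOCATE, DEASSIGN, DEFINE, the token DE selects DEALLOCATE under ABBRFM but is reported as ambiguous under ABBREV, which shows why keyword order matters only for the first-match option.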

3.5 Using Subexpressions
LIB$T[ABLE_]PARSE subexpressions are analogous to subroutines within the state table. You can use subexpressions as you would use subroutines in any program:

To avoid replication of complex expressions.
For a limited form of pushdown parsing, in which the state table contains recursively nested subexpressions.
For nondeterministic parsing, that is, parsing in which you need some number of states of look-ahead. To do this, place each path of look-ahead in a separate subexpression and call the subexpressions in the transitions of the state that needs the look-ahead. When a look-ahead path fails, the subexpression failure mechanism causes LIB$T[ABLE_]PARSE to back out and try another path.

A subexpression call is indicated with the MACRO expression !label or the BLISS expression (label) as the transition type argument. Transfer of control to a subexpression causes LIB$T[ABLE_]PARSE to call itself recursively, using the same argument block and keyword table as the original call, and using the specified state label as a starting state.

The following statement is an example of a $TRAN macro that calls a subexpression:

$TRAN !Q_STRING,,,,Q_DESCRIPTOR

In this example, Q_STRING is the label of another state, a subexpression, in the same state table.

When LIB$T[ABLE_]PARSE evaluates a transition that transfers control to a subexpression, it evaluates the subexpression's transitions and processes the remaining input string.

If the subexpression succeeds, it returns success to LIB$T[ABLE_]PARSE by executing a transition to TPA$_EXIT. LIB$T[ABLE_]PARSE thus considers the calling transition to have made a match. It calls that transition's action routine, if any, and executes the transition.
If the subexpression fails, LIB$T[ABLE_]PARSE considers the calling transition to have no match. It backs up the input string, leaving it as it was at the start of the subexpression, and continues processing by evaluating the remaining transitions in the calling state.

3.5.1 Using Action Routines and Storing Data in a Subexpression
Be careful when designing subexpressions whose transitions provide action routines or use the mask and msk-adr arguments. As LIB$T[ABLE_]PARSE processes the state transitions of a subexpression, it calls the specified action routines and stores the mask and msk-adr. If the subexpression fails, LIB$T[ABLE_]PARSE backs up the input string and resumes processing in the calling state. However, any effect that an action routine has had on the caller's database cannot be undone.

If subexpressions are used only as state table subroutines, there is usually no harm done, because when a subexpression fails in this mode, the parse generally fails. This is not true of pushdown or nondeterministic parsing. In applications where you expect subexpressions to fail, design action routines to store results in temporary storage. You can then make these results permanent at the main level, where the flow of control is deterministic.
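One way to picture the temporary-storage technique is the following Python sketch. It is a model of the design idea only, with hypothetical names throughout (parse_with_lookahead, keyword_value_path, bare_keyword_path); each path function stands in for a subexpression whose action routines write into scratch storage, and the caller copies the scratch data into its permanent results only after a path has succeeded.

```python
def parse_with_lookahead(text, paths):
    """Try each look-ahead path; commit results only when one succeeds.

    Each path is a function(text, scratch) -> bool that records what it
    finds in the scratch dict, modeling action routines that write to
    temporary storage.
    """
    results = {}                      # the caller's permanent database
    for path in paths:
        scratch = {}                  # fresh temporary storage per path
        if path(text, scratch):
            results.update(scratch)   # commit at the deterministic level
            return results
        # Path failed: scratch is discarded and results stays untouched.
    return None


def keyword_value_path(text, scratch):
    """Look-ahead path for KEY=VALUE; stores the key before it is sure."""
    key, sep, value = text.partition("=")
    scratch["key"] = key              # written early, may be abandoned
    if not sep or not value:
        return False
    scratch["value"] = value
    return True


def bare_keyword_path(text, scratch):
    """Look-ahead path for a bare keyword with no value."""
    if not text.isalpha():
        return False
    scratch["key"] = text
    return True
```

Because the first path writes its key before it knows whether an equal sign follows, a failed attempt would leave stale data behind if it wrote directly into the permanent results.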

3.5.2 An Example: Parsing a Quoted String
The following example is an excerpt of a state table that parses a string quoted by an arbitrary character. The table interprets the first character that appears as a quote character. Many text editors and some programming languages contain this sort of construction.

LIB$T[ABLE_]PARSE processes a transition that invokes a subexpression as it would any other transition:

If the subexpression returns success by executing a transition to TPA$_EXIT, LIB$T[ABLE_]PARSE considers the calling transition to have a match. It updates Q_DESCRIPTOR to describe the substring parsed by the subexpression and executes the transition to the next state in the state table.
If the subexpression returns failure by executing a transition to TPA$_FAIL, LIB$T[ABLE_]PARSE considers the calling transition to have no match. It restores the input string as it was when the subexpression was called and continues by evaluating the next transition in the state.

;+
; Main level state table. The first transition accepts and
; stores the quoting character.
;-
     $STATE    STRING
     $TRAN     TPA$_ANY,,,,Q_CHAR
;+
; Call the subexpression to accept the quoted string and store
; the string descriptor. Note that the descriptor spans all
; the characters accepted by the subexpression.
;-
     $STATE
     $TRAN     !Q_STRING,,,,Q_DESCRIPTOR
     $TRAN     TPA$_LAMBDA,TPA$_FAIL
;+
; Accept the trailing quote character, left behind by the
; subexpression
;-
     $STATE
     $TRAN     TPA$_ANY,NEXT
;+
; Subexpression to scan the quoted string. The second transition
; matches until it is rejected by the action routine. The subexpression
; should never encounter the end of string before the final quoting
; character.
;-
     $STATE     Q_STRING
     $TRAN     TPA$_EOS,TPA$_FAIL
     $TRAN     TPA$_ANY,Q_STRING,TEST_Q
     $TRAN     TPA$_LAMBDA,TPA$_EXIT
;+
; The following MACRO subroutine compares the current character
; with the quoting character and returns failure if it matches.
;-

TEST_Q: .WORD     0                     ; null entry mask
        CMPB      TPA$B_CHAR(AP),Q_CHAR ; check the character
        BNEQ      10$                   ; note R0 is already 1
        CLRL      R0                    ; match - reject transition
10$:    RET
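To make the control flow of this state table easier to follow, here is a Python model of the same logic. It is a sketch, not a translation that LIB$T[ABLE_]PARSE could use: the function name parse_quoted is hypothetical, blanks are treated as ordinary characters (as if TPA$V_BLANKS were set), and the returned tuple stands in for Q_CHAR, Q_DESCRIPTOR, and the remaining input.

```python
def parse_quoted(text):
    """Return (quote_char, body, rest), or None if the string is unterminated."""
    if not text:
        return None
    q_char = text[0]                  # state STRING: TPA$_ANY stores Q_CHAR
    start = pos = 1

    # Subexpression Q_STRING, called from the second main-level state.
    while True:
        if pos == len(text):          # TPA$_EOS -> TPA$_FAIL
            return None               # caller then takes TPA$_LAMBDA -> TPA$_FAIL
        if text[pos] != q_char:       # TPA$_ANY accepted by TEST_Q
            pos += 1                  # loop back to Q_STRING
            continue
        break                         # TEST_Q rejected: TPA$_LAMBDA -> TPA$_EXIT

    body = text[start:pos]            # Q_DESCRIPTOR spans the subexpression
    pos += 1                          # third state: TPA$_ANY takes the closing quote
    return q_char, body, text[pos:]
```

The closing quote is left for the main level because the action routine rejects it inside the subexpression, which is why the third main-level state accepts one more character.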

3.5.3 An Example: Parsing a Complex Grammar
The following example is an excerpt from a state table that shows how to use subexpressions to parse a complex grammar. The state table accepts a number followed by a keyword qualifier. Depending on the keyword, the table interprets the number as decimal, octal, or hexadecimal. The state table accepts strings such as the following:

10/OCTAL
32768/DECIMAL
77AF/HEX

This sort of grammar is difficult to parse with a deterministic finite-state machine. Using a subexpression look-ahead of two states permits a simpler expression of the state tables.

;+
; Main state table entry. Accept a number of some type and store
; its value at the location NUMBER.
;-
     $STATE
     $TRAN     !OCT_NUM,NEXT,,,NUMBER
     $TRAN     !DEC_NUM,NEXT,,,NUMBER
     $TRAN     !HEX_NUM,NEXT,,,NUMBER
;+
; Subexpression to accept an octal number followed by the OCTAL
; qualifier.
;-
     $STATE     OCT_NUM
     $TRAN      TPA$_OCTAL
     $STATE
     $TRAN      '/'
     $STATE
     $TRAN      'OCTAL',TPA$_EXIT
;+
; Subexpression to accept a decimal number followed by the DECIMAL
; qualifier.
;-
     $STATE     DEC_NUM
     $TRAN      TPA$_DECIMAL
     $STATE
     $TRAN      '/'
     $STATE
     $TRAN      'DECIMAL',TPA$_EXIT
;+
; Subexpression to accept a hex number followed by the HEX
; qualifier.
;-
     $STATE     HEX_NUM
     $TRAN      TPA$_HEX
     $STATE
     $TRAN      '/'
     $STATE
     $TRAN      'HEX',TPA$_EXIT

Note that the transitions that follow a match with a numeric token do not disturb the NUMBER field in the argument block. This allows the main level subexpression call to retrieve it when the subexpression returns.
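The look-ahead and backtracking behavior of this table can be modeled in Python as follows. This is an illustrative sketch under stated assumptions: the names SUBEXPRESSIONS, run_subexpression, and parse_number are hypothetical, the regular expressions only approximate the numeric symbol types, keywords are matched by simple prefix comparison, and any text after the qualifier is ignored.

```python
import re

# (digits pattern, qualifier keyword, radix) for each subexpression.
SUBEXPRESSIONS = [
    (re.compile(r"[0-7]+"), "OCTAL", 8),            # OCT_NUM
    (re.compile(r"[0-9]+"), "DECIMAL", 10),         # DEC_NUM
    (re.compile(r"[0-9A-Fa-f]+"), "HEX", 16),       # HEX_NUM
]


def run_subexpression(text, pos, digits, keyword, radix):
    """Return (value, new_pos) on success, or None to signal failure."""
    m = digits.match(text, pos)
    if m is None:
        return None
    value = int(m.group(), radix)     # models the NUMBER field
    pos = m.end()
    if not text.startswith("/", pos):
        return None
    pos += 1
    if not text.startswith(keyword, pos):
        return None
    return value, pos + len(keyword)  # transition to TPA$_EXIT


def parse_number(text):
    """Try each subexpression from the same start position, in table order."""
    start = 0
    for digits, keyword, radix in SUBEXPRESSIONS:
        result = run_subexpression(text, start, digits, keyword, radix)
        if result is not None:
            return result[0]          # stored through msk-adr NUMBER
        # Failure: the next attempt starts again at start (backtrack).
    return None
```

Each attempt restarts at the same input position, which mirrors the way a failed subexpression leaves the input string as it was when the subexpression was called.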

3.6 LIB$T[ABLE_]PARSE and Modularity
To use LIB$T[ABLE_]PARSE in a modular and shareable fashion:

Avoid using OWN storage. Instead, allocate the argument block on the stack or the heap.
Do not use the msk-adr argument.
Do not use the transition's argument longword as an address. If additional context is needed, allocate it at the end of the argument block.
Use action routines to control flags such as TPA$V_BLANKS. The MACRO example at the end of the LIB$TPARSE/LIB$TABLE_PARSE section shows such an action routine, though the program itself is not modular.
