HP OpenVMS Systems Documentation

OpenVMS RTL Library (LIB$) Manual



LIB$TPARSE/LIB$TABLE_PARSE

The Table-Driven Finite-State Parser routine is a general-purpose, table-driven parser implemented as a finite-state automaton, with extensions that make it suitable for a wide range of applications. It parses a string and returns a message indicating whether or not the input string is valid.

Note

No support for arguments passed by 64-bit address reference or the use of 64-bit descriptors is planned for LIB$TPARSE. On Alpha systems, LIB$TABLE_PARSE supports arguments passed by 64-bit address reference and the use of 64-bit descriptors.

LIB$T[ABLE_]PARSE is called with the address of an argument block, the address of a state table, and the address of a keyword table. The input string is specified as part of the argument block.

The LIB$ facility supports the following two versions of the Table-Driven Finite-State Parser:

LIB$TPARSE
  Available on VAX systems.
  LIB$TPARSE is available on Alpha systems in translated form. In this form, it is applicable to translated VAX images only.
LIB$TABLE_PARSE
  Available on VAX and Alpha systems.

LIB$TPARSE and LIB$TABLE_PARSE differ mainly in the way they pass arguments to action routines.

The term LIB$T[ABLE_]PARSE is used here to describe concepts that apply to both LIB$TPARSE and LIB$TABLE_PARSE.


Format

LIB$TPARSE/LIB$TABLE_PARSE argument-block ,state-table ,key-table


RETURNS


OpenVMS usage: cond_value
type: longword (unsigned)
access: write only
mechanism: by value


Arguments

argument-block


OpenVMS usage: unspecified
type: unspecified
access: modify
mechanism: by reference

LIB$T[ABLE_]PARSE argument block. The argument-block argument contains the address of this argument block.

The LIB$T[ABLE_]PARSE argument block contains information about the state of the parse operation. It is a means of communication between LIB$T[ABLE_]PARSE and the user's program. It is passed as an argument to all action routines.

You must declare and initialize the argument block. Section 1.4 describes the argument block in detail. Section 2.2 illustrates the coding for an argument block declaration and discusses its initialization.

LIB$T[ABLE_]PARSE supports the following argument blocks:

  • A 32-bit argument block that accommodates longword addresses, values, and input tokens on both VAX and Alpha systems.
    On Alpha systems, this argument block also accommodates a numeric token whose binary representation is less than 2**64.
  • A 64-bit argument block that accommodates quadword addresses, values, and input tokens on Alpha systems.

state-table


OpenVMS usage: unspecified
type: unspecified
access: read only
mechanism: by reference

Starting state in the state table. The state-table argument is the address of this starting state. Usually, the name appearing as the first argument of the $INIT_STATE macro is used.

You must define the state table for your parser. LIB$T[ABLE_]PARSE provides macros in the MACRO and BLISS languages for this purpose. Section 1.3 describes these macros.

key-table


OpenVMS usage: unspecified
type: unspecified
access: read only
mechanism: by reference

Keyword table. The key-table argument is the address of this keyword table. This name must be the same as that which appears as the second argument of the $INIT_STATE macro.

You need only assign a name to the keyword table. The LIB$T[ABLE_]PARSE macros allocate and define the table. See Section 4 for more information about the keyword table.


Description

The following sections explain in detail how LIB$T[ABLE_]PARSE works and how to call it from both the MACRO assembly language and high-level languages:
  1. How LIB$T[ABLE_]PARSE Works --- Describes the data structures used by LIB$T[ABLE_]PARSE and how LIB$T[ABLE_]PARSE operates on them.
  2. Coding and Using a Simple State Table --- Explains how to construct and use a simple state table.
  3. Using Advanced LIB$T[ABLE_]PARSE Features --- Explains how to use subexpressions, abbreviations, action routines, and other advanced features.
  4. Data Representation --- Includes information for the low-level-language programmer, such as the binary representation of state table data.

1 How LIB$T[ABLE_]PARSE Works
LIB$T[ABLE_]PARSE analyzes an input string according to a set of states and transitions presented in a state table you define. It determines whether the input string is valid according to the rules you define for the input language.

There are three parts to any parsing operation:

  • The set of symbol types, or alphabet, from which you can choose the vocabulary of your language.
    You specify a symbol type for each transition you define. The symbol type specifies what constitutes a matching substring from the input string.
    LIB$T[ABLE_]PARSE recognizes the ASCII character set and provides symbolic names for the most common combinations of ASCII characters, such as alphabetic and alphanumeric strings, OpenVMS symbols, and numbers. See Section 1.2 for a list of the symbol types that comprise the LIB$T[ABLE_]PARSE alphabet.
  • The rules that govern how the alphabet is used---in other words, the language's grammar.
    You specify the rules for a language in a state table. A LIB$T[ABLE_]PARSE state table lists the possible states for your language. Each state consists of a list of the transitions to other states and the operations to be performed when a transition is executed (see Section 1.3).
  • The string to be parsed.
    The argument block specifies the input string. It also contains additional information about the state of the parse---how much of the string has not been interpreted, what the current token is, and so forth (see Section 1.4).

1.1 Overview
Before discussing the alphabet, the state table, and the argument block in detail, this section provides an overview of how these three parts work together.

1.1.1 Evaluating the Input String
LIB$T[ABLE_]PARSE evaluates the input string from left to right as it transitions from state to state. For a particular transition in a particular state, it evaluates the beginning of the unprocessed part of the input string against the symbol type you specify for the transition to determine whether there is a match.

LIB$T[ABLE_]PARSE compares each character of the remaining input string, from left to right, against the transition's symbol type until it encounters a character in the input string that does not match. It takes the substring that matches the symbol type and stores a pointer to it in the argument block as the current token. In this way, any character in the input string that does not belong to the symbol type's constituent character set effectively becomes a separator.

If LIB$T[ABLE_]PARSE finds a match, it executes the transition.

If the input string does not match, LIB$T[ABLE_]PARSE attempts to match the next transition. It performs the comparison using the transitions in the order in which you define them for the state.
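The left-to-right matching described above can be modeled with a short Python sketch. This is an illustration only, not the LIB$T[ABLE_]PARSE implementation: the function name `match_prefix` and the two character sets are assumptions made for the example, and the sets model only the characters a symbol type accepts (not, for instance, the magnitude limit on numeric types).

```python
import string

# Hypothetical character sets standing in for two symbol types.
DIGITS = set(string.digits)                                # characters of a decimal number
ALPHANUMERIC = set(string.ascii_letters + string.digits)   # characters of TPA$_STRING


def match_prefix(remaining, charset):
    """Return (token, rest) for the longest leading run of characters
    drawn from charset, or None if the first character does not match."""
    end = 0
    while end < len(remaining) and remaining[end] in charset:
        end += 1
    if end == 0:
        return None          # no match: the caller tries the next transition
    return remaining[:end], remaining[end:]


print(match_prefix("123,ABC", DIGITS))   # ('123', ',ABC')
print(match_prefix("ABC,123", DIGITS))   # None
```

In the first call the comma is not a digit, so it ends the token and acts as a separator, as the text describes.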

1.1.2 Executing a Transition
When LIB$T[ABLE_]PARSE finds a match with a transition, it performs the following steps:

  1. Stores a pointer to the current token in the argument block. If the token matches one of the numeric symbol types, it also stores the token's binary representation in the argument block.
  2. Calls the action routine, if any, specified by the transition and passes it the argument block and any additional user-specified arguments.
    You can use an action routine to reject a transition. In this case, LIB$T[ABLE_]PARSE performs none of the following steps. See Section 3.1 for more information.
  3. Performs one of the following operations:
    • Stores the mask, if any, specified by the transition in the location specified by the transition.
    • Stores the value of the token in the program location specified by the transition.
  4. Transfers control to the specified state, if any, or to the next state in the state table.
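The four steps can be pictured with the following Python sketch. It is a simplified model under stated assumptions, not the routine's actual code: the dictionary keys (`radix`, `action`, `mask`, `addr`, `target`), the `storage` dictionary standing in for program locations, and the choice to OR the mask into the location are all hypothetical conventions of this example.

```python
def execute_transition(tran, token, argblk, state_index, storage):
    """Model the four steps; return the next state, or None if rejected."""
    # Step 1: record the current token; numeric tokens also get a binary value.
    argblk["token"] = token
    if "radix" in tran:
        argblk["number"] = int(token, tran["radix"])

    # Step 2: call the action routine, which may reject the transition.
    action = tran.get("action")
    if action is not None and not action(argblk):
        return None          # rejected: none of the following steps run

    # Step 3: store either the mask or the token value.
    if "mask" in tran:
        # Assumption: the mask is ORed in; the text says only "stores".
        storage[tran["addr"]] = storage.get(tran["addr"], 0) | tran["mask"]
    elif "addr" in tran:
        storage[tran["addr"]] = argblk.get("number", token)

    # Step 4: go to the named state, or to the next state in the table.
    return tran.get("target", state_index + 1)


argblk, storage = {}, {}
next_state = execute_transition({"radix": 10, "addr": "COUNT"}, "42",
                                argblk, 0, storage)
print(next_state, argblk, storage)
```

Because the example transition names no target state, the sketch falls through to the next state in the table (index 1).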

1.1.3 Exiting LIB$T[ABLE_]PARSE
LIB$T[ABLE_]PARSE continues to match and execute transitions from state to state until one of the following occurs:

  • For a valid match, it executes a user-specified transition to TPA$_EXIT at main level. It returns the value SS$_NORMAL.
  • A transition requests that LIB$T[ABLE_]PARSE consider the string invalid by specifying a transition to TPA$_FAIL at main level (rather than at the level of a subexpression). LIB$T[ABLE_]PARSE returns with the value LIB$_SYNTAXERR.
    You can also request a transition to TPA$_FAIL from an action routine. The action routine can provide an alternate failure status.
  • An error occurs at the main level. The error can be:
    • A syntax error. All transitions in the current state fail to match the remaining input string. LIB$T[ABLE_]PARSE returns LIB$_SYNTAXERR or an alternate failure status returned by an action routine.
    • A state table format error. One of your state table entries is invalid. LIB$T[ABLE_]PARSE returns LIB$_INVTYPE.

Note

LIB$T[ABLE_]PARSE generates no signals and establishes no condition handler; action routines can signal through LIB$T[ABLE_]PARSE back to the calling program.

When LIB$T[ABLE_]PARSE cannot successfully parse the entire string, it defines the current token, as follows, and stores it in the argument block before returning:

  • If LIB$T[ABLE_]PARSE fails to match a transition in the current state, it attempts to define the current token as the beginning of the remaining input string. You can incorporate this token in an error message or use it to determine the logical flow of your program.
    LIB$T[ABLE_]PARSE attempts to match the characters from the beginning of the remaining input string, one at a time, against the TPA$_SYMBOL alphabet symbol type until it encounters a character that does not match. The TPA$_SYMBOL symbol type consists of all the characters of the standard OpenVMS symbol constituent set.
    • If LIB$T[ABLE_]PARSE successfully matches one or more consecutive characters from the input string against TPA$_SYMBOL, then the substring that matched TPA$_SYMBOL becomes the current token.
    • If the first character of the remaining input string does not match TPA$_SYMBOL, the first character becomes the current token.
  • If LIB$T[ABLE_]PARSE matches the symbol type for a transition that specifies TPA$_FAIL as the next state, it leaves the token that matched the transition as the current token.
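The rule for choosing the current token after a failed match can be sketched as follows. The function name `error_token` is hypothetical, and `SYMBOL_CHARS` is an assumed approximation of the symbol constituent set (ASCII letters, digits, dollar sign, and underscore; DEC multinational characters are omitted).

```python
import string

# Assumed approximation of the OpenVMS symbol constituent set.
SYMBOL_CHARS = set(string.ascii_letters + string.digits + "$_")


def error_token(remaining):
    """Return the token reported when no transition matches."""
    end = 0
    while end < len(remaining) and remaining[end] in SYMBOL_CHARS:
        end += 1
    if end > 0:
        return remaining[:end]   # leading run of symbol characters
    return remaining[:1]         # otherwise just the first character


print(error_token("FOO BAR"))   # FOO
print(error_token("+FOO"))      # +
```

A caller could embed the returned token in an error message to show the user where parsing stopped.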

1.2 Alphabet of LIB$T[ABLE_]PARSE
The LIB$T[ABLE_]PARSE alphabet consists of a set of symbol types defined in Table lib-9. This alphabet includes strings made up of elements of the ASCII character set. It provides all the basic building blocks needed for constructing a grammar using the ASCII character set. The alphabet also includes symbol types that represent the more complex constructions found in programming and command language grammar.

Use the symbol types that comprise the LIB$T[ABLE_]PARSE alphabet to define a vocabulary and grammar for your language. For each transition you define, you specify one of the alphabet symbol types. LIB$T[ABLE_]PARSE compares the characters at the beginning of the remaining input string with the symbol type of each of the possible transitions. If LIB$T[ABLE_]PARSE finds a match, it enters the state specified by that transition.

Table lib-9 The Alphabet of LIB$T[ABLE_]PARSE

Symbol Type
    Characters Matched

'x'
    The particular ASCII character. In a state table, it is expressed by enclosing the character in single quotation marks. The character can be any member of the 8-bit ASCII code set. LIB$T[ABLE_]PARSE does not treat uppercase and lowercase alphabetic characters as equivalent, nor codes that differ in bit 7.

TPA$_ANY
    Any single character.

TPA$_ALPHA
    Any alphabetic character, including those of the DEC multinational character set.

TPA$_DIGIT
    Any numeric character, that is, 0 through 9.

TPA$_STRING
    Any string of one or more alphanumeric characters, that is, uppercase or lowercase A through Z, and the numeric characters 0 through 9. The string can be any length. It is bounded on the right by the first nonalphanumeric character or by the end of the string.

TPA$_SYMBOL
    Any string of one or more characters of the standard OpenVMS symbol constituent set, that is, uppercase and lowercase A through Z and all DEC multinational characters, in addition to the dollar sign ($) and the underscore (_). The string is bounded on the right by some character not in the symbol constituent set (usually a blank) or by the end of the string.

'keyword'
    The string of characters enclosed in single quotation marks. A keyword can consist of one or more characters of the OpenVMS symbol constituent set, that is, uppercase and lowercase A through Z, the numeric characters 0 through 9, the dollar sign ($), and the underscore (_). Uppercase and lowercase alphabetics are treated as different characters.

    A state table can contain up to 220 keywords. The keyword is bounded on the right by a character not in the symbol constituent set or by the end of the string.

    Keywords that are one character in length are expressed in the form 'x*' to distinguish them from the single-character symbol ('x'). They must be differentiated because they are not the same in operation. For example, in the input string AB+C, the single character 'A' would match the first character of this string, whereas the keyword 'A*' would not, because B in the string is in the symbol constituent set.

TPA$_BLANK
    Any string of one or more blanks and/or tabs.

TPA$_OCTAL
    Any octal number (that is, any string of one or more numeric characters 0 through 7) whose magnitude is less than 2**32 for a 32-bit argument block or less than 2**64 for a 64-bit argument block.

TPA$_DECIMAL
    Any decimal number (that is, any string of one or more numeric characters 0 through 9) whose magnitude is less than 2**32 for a 32-bit argument block or less than 2**64 for a 64-bit argument block.

TPA$_HEX
    Any hexadecimal number (that is, any string of one or more numeric characters 0 through 9, A through F) whose magnitude is less than 2**32 for a 32-bit argument block or less than 2**64 for a 64-bit argument block.

TPA$_OCTAL_64 (Alpha specific)
    Any octal number (that is, any string of one or more numeric characters 0 through 7) whose magnitude is less than 2**64.

TPA$_DECIMAL_64 (Alpha specific)
    Any decimal number (that is, any string of one or more numeric characters 0 through 9) whose magnitude is less than 2**64.

TPA$_HEX_64 (Alpha specific)
    Any hexadecimal number (that is, any string of one or more numeric characters 0 through 9, A through F) whose magnitude is less than 2**64.

TPA$_FILESPEC
    Any string that constitutes a valid OpenVMS file specification. The string is bounded on the right by the first character that either is not a file specification constituent character or would cause the string to violate the syntax rules of a file specification.

TPA$_NODE
    Matches a full node specification including the double colon (::).

TPA$_NODE_ACS
    Matches a primary node specification including the access control string, if any, but not the double colon (::).

TPA$_NODE_PRIMARY
    Matches a primary node specification excluding both the access control string, if any, and the double colon (::).

TPA$_UIC
    Any string that constitutes a valid OpenVMS numerical UIC specification, bounded by square brackets or angle brackets. The binary value of the UIC, converted in octal radix, is placed in the argument block. The wildcard character (*) is permitted in the group and/or member fields; its presence results in that field being set to its largest possible value in the binary representation.

TPA$_IDENT
    Any string that constitutes a valid OpenVMS identifier. Identifiers may be given as numerical UICs according to the rules for TPA$_UIC, or as alphabetic identifier names that appear in the system's rights database. The binary value of the identifier, converted in either octal or hexadecimal radix or by lookup in the system rights database, is placed in the argument block. Identifiers can be entered in any of the following forms:

        [n,m]            <n,m>
        [name1,name2]    <name1,name2>
        [name]           <name>
        name
        %Xhex-value

    You can use a wildcard (*) in place of any occurrence of number or name in an identifier form.

TPA$_LAMBDA
    The empty string (always matches). As it executes the transition, LIB$T[ABLE_]PARSE does not remove any characters from the input string. LAMBDA transitions are useful in getting action routines called under otherwise awkward circumstances, providing unconditional GOTOs to link portions of a state table together, and providing default actions in certain cases.

TPA$_EOS
    The end of the input string.

state label
    The label of a state that functions as a subexpression. A subexpression is analogous to a subroutine within the state table.

    The subexpression facility permits complex syntactic constructs that appear in many places in grammar to appear only once in the state table. It also permits a degree of nondeterministic or pushdown parsing with a parser that is otherwise deterministic and finite-state. See Section 3.5 for detailed information about subexpressions and examples of their use.
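The difference between the single-character symbol type 'x' and the one-character keyword 'x*' in the table can be illustrated with a small Python model of the AB+C example. The function names and `SYMBOL_CHARS` are assumptions of this sketch. It models exact keyword matches only and approximates the symbol constituent set with ASCII letters, digits, the dollar sign, and the underscore.

```python
import string

# Assumed approximation of the OpenVMS symbol constituent set.
SYMBOL_CHARS = set(string.ascii_letters + string.digits + "$_")


def match_char(remaining, ch):
    """Single-character symbol type: only the first character is compared."""
    return remaining[:1] == ch


def match_keyword(remaining, keyword):
    """Keyword: the text must match and then be bounded on the right by a
    character outside the symbol constituent set or by the end of string."""
    if not remaining.startswith(keyword):
        return False
    rest = remaining[len(keyword):]
    return rest == "" or rest[0] not in SYMBOL_CHARS


print(match_char("AB+C", "A"))      # True
print(match_keyword("AB+C", "A"))   # False
print(match_keyword("AB+C", "AB"))  # True
```

The keyword A fails against AB+C because the following B is itself a symbol constituent, whereas the keyword AB succeeds because + bounds it on the right.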

Note

By default, LIB$T[ABLE_]PARSE treats blanks (defined to be either spaces or tabs), as though they belong to no symbol type constituent set. Effectively, this makes the blank a separator. LIB$T[ABLE_]PARSE begins its next comparison with the first nonblank character following the blanks. To have LIB$T[ABLE_]PARSE evaluate a blank as it would any other character in the input string, set the TPA$V_BLANKS flag in the argument block. Section 3.2 provides an example of the use of this flag.
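The default blank handling amounts to skipping leading spaces and tabs before each comparison. A minimal Python model, in which the function `skip_blanks` and its `blanks_flag` parameter (standing in for TPA$V_BLANKS) are hypothetical names:

```python
def skip_blanks(remaining, blanks_flag=False):
    """Return the text at which the next comparison begins.

    blanks_flag models TPA$V_BLANKS: when set, blanks are significant
    and are left in place for a transition to match explicitly."""
    if blanks_flag:
        return remaining
    return remaining.lstrip(" \t")


print(repr(skip_blanks("  \tADD 1")))                  # 'ADD 1'
print(repr(skip_blanks("  ADD 1", blanks_flag=True)))  # '  ADD 1'
```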

1.3 State Tables
This section describes state table generation and the macros used to construct state tables. Section 2 explains how to use these macros.

The state table must be set up using either MACRO or BLISS. Everything else, including any action routines, can be coded in the language of your choice. Simply compile the state table separately, then link it with your program.

The body of the state table consists of one or more states, each of which defines one or more transitions to the same or other states. The order of the states and the order of the transitions for each state are important:

  • If a transition does not specify a target state, LIB$T[ABLE_]PARSE transitions to the next state after the current state in the state table.
  • For a given state, LIB$T[ABLE_]PARSE evaluates the input string against the transitions in the order in which they are defined and executes the first transition it matches.
    • If a state defines more than one transition with symbol types that match overlapping sets of tokens, the order of transition definitions within the state is significant. For example, the characters 123 followed by a comma (,) could match TPA$_DECIMAL, TPA$_OCTAL, TPA$_STRING, or one of several other symbol types.
    • It is best to order transitions in order of increasing generality of their symbol types. For example, the TPA$_SYMBOL symbol type matches all keyword strings. In general, LIB$T[ABLE_]PARSE never executes a keyword transition that follows a TPA$_SYMBOL transition. The symbol types, in order of increasing generality, are as follows:
      'keyword'
      'x'
      TPA$_EOS
      TPA$_ALPHA
      TPA$_DIGIT
      TPA$_BLANK
      TPA$_OCTAL
      TPA$_OCTAL_64 (Alpha only)
      TPA$_DECIMAL
      TPA$_DECIMAL_64 (Alpha only)
      TPA$_HEX
      TPA$_HEX_64 (Alpha only)
      TPA$_STRING
      TPA$_SYMBOL
      TPA$_UIC
      TPA$_IDENT
      TPA$_NODE_PRIMARY
      TPA$_NODE_ACS
      TPA$_NODE
      TPA$_FILESPEC
      TPA$_ANY
      TPA$_LAMBDA

    Note

    The list of symbol types does not include subexpression calls, because the generality of these calls depends on the symbol types recognized within the subexpression. If you use action routines to reject certain transitions, you can change the effective position of a symbol type in this order. In any case, LIB$T[ABLE_]PARSE executes the first transition listed in a state that you permit to match the leftmost portion of the remaining input string.
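The effect of transition order can be seen in a small Python model of the 123-followed-by-a-comma case. The `first_match` function and the `SYMBOL_TYPES` character sets are assumptions of this sketch. Only the accepted characters of each symbol type are modeled.

```python
import string

# Hypothetical character sets for two overlapping symbol types.
SYMBOL_TYPES = {
    "TPA$_DECIMAL": set(string.digits),
    "TPA$_STRING": set(string.ascii_letters + string.digits),
}


def first_match(transitions, remaining):
    """Try each transition in the order given; the first match wins."""
    for name in transitions:
        charset = SYMBOL_TYPES[name]
        end = 0
        while end < len(remaining) and remaining[end] in charset:
            end += 1
        if end:
            return name, remaining[:end]
    return None


print(first_match(["TPA$_DECIMAL", "TPA$_STRING"], "123,"))
# ('TPA$_DECIMAL', '123')
print(first_match(["TPA$_STRING", "TPA$_DECIMAL"], "123,"))
# ('TPA$_STRING', '123')
```

With the more general TPA$_STRING listed first, the TPA$_DECIMAL transition is never reached for this input, which is why the text recommends ordering transitions by increasing generality.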

1.3.1 MACRO State Table Generation Macro Calls
The OpenVMS system MACRO library contains a set of assembler macros that allow convenient and readable coding of a LIB$T[ABLE_]PARSE state table. These macros generate symbol definitions and tables. They do not produce any executable code or routine calls.

There are four MACRO state table generation macros:

  • $INIT_STATE---Initializes the LIB$T[ABLE_]PARSE macros and declares the beginning of a state table (see Section 1.3.1.1)
  • $STATE---Defines a state (see Section 1.3.1.2)
  • $TRAN---Defines a state transition (see Section 1.3.1.3)
  • $END_STATE---Ends the state table (see Section 1.3.1.4)

A state table begins with a call to $INIT_STATE and ends with a call to $END_STATE. Within the state table, define each state by a call to $STATE immediately followed by as many calls to $TRAN as you need to define the transitions from that state.

1.3.1.1 $INIT_STATE---Initializes the LIB$T[ABLE_]PARSE Macros
The $INIT_STATE macro declares the beginning of a state table. It initializes the internals of the table generator macros and declares the locations of the state table and the keyword table:

  • The state table is the structure containing the definitions of the states and the transitions between them. LIB$T[ABLE_]PARSE builds the state table as it processes the $STATE and $TRAN macros you use to define the table.
  • The keyword table contains the text of the keywords used in the state table. LIB$T[ABLE_]PARSE builds the keyword table as it processes the calls to $TRAN for each state.

Section 4 provides specific information on the allocation and binary representations of the state table and the keyword table. This information may be useful in debugging your program.


$INIT_STATE     state-table ,key-table

state-table

The name assigned to the state table. LIB$T[ABLE_]PARSE equates this label to the start of the first state in the state table.

key-table

The name assigned to the keyword table. LIB$T[ABLE_]PARSE equates this label to the start of the keyword table.

You must supply both the address of the state table and the address of the keyword table in the call to LIB$T[ABLE_]PARSE to perform a parse. The $INIT_STATE macro can appear more than once in a program. Each occurrence defines a separate state table. No part of any state table can refer to part of any other state table.

1.3.1.2 $STATE---Defines a State
The $STATE macro declares the beginning of a state.


$STATE   [label]

label

An optional label for the state. LIB$T[ABLE_]PARSE equates the label, if present, to the starting address of the state.

1.3.1.3 $TRAN---Defines a State Transition
The $TRAN macro defines a transition from the state in which it is defined to some other (or to the same) state. The arguments of the macro define, among other things, the symbol type that causes the transition to be executed, the state to which to transfer, and the action routine to call, if any. The transition defined by a $TRAN macro belongs to the state defined by the last preceding $STATE macro.


$TRAN   type [,label] [,action] [,mask] [,msk-adr] [,argument]

type

The symbol type, taken from the LIB$T[ABLE_]PARSE alphabet, that is recognized by this transition. The transition is taken if the characters from the beginning of the remaining input string match the specified symbol type.

