OpenVMS RTL Library (LIB$) Manual
 
 
 LIB$TPARSE/LIB$TABLE_PARSE
 
The Table-Driven Finite-State Parser routine is a general-purpose,
table-driven parser implemented as a finite-state automaton, with
extensions that make it suitable for a wide range of applications. It
parses a string and returns a message indicating whether or not the
input string is valid.
 
Note: No support for arguments passed by 64-bit address reference or the use
of 64-bit descriptors is planned for LIB$TPARSE. On Alpha systems,
LIB$TABLE_PARSE supports arguments passed by 64-bit address reference
and the use of 64-bit descriptors.
 
LIB$T[ABLE_]PARSE is called with the address of an argument block, the
address of a state table, and the address of a keyword table. The input
string is specified as part of the argument block.
 
The LIB$ facility supports the following two versions of the
Table-Driven Finite-State Parser:
 
  
    | LIB$TPARSE | Available on VAX systems. LIB$TPARSE is also available on Alpha
      systems in translated form. In this form, it is applicable to
      translated VAX images only. |  
    | LIB$TABLE_PARSE | Available on VAX and Alpha systems. |  
 
LIB$TPARSE and LIB$TABLE_PARSE differ mainly in the way they pass
arguments to action routines.
 
The term LIB$T[ABLE_]PARSE is used here to describe concepts that apply
to both LIB$TPARSE and LIB$TABLE_PARSE.
 
 Format
LIB$TPARSE/LIB$TABLE_PARSE argument-block ,state-table ,key-table
 
 RETURNS
 
  
    | OpenVMS usage: | cond_value |  
    | type: | longword (unsigned) |  
    | access: | write only |  
    | mechanism: | by value |  
 
 Arguments
argument-block
 
  
    | OpenVMS usage: | unspecified |  
    | type: | unspecified |  
    | access: | modify |  
    | mechanism: | by reference |  
 
LIB$T[ABLE_]PARSE argument block. The argument-block
argument contains the address of this argument block.
 
The LIB$T[ABLE_]PARSE argument block contains information about the
state of the parse operation. It is a means of communication between
LIB$T[ABLE_]PARSE and the user's program. It is passed as an argument
to all action routines.
 
You must declare and initialize the argument block. Section
1.4  describes the argument block in detail. Section
2.2  illustrates the coding for an argument block declaration
and discusses its initialization.
 
LIB$T[ABLE_]PARSE supports the following argument blocks:
 
  A 32-bit argument block that accommodates longword addresses,
  values, and input tokens on both VAX and Alpha systems. On Alpha
  systems, this argument block also accommodates a numeric token whose
  binary representation is less than or equal to 2**64.
A 64-bit argument block that accommodates quadword addresses,
  values, and input tokens on Alpha systems.
 state-table
 
  
    | OpenVMS usage: | unspecified |  
    | type: | unspecified |  
    | access: | read only |  
    | mechanism: | by reference |  
 
Starting state in the state table. The state-table
argument is the address of this starting state. Usually, the name
appearing as the first argument of the $INIT_STATE macro is used.
 
You must define the state table for your parser. LIB$T[ABLE_]PARSE
provides macros in the MACRO and BLISS languages for this purpose.
Section 1.3  describes these macros.
 key-table
 
  
    | OpenVMS usage: | unspecified |  
    | type: | unspecified |  
    | access: | read only |  
    | mechanism: | by reference |  
 
Keyword table. The key-table argument is the address
of this keyword table. This name must be the same as that which appears
as the second argument of the $INIT_STATE macro.
 
You need only assign a name to the keyword table. The LIB$T[ABLE_]PARSE
macros allocate and define the table. See Section 4 for
more information about the keyword table.
 
 Description
The following sections explain in detail how LIB$T[ABLE_]PARSE works
and how to call it from both the MACRO assembly language and high-level
languages:
   How LIB$T[ABLE_]PARSE Works --- Describes the data structures used
  by LIB$T[ABLE_]PARSE and how LIB$T[ABLE_]PARSE operates on them.
   Coding and Using a Simple State Table --- Explains how to
  construct and use a simple state table.
   Using Advanced LIB$T[ABLE_]PARSE Features --- Explains how to use
  subexpressions, abbreviations, action routines, and other advanced
  features.
   Data Representation --- Includes information for the
  low-level-language programmer, such as the binary representation of
  state table data.
 
1 How LIB$T[ABLE_]PARSE Works
LIB$T[ABLE_]PARSE analyzes an input string according to a set of states
and transitions presented in a state table you define. It determines
whether the input string is valid according to the rules you define for
the input language.
 
There are three parts to any parsing operation:
 
  The set of symbol types, or alphabet, from which
  you can choose the vocabulary of your language. You specify a
  symbol type for each transition you define. The symbol type specifies
  what constitutes a matching substring from the input string.
 LIB$T[ABLE_]PARSE recognizes the ASCII character set and provides
  symbolic names for the most common combinations of ASCII characters,
  such as alphabetic and alphanumeric strings, OpenVMS symbols, and
  numbers. See Section 1.2  for a list of the symbol types that
  comprise the LIB$T[ABLE_]PARSE alphabet.
The rules that govern how the alphabet is used---in other words,
  the language's grammar. You specify the rules for a language in a
  state table. A LIB$T[ABLE_]PARSE state table lists the possible states
  for your language. Each state consists of a list of the transitions to
  other states and the operations to be performed when a transition is
  executed (see Section 1.3 ).
The string to be parsed. The argument block specifies the input
  string. It also contains additional information about the state of the
  parse---how much of the string has not been interpreted, what the
  current token is, and so forth (see Section 1.4 ).
 
1.1 Overview
Before discussing the alphabet, the state table, and the argument block
in detail, this section provides an overview of how these three parts
work together.
 
1.1.1 Evaluating the Input String
LIB$T[ABLE_]PARSE evaluates the input string from left to right as it
transitions from state to state. For a particular transition in a
particular state, it evaluates the beginning of the unprocessed part of
the input string against the symbol type you specify for the transition
to determine whether there is a match.
 
LIB$T[ABLE_]PARSE compares each character of the remaining input
string, from left to right, against the transition's symbol type until
it encounters a character in the input string that does not match. It
takes the substring that matches the symbol type and stores a pointer
to it in the argument block as the current token. In
this way, any character in the input string that does not belong to the
symbol type's constituent character set effectively becomes a separator.
 
If LIB$T[ABLE_]PARSE finds a match, it executes the transition.
 
If the input string does not match, LIB$T[ABLE_]PARSE attempts to match
the next transition. It performs the comparison using the transitions
in the order in which you define them for the state.
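 
The matching rule described above can be pictured with a small Python model. This is only a sketch under stated assumptions, not the LIB$T[ABLE_]PARSE implementation: the function name match_symbol_type and the CONSTITUENTS table are hypothetical, only two symbol types are modeled, the character sets are reduced to ASCII, and numeric magnitude limits are ignored.

```python
import string

# Hypothetical constituent sets for two symbol types (ASCII only).
CONSTITUENTS = {
    "TPA$_STRING": set(string.ascii_letters + string.digits),
    "TPA$_DECIMAL": set(string.digits),
}

def match_symbol_type(symbol_type, remaining):
    """Return (token, rest) for the longest matching prefix, or None."""
    allowed = CONSTITUENTS[symbol_type]
    end = 0
    # Consume characters left to right until one falls outside the set.
    while end < len(remaining) and remaining[end] in allowed:
        end += 1
    if end == 0:
        return None  # no match: the parser would try the next transition
    return remaining[:end], remaining[end:]

print(match_symbol_type("TPA$_STRING", "ABC123,XYZ"))   # ('ABC123', ',XYZ')
print(match_symbol_type("TPA$_DECIMAL", "ABC123,XYZ"))  # None
```

In the first call the comma is not part of the modeled TPA$_STRING set, so it ends the token and acts as a separator.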
 
1.1.2 Executing a Transition
When LIB$T[ABLE_]PARSE finds a match with a transition, it performs the
following steps:
 
  Stores a pointer to the current token in the argument block. If the
  token matches one of the numeric symbol types, it also stores the
  token's binary representation in the argument block.
  Calls the action routine, if any, specified by the transition and
  passes it the argument block and any additional user-specified
  arguments. You can use an action routine to reject a transition. In
  this case, LIB$T[ABLE_]PARSE performs none of the following steps. See
  Section 3.1  for more information.
Performs one of the following operations:
  
    Stores the mask, if any, specified by the transition in the
    location specified by the transition.
    Stores the value of the token in the program location specified by
    the transition.
  Transfers control to the specified state, if any, or to the next
  state in the state table.
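 
The order of these steps can be sketched in Python. Everything named here (ArgBlock, Transition, execute, the "flags" key) is hypothetical and exists only for illustration. The sketch assumes that an action routine signals rejection by returning False, that the mask is combined with the target location by a bitwise OR, and it leaves out the alternative of storing the token value.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class ArgBlock:
    """Hypothetical stand-in for the argument block."""
    token: str = ""
    number: Optional[int] = None

@dataclass
class Transition:
    """Hypothetical stand-in for one transition entry."""
    target: Optional[str] = None
    action: Optional[Callable[[ArgBlock], bool]] = None
    mask: Optional[int] = None
    store: dict = field(default_factory=dict)  # models the user's location

def execute(tran, token, block, next_state):
    """Apply the steps in order; return the new state, or None if rejected."""
    block.token = token
    # Numeric tokens also get a binary representation.
    numeric = token.isascii() and token.isdigit()
    block.number = int(token) if numeric else None
    if tran.action is not None and not tran.action(block):
        return None  # rejected: none of the following steps is performed
    if tran.mask is not None:
        # Assumption of this sketch: the mask is ORed into the location.
        tran.store["flags"] = tran.store.get("flags", 0) | tran.mask
    # Explicit target if given, otherwise the next state in the table.
    return tran.target if tran.target is not None else next_state

block = ArgBlock()
accept = Transition(target="DONE", mask=0x4)
reject = Transition(action=lambda b: False, mask=0x1)
print(execute(accept, "42", block, "NEXT"), block.number, accept.store)
# DONE 42 {'flags': 4}
print(execute(reject, "42", block, "NEXT"), reject.store)
# None {}
```

The rejected transition leaves its storage location untouched, which mirrors the rule that a rejecting action routine suppresses the later steps.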
 
1.1.3 Exiting LIB$T[ABLE_]PARSE
LIB$T[ABLE_]PARSE continues to match and execute transitions from state
to state until one of the following occurs:
 
  For a valid match, it executes a user-specified transition to
  TPA$_EXIT at main level. It returns the value SS$_NORMAL.
  A transition requests that LIB$T[ABLE_]PARSE consider the string
  invalid by specifying a transition to TPA$_FAIL at main level (rather
  than at the level of a subexpression). LIB$T[ABLE_]PARSE returns with
  the value LIB$_SYNTAXERR. You can also request a transition to
  TPA$_FAIL from an action routine. The action routine can provide an
  alternate failure status.
An error occurs at the main level. The error can be:
  
    A syntax error. All transitions in the current state fail to match
    the remaining input string. LIB$T[ABLE_]PARSE returns LIB$_SYNTAXERR or
    an alternate failure status returned by an action routine.
    A state table format error. One of your state table entries is
    invalid. LIB$T[ABLE_]PARSE returns LIB$_INVTYPE.
   
 
Note: LIB$T[ABLE_]PARSE generates no signals and establishes no condition
handler; action routines can signal through LIB$T[ABLE_]PARSE back to
the calling program.
 
When LIB$T[ABLE_]PARSE cannot successfully parse the entire string, it
defines the current token, as follows, and stores it in the argument
block before returning:
 
  If LIB$T[ABLE_]PARSE fails to match a transition in the current
  state, it attempts to define the current token as the beginning of the
  remaining input string. You can incorporate this token in an error
  message or use it to determine the logical flow of your program.
  LIB$T[ABLE_]PARSE attempts to match the characters from the
  beginning of the remaining input string, one at a time, against the
  TPA$_SYMBOL alphabet symbol type until it encounters a character that
  does not match. The TPA$_SYMBOL symbol type consists of all the
  characters of the standard OpenVMS symbol constituent set.
 
    If LIB$T[ABLE_]PARSE successfully matches one or more consecutive
    characters from the input string against TPA$_SYMBOL, then the
    substring that matched TPA$_SYMBOL becomes the current token.
    If the first character of the remaining input string does not match
    TPA$_SYMBOL, the first character becomes the current token.
  If LIB$T[ABLE_]PARSE matches the symbol type for a transition that
  specifies TPA$_FAIL as the next state, it leaves the token that matched
  the transition as the current token.
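 
The first rule above (no transition matches in the current state) can be modeled as follows. This is a simplified sketch, not the library's code: error_token and SYMBOL_CHARS are hypothetical names, and the constituent set is assumed to be the ASCII letters, digits, dollar sign, and underscore, with the DEC multinational characters left out.

```python
import string

# Assumed ASCII subset of the OpenVMS symbol constituent set.
SYMBOL_CHARS = set(string.ascii_letters + string.digits + "$_")

def error_token(remaining):
    """Return the token reported when no transition matches."""
    end = 0
    while end < len(remaining) and remaining[end] in SYMBOL_CHARS:
        end += 1
    # A run of symbol characters becomes the token; otherwise the
    # first character of the remaining input does.
    return remaining[:end] if end > 0 else remaining[:1]

print(error_token("FOO_BAR=1"))  # FOO_BAR
print(error_token("=1"))         # =
```

A caller could place the returned substring in an error message to show the user where parsing stopped.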
 
1.2 Alphabet of LIB$T[ABLE_]PARSE
The LIB$T[ABLE_]PARSE alphabet consists of a set of symbol types
defined in Table lib-9. This alphabet includes strings made up of
elements of the ASCII character set. It provides all the basic building
blocks needed for constructing a grammar using the ASCII character set.
The alphabet also includes symbol types that represent the more complex
constructions found in programming and command language grammar.
 
Use the symbol types that comprise the LIB$T[ABLE_]PARSE alphabet to
define a vocabulary and grammar for your language. For each transition
you define, you specify one of the alphabet symbol types.
LIB$T[ABLE_]PARSE compares the characters at the beginning of the
remaining input string with the symbol type of each of the possible
transitions. If LIB$T[ABLE_]PARSE finds a match, it enters the state
specified by that transition.
 
  Table lib-9 The Alphabet of LIB$T[ABLE_]PARSE
  
    | Symbol Type | Characters Matched |  
    | 'x' | The particular ASCII character. In a state table, it is expressed by
      enclosing the character in single quotation marks. The character can be
      any member of the 8-bit ASCII code set. LIB$T[ABLE_]PARSE does not
      consider uppercase and lowercase alphabetic characters and codes with
      different values in bit 7 to be equivalent. |  
    | TPA$_ANY | Any single character. |  
    | TPA$_ALPHA | Any alphabetic character, which includes the DEC multinational
      character set. |  
    | TPA$_DIGIT | Any numeric character, that is, 0 through 9. |  
    | TPA$_STRING | Any string of one or more alphanumeric characters, that is, uppercase
      or lowercase A through Z, and the numeric characters 0 through 9. The
      string can be any length. It is bounded on the right by the first
      nonalphanumeric character or by the end of the string. |  
    | TPA$_SYMBOL | Any string of one or more characters of the standard OpenVMS
      symbol constituent set, that is, uppercase and lowercase A through Z
      and all DEC multinational characters, in addition to the dollar sign
      ($) and the underscore (_). The string is bounded on the right by some
      character not in the symbol constituent set (usually a blank) or by the
      end of the string. |  
    | 'keyword' | The string of characters enclosed in single quotation marks. A keyword
      can consist of one or more characters of the OpenVMS symbol constituent
      set, that is, uppercase and lowercase A through Z, the numeric
      characters 0 through 9, the dollar sign ($), and the underscore (_).
      Uppercase and lowercase alphabetics are treated as different characters.
      A state table can contain up to 220 keywords. The keyword is
      bounded on the right by a character not in the symbol constituent set
      or by the end of the string.
      Keywords that are one character in length are expressed in the form
      'x*' to distinguish them from the single-character symbol ('x'). They
      must be differentiated because they are not the same
      in operation. For example, in the input string AB+C, the single
      character 'A' would match the first character of this string, whereas
      the keyword 'A*' would not, because B in the string is in the symbol
      constituent set. |  
    | TPA$_BLANK | Any string of one or more blanks and/or tabs. |  
    | TPA$_OCTAL | Any octal number (that is, any string of one or more numeric characters
      0 through 7) whose magnitude is less than 2**32 for a 32-bit argument
      block or less than 2**64 for a 64-bit argument block. |  
    | TPA$_DECIMAL | Any decimal number (that is, any string of one or more numeric
      characters 0 through 9) whose magnitude is less than 2**32 for a 32-bit
      argument block or less than 2**64 for a 64-bit argument block. |  
    | TPA$_HEX | Any hexadecimal number (that is, any string of one or more numeric
      characters 0 through 9, A through F) whose magnitude is less than 2**32
      for a 32-bit argument block or less than 2**64 for a 64-bit argument
      block. |  
    | (Alpha specific) TPA$_OCTAL_64 | Any octal number (that is, any string of one or more numeric characters
      0 through 7) whose magnitude is less than 2**64. |  
    | (Alpha specific) TPA$_DECIMAL_64 | Any decimal number (that is, any string of one or more numeric
      characters 0 through 9) whose magnitude is less than 2**64. |  
    | (Alpha specific) TPA$_HEX_64 | Any hexadecimal number (that is, any string of one or more numeric
      characters 0 through 9, A through F) whose magnitude is less than 2**64. |  
    | TPA$_FILESPEC | Any string that constitutes a valid OpenVMS file specification. The
      string is bounded on the right by the first character that either is
      not a file specification constituent character or would cause the
      string to violate the syntax rules of a file specification. |  
    | TPA$_NODE | Matches a full node specification including the double colon (::). |  
    | TPA$_NODE_ACS | Matches a primary node specification including the access control
      string, if any, but not the double colon (::). |  
    | TPA$_NODE_PRIMARY | Matches a primary node specification excluding both the access control
      string, if any, and the double colon (::). |  
    | TPA$_UIC | Any string that constitutes a valid OpenVMS numerical UIC
      specification, bounded by square brackets or angle brackets. The binary
      value of the UIC, converted in octal radix, is placed in the argument
      block. The wildcard character (*) is permitted in the group and/or
      member fields; its presence results in that field being set to its
      largest possible value in the binary representation. |  
    | TPA$_IDENT | Any string that constitutes a valid OpenVMS identifier. Identifiers may
      be given as numerical UICs according to the rules for TPA$_UIC, or as
      alphabetic identifier names that appear in the system's rights
      database. The binary value of the identifier, converted in either octal
      or hexadecimal radix or by lookup in the system rights database, is
      placed in the argument block. Identifiers can be entered in any of the
      following forms:
        [n,m] <n,m>
        [name1,name2] <name1,name2>
        [name] <name>
        name
        %Xhex-value
      You can use a wildcard (*) in place of any occurrence of number or
      name in an identifier form. |  
    | TPA$_LAMBDA | The empty string (always matches). As it executes the transition,
      LIB$T[ABLE_]PARSE does not remove any characters from the input string.
      LAMBDA transitions are useful in getting action routines called under
      otherwise awkward circumstances, providing unconditional GOTOs to link
      portions of a state table together, and providing default actions in
      certain cases. |  
    | TPA$_EOS | The end of the input string. |  
    | state label | The label of a state that functions as a subexpression. A subexpression
      is analogous to a subroutine within the state table.  The subexpression facility permits complex syntactic constructs
      that appear in many places in grammar to appear only once in the state
      table. It also permits a degree of nondeterministic or pushdown parsing
      with a parser that is otherwise deterministic and finite-state. See
      Section 3.5  for detailed information about subexpressions and
      examples of their use.
     |  
 
Note: By default, LIB$T[ABLE_]PARSE treats blanks (defined to be either
spaces or tabs) as though they belong to no symbol type constituent
set. Effectively, this makes the blank a separator. LIB$T[ABLE_]PARSE
begins its next comparison with the first nonblank character following
the blanks. To have LIB$T[ABLE_]PARSE evaluate a blank as it would any
other character in the input string, set the TPA$V_BLANKS flag in the
argument block. Section 3.2 provides an example of the use of
this flag.
 
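The difference between the single-character symbol 'x' and the one-character keyword 'x*' in Table lib-9 comes down to a boundary check, which the following Python sketch illustrates. It is a model for illustration only: match_char, match_keyword, and SYMBOL_CHARS are hypothetical names, the constituent set is an assumed ASCII subset, and keyword abbreviation is not modeled.

```python
import string

# Assumed ASCII subset of the OpenVMS symbol constituent set.
SYMBOL_CHARS = set(string.ascii_letters + string.digits + "$_")

def match_char(ch, remaining):
    """'x': match one specific character at the start of the input."""
    return remaining[:1] == ch

def match_keyword(keyword, remaining):
    """'keyword': match the text, then require a boundary after it."""
    if not remaining.startswith(keyword):
        return False
    following = remaining[len(keyword):len(keyword) + 1]
    # The keyword must be followed by the end of the string or by a
    # character outside the symbol constituent set.
    return following == "" or following not in SYMBOL_CHARS

print(match_char("A", "AB+C"))      # True
print(match_keyword("A", "AB+C"))   # False
print(match_keyword("AB", "AB+C"))  # True
```

With the input AB+C, the keyword A fails because B is a symbol constituent, while the keyword AB succeeds because the plus sign ends it.
 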
1.3 State Tables
This section describes state table generation and the macros used to
construct state tables. Section 2  explains how to use these
macros.
 
The state table must be set up using either MACRO or BLISS. Everything
else, including any action routines, can be coded in the language of
your choice. Simply compile the state table separately, then link it
with your program.
 
The body of the state table consists of one or more states, each of
which defines one or more transitions to the same or other states. The
order of the states and the order of the transitions for each state are
important:
 
  If a transition does not specify a target state, LIB$T[ABLE_]PARSE
  transitions to the next state after the current state in the state
  table.
  For a given state, LIB$T[ABLE_]PARSE evaluates the input string
  against the transitions in the order in which they are defined and
  executes the first transition it matches.
  
    If a state defines more than one transition with symbol types that
    match overlapping sets of tokens, the order of transition definitions
    within the state is significant. For example, the characters 123
    followed by a comma (,) could match TPA$_DECIMAL, TPA$_OCTAL,
    TPA$_STRING, or one of several other symbol types.
     It is best to order transitions in order of increasing generality
    of their symbol types. For example, the TPA$_SYMBOL symbol type matches
    all keyword strings. In general, LIB$T[ABLE_]PARSE never executes a
    keyword transition that follows a TPA$_SYMBOL transition. The symbol
    types, in order of increasing generality, are as follows:
    
 'keyword'
 'x'
 TPA$_EOS
 TPA$_ALPHA
 TPA$_DIGIT
 TPA$_BLANK
 TPA$_OCTAL
 TPA$_OCTAL_64 (Alpha only)
 TPA$_DECIMAL
 TPA$_DECIMAL_64 (Alpha only)
 TPA$_HEX
 TPA$_HEX_64 (Alpha only)
 TPA$_STRING
 TPA$_SYMBOL
 TPA$_UIC
 TPA$_IDENT
 TPA$_NODE_PRIMARY
 TPA$_NODE_ACS
 TPA$_NODE
 TPA$_FILESPEC
 TPA$_ANY
 TPA$_LAMBDA
 
 
Note: The list of symbol types does not include subexpression calls, because
the generality of these calls depends on the symbol types recognized
within the subexpression. If you use action routines to reject certain
transitions, you can change the effective position of a symbol type
in this order. In any case, LIB$T[ABLE_]PARSE executes the first
transition listed in a state that you permit to match the leftmost
portion of the remaining input string.
 
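The effect of transition order within a state can be demonstrated with a small Python model. This is a sketch under stated assumptions, not the parser itself: first_match and MATCHERS are hypothetical names, only two symbol types are modeled, and their character sets are reduced to ASCII.

```python
import string

# Hypothetical ASCII constituent sets for two overlapping symbol types.
MATCHERS = {
    "TPA$_DECIMAL": set(string.digits),
    "TPA$_STRING": set(string.ascii_letters + string.digits),
}

def first_match(transitions, remaining):
    """Try symbol types in the listed order; return the first that matches."""
    for symbol_type in transitions:
        allowed = MATCHERS[symbol_type]
        end = 0
        while end < len(remaining) and remaining[end] in allowed:
            end += 1
        if end > 0:
            return symbol_type, remaining[:end]
    return None

# Specific type first: the decimal transition is taken for "123,".
print(first_match(["TPA$_DECIMAL", "TPA$_STRING"], "123,"))
# ('TPA$_DECIMAL', '123')

# General type first: the decimal transition can never be reached.
print(first_match(["TPA$_STRING", "TPA$_DECIMAL"], "123,"))
# ('TPA$_STRING', '123')
```

Listing the more specific type first keeps both transitions reachable, because input such as ABC still falls through to the more general one.
 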
1.3.1 MACRO State Table Generation Macro Calls
The OpenVMS system MACRO library contains a set of assembler macros
that allow convenient and readable coding of a LIB$T[ABLE_]PARSE state
table. These macros generate symbol definitions and tables. They do not
produce any executable code or routine calls.
 
There are four MACRO state table generation macros:
 
  $INIT_STATE---Initializes the LIB$T[ABLE_]PARSE macros and declares
  the beginning of a state table (see Section 1.3.1.1)
  $STATE---Defines a state (see Section 1.3.1.2 )
  $TRAN---Defines a state transition (see Section 1.3.1.3 )
  $END_STATE---Ends the state table (see Section 1.3.1.4 )
 
A state table begins with a call to $INIT_STATE and ends with a call to
$END_STATE. Within the state table, define each state by a call to
$STATE immediately followed by as many calls to $TRAN as you need to
define the transitions from that state.
 
1.3.1.1 $INIT_STATE---Initializes the LIB$T[ABLE_]PARSE Macros
The $INIT_STATE macro declares the beginning of a state table. It
initializes the internals of the table generator macros and declares
the locations of the state table and the keyword table:
 
  The state table is the structure containing the definitions of the
  states and the transitions between them. LIB$T[ABLE_]PARSE builds the
  state table as it processes the $STATE and $TRAN macros you use to
  define the table.
  The keyword table contains the text of the keywords used in the
  state table. LIB$T[ABLE_]PARSE builds the keyword table as it processes
  the calls to $TRAN for each state.
 
Section 4  provides specific information on the allocation
and binary representations of the state table and the keyword table.
This information may be useful in debugging your program.
 
 
  
$INIT_STATE     state-table ,key-table
 
state-table
The name assigned to the state table. LIB$T[ABLE_]PARSE equates this
label to the start of the first state in the state table.
 
key-table
The name assigned to the keyword table. LIB$T[ABLE_]PARSE equates this
label to the start of the keyword table.
 
You must supply both the address of the state table and the address of
the keyword table in the call to LIB$T[ABLE_]PARSE to perform a parse.
The $INIT_STATE macro can appear more than once in a program. Each
occurrence defines a separate state table. No part of any state table
can refer to part of any other state table.
 
1.3.1.2 $STATE---Defines a State
The $STATE macro declares the beginning of a state.
 
 
$STATE  [label]
 
label
An optional label for the state. LIB$T[ABLE_]PARSE equates the label,
if present, to the starting address of the state.
 
1.3.1.3 $TRAN---Defines a State Transition
The $TRAN macro defines a transition from the state in which it is
defined to some other (or to the same) state. The arguments of the
macro define, among other things, the symbol type that causes the
transition to be executed, the state to which to transfer, and the
action routine to call, if any. The transition defined by a $TRAN macro
belongs to the state defined by the last preceding $STATE macro.
 
 
  
$TRAN   type [,label] [,action] [,mask] [,msk-adr] [,argument]
 
type
The symbol type, taken from the LIB$T[ABLE_]PARSE alphabet, that is
recognized by this transition. The transition is taken if the
characters from the beginning of the remaining input string match the
specified symbol type.
 
   |