HP OpenVMS Systems Documentation


Compaq C Run-Time Library Utilities Reference Manual




Chapter 3
Character Set Description (Charmap) File

The character set description file, called the charmap file, defines character symbols as character encodings. This file is the source file for a coded character set, or codeset.

3.1 Portable Character Set

All supported codesets have the Portable Character Set (PCS) as a proper subset. The PCS consists of the character symbols (listed by their standardized symbolic names) and their hexadecimal encodings, as shown in Table 3-1.

Table 3-1 Portable Character Set
Symbol Name Hexadecimal Encoding
<NUL> \x00
<alert> \x07
<backspace> \x08
<tab> \x09
<newline> \x0A
<vertical-tab> \x0B
<form-feed> \x0C
<carriage-return> \x0D
<space> \x20
<exclamation-mark> \x21
<quotation-mark> \x22
<number-sign> \x23
<dollar-sign> \x24
<percent> \x25
<ampersand> \x26
<apostrophe> \x27
<left-parenthesis> \x28
<right-parenthesis> \x29
<asterisk> \x2A
<plus-sign> \x2B
<comma> \x2C
<hyphen> \x2D
<period> \x2E
<slash> \x2F
<zero> \x30
<one> \x31
<two> \x32
<three> \x33
<four> \x34
<five> \x35
<six> \x36
<seven> \x37
<eight> \x38
<nine> \x39
<colon> \x3A
<semi-colon> \x3B
<less-than> \x3C
<equal-sign> \x3D
<greater-than> \x3E
<question-mark> \x3F
<commercial-at> \x40
<A> \x41
<B> \x42
<C> \x43
<D> \x44
<E> \x45
<F> \x46
<G> \x47
<H> \x48
<I> \x49
<J> \x4A
<K> \x4B
<L> \x4C
<M> \x4D
<N> \x4E
<O> \x4F
<P> \x50
<Q> \x51
<R> \x52
<S> \x53
<T> \x54
<U> \x55
<V> \x56
<W> \x57
<X> \x58
<Y> \x59
<Z> \x5A
<left-bracket> \x5B
<backslash> \x5C
<right-bracket> \x5D
<circumflex> \x5E
<underscore> \x5F
<grave-accent> \x60
<a> \x61
<b> \x62
<c> \x63
<d> \x64
<e> \x65
<f> \x66
<g> \x67
<h> \x68
<i> \x69
<j> \x6A
<k> \x6B
<l> \x6C
<m> \x6D
<n> \x6E
<o> \x6F
<p> \x70
<q> \x71
<r> \x72
<s> \x73
<t> \x74
<u> \x75
<v> \x76
<w> \x77
<x> \x78
<y> \x79
<z> \x7A
<left-brace> \x7B
<vertical-line> \x7C
<right-brace> \x7D
<tilde> \x7E

3.2 Components of a Charmap File

A charmap file has the following components:

  • An optional special symbolic name declarations section
    Each declaration in this section consists of a special symbolic name, followed by one or more space or tab characters, and a value. The following list describes the special symbolic names that you can include in the declarations section:
    <code_set_name>
    Specifies the name of the codeset for which the charmap file is defined. This value determines the value returned by the nl_langinfo(CODESET) function. If <code_set_name> is not declared, the name for the Portable Character Set is used.

    <mb_cur_max>
    Specifies the maximum number of bytes in a character for the codeset. Valid values are 1 to 4. The default value is 1.

    <mb_cur_min>
    Specifies the minimum number of bytes in a character for the codeset. Since all supported codesets have the Portable Character Set as a proper subset, this value must be 1.

    <escape_char>
    Specifies the escape character that indicates encodings in hexadecimal or octal notation. The default value is a backslash (\).

    <comment_char>
    Specifies the character used to indicate a comment within a charmap file. The default value is the number sign (#).
  • The CHARMAP section header
    This header marks the beginning of the section that associates character symbols with encodings.
  • Mapping statements for characters in the codeset
    Each statement specifies a symbolic name for a character and the associated encoding for that character. A mapping statement has the following format:


    <char_symbol> encoding
    

    A symbolic name begins with the left angle bracket (<) and ends with the right angle bracket (>). For char_symbol (the name between < and >), you can use any characters from the Portable Character Set except control and space characters. If char_symbol itself contains a > character, precede each such > with the escape character (as specified by the <escape_char> special symbolic name); only the closing > is left unescaped.
    An encoding is specified as one or more character constants, with the maximum number of character constants specified by the <mb_cur_max> special symbolic name. The encoding may be specified as decimal, octal, or hexadecimal constants with the following formats:
    • Decimal constant: \dnn or \dnnn, where n is any decimal digit
    • Octal constant: \nn or \nnn, where n is any octal digit
    • Hexadecimal constant: \xnn, where n is any hexadecimal digit

    The following are sample character symbol definitions:


    <A>        \d65        #decimal constant
    <B>        \x42        #hexadecimal constant
    <j10101>   \x81\xA1    #multiple hexadecimal constants
    

    You can also define a range of symbolic names and corresponding encoded values, where the nonnumeric prefix for each symbolic name is common, and the numeric portion of the second symbolic name is equal to or greater than the numeric portion of the first symbolic name. In this format, a symbolic name value consists of zero or more nonnumeric characters followed by an integer of one or more decimal digits. This format defines a series of symbolic names. For example, the string <j0101>...<j0104> is interpreted as the symbolic names <j0101>, <j0102>, <j0103>, and <j0104>, in that order.
    In statements defining ranges of symbolic names, the specified encoded value is the value for the first symbolic name in the range. Subsequent symbolic names have encoded values in increasing order. Consider the following sample statement:


    <j0101>...<j0104>        \d129\d254
    

    This sample statement is interpreted as follows:


    <j0101> \d129\d254
    <j0102> \d129\d255
    <j0103> \d130\d0
    <j0104> \d130\d1
    

    You cannot assign multiple encodings to one symbolic name, but you can create multiple names for one encoded value because some characters have several common names. For example, the . character is called a period in some parts of the world, and a full stop in others. You can specify both names in the charmap. For example:


    <period>        \x2e
    <full-stop>     \x2e
    

    Any comments must begin with the character specified by the <comment_char> special symbolic name. When an entire line is a comment, you must specify the <comment_char> in the first column of the line.
  • The END CHARMAP section trailer
    This trailer indicates the end of character map statements.
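
The constant formats and the range rule described for mapping statements can be sketched in Python. This is an illustration only, not the parser used by the utilities: the function names parse_encoding and expand_range are hypothetical, the default escape character (\) is assumed, and treating a multibyte encoding as one big-endian number is an assumption chosen because it reproduces the sample expansion of <j0101>...<j0104> shown earlier.

```python
import re

# One character constant: \dNN or \dNNN (decimal), \xNN (hexadecimal),
# \NN or \NNN (octal). The default escape character is assumed.
_CONST = re.compile(r"\\(?:d(\d{2,3})|x([0-9A-Fa-f]{2})|([0-7]{2,3}))")
_NAME = re.compile(r"<(\D*)(\d+)>")


def parse_encoding(text):
    # Convert an encoding made of one or more constants into bytes.
    out = bytearray()
    pos = 0
    while pos < len(text):
        m = _CONST.match(text, pos)
        if m is None:
            raise ValueError(f"bad character constant at offset {pos}: {text!r}")
        dec, hexa, octal = m.groups()
        if dec is not None:
            value = int(dec, 10)
        elif hexa is not None:
            value = int(hexa, 16)
        else:
            value = int(octal, 8)
        if value > 255:
            raise ValueError(f"constant does not fit in one byte: {m.group(0)}")
        out.append(value)
        pos = m.end()
    return bytes(out)


def expand_range(first, last, encoding):
    # Expand <first>...<last> into (name, bytes) pairs with increasing encodings.
    m1, m2 = _NAME.fullmatch(first), _NAME.fullmatch(last)
    if not m1 or not m2 or m1.group(1) != m2.group(1):
        raise ValueError("range names must share the same nonnumeric prefix")
    lo, hi = int(m1.group(2)), int(m2.group(2))
    if hi < lo:
        raise ValueError("second name must not be numerically lower than the first")
    prefix = m1.group(1)
    width = len(m1.group(2))                   # keep leading zeros, as in 0101
    start = int.from_bytes(encoding, "big")    # assumption: big-endian carry
    return [
        (f"<{prefix}{n:0{width}d}>",
         (start + n - lo).to_bytes(len(encoding), "big"))
        for n in range(lo, hi + 1)
    ]


if __name__ == "__main__":
    for name, enc in expand_range("<j0101>", "<j0104>", parse_encoding(r"\d129\d254")):
        print(name, list(enc))
```

The carry from \d129\d255 to \d130\d0 falls out of the integer addition. A codeset with gaps in its second-byte range would need a different increment rule.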

The following is a portion of a sample charmap file:


<code_set_name>         "ISO8859-1"
<mb_cur_max>            1
<mb_cur_min>            1
<escape_char>           \
<comment_char>          #

CHARMAP
<NUL>                   \x00
<SOH>                   \x01
<STX>                   \x02
<ETX>                   \x03
<EOT>                   \x04
<ENQ>                   \x05
<ACK>                   \x06
<alert>                 \x07
<backspace>             \x08
<tab>                   \x09
<newline>               \x0a
<vertical-tab>          \x0b
<form-feed>             \x0c
<carriage-return>       \x0d
END CHARMAP
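
A reader for a file laid out like this sample might look like the following Python sketch. It is not the LOCALE COMPILE implementation: the function name read_charmap is hypothetical, only hexadecimal constants are handled, range statements and escaped > characters are not supported, and the default escape and comment characters are assumed. Declarations are recognized by name wherever they appear in the input.

```python
import re

# Special symbolic names from the declarations section.
SPECIAL = {"code_set_name", "mb_cur_max", "mb_cur_min", "escape_char", "comment_char"}
LINE = re.compile(r"<([^>]+)>\s+(.*)")
HEX_RUN = re.compile(r"(?:\\x[0-9A-Fa-f]{2})+")


def read_charmap(text):
    # Returns (declarations, mappings); mappings maps a symbol name to bytes.
    decls, maps = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, section markers, and comments that start in column 1.
        if not line or line in ("CHARMAP", "END CHARMAP") or raw.startswith("#"):
            continue
        m = LINE.fullmatch(line)
        if m is None:
            raise ValueError(f"unrecognized line: {raw!r}")
        name, rest = m.groups()
        if name in SPECIAL:
            decls[name] = rest.strip('"')
            continue
        enc = rest.split("#", 1)[0].strip()        # drop a trailing comment
        if not HEX_RUN.fullmatch(enc):
            raise ValueError(f"unsupported encoding for <{name}>: {enc!r}")
        if name in maps:
            # One symbolic name cannot have several encodings.
            raise ValueError(f"<{name}> already has an encoding")
        value = bytes.fromhex(enc.replace("\\x", ""))
        if len(value) > int(decls.get("mb_cur_max", "1")):
            raise ValueError(f"<{name}> is longer than <mb_cur_max> allows")
        maps[name] = value
    return decls, maps
```

Two names may share one encoding (as with <period> and <full-stop>), so the sketch checks for duplicate names only, never for duplicate values.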


Chapter 4
Command Reference

This chapter describes the following commands offered by the Compaq C Run-Time Library utilities:

  • GENCAT
  • ICONV COMPILE
  • ICONV CONVERT
  • LOCALE COMPILE
  • LOCALE LOAD
  • LOCALE UNLOAD
  • LOCALE SHOW CHARACTER_DEFINITIONS
  • LOCALE SHOW CURRENT
  • LOCALE SHOW PUBLIC
  • LOCALE SHOW VALUE
  • zic


GENCAT

Merges message text source files into a message catalog file.

Format

GENCAT msgfile[,...] catfile


Parameters

msgfile

Required.

Name of the message text source file. The default file type is .MSGX.

catfile

Required.

Name of the message catalog output file. If catfile already exists, a new version is created that includes the messages in the existing catalog. The file type must be .CAT.


Qualifiers

None.


Description

The GENCAT command creates new message catalogs from one or more input source files and an existing catalog file (if one exists). A message catalog is a binary file containing the messages for an application. This includes all messages that the application issues, such as error messages, screen displays, and prompts. Applications retrieve messages from a message catalog using the catopen, catgets, and catclose C Run-Time Library routines. See the Compaq C Run-Time Library Reference Manual for OpenVMS Systems for details of these routines.

A message text source file is a text file that you create to hold messages printed by your program. You can use any text editor to enter messages into the text source file. Messages can be grouped into sets, usually to represent functional subsets of your program. Each message has a numeric identifier, which must be unique within its set. The message text source file can also contain commands recognized by GENCAT for manipulating sets and individual messages.

You can specify any number of message text source files. The GENCAT command processes multiple source files one after the other in the sequence that you specify them. Each successive source file modifies the catalog.

If a message catalog with the name catfile exists, GENCAT creates a new version of the file that includes the contents of the older version and then modifies it. If the catalog does not exist, GENCAT creates the catalog with the name catfile.

A message text source file (msgfile) can contain the following commands:

  • message_number text
    Inserts text as a message with the identifier message_number. Follow these guidelines:
    • Numbers must be ascending within each set. You can skip a number, but you cannot go back to add a missing number or replace an existing number during a GENCAT session.
    • If the message text is empty and a space or tab field separator is present, an empty string is stored in the message catalog.
    • If a message source line has a message number but neither a field separator nor message text, the existing message with that number (if any) is deleted from the catalog.
  • $delset set_number
    Deletes the set of messages indicated by set_number.
  • $quote character
    Sets the quote character to character. See the Examples section for more information.
  • $set set_number
    Indicates that all messages entered after this command are placed in the set indicated by set_number. You can change the set by entering another $set command. However, set numbers must be entered in ascending order; you cannot go back to a lower numbered set during the GENCAT session. If the command is not used, the default set number is 1.
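
The effect of these commands on a catalog can be modeled with a small in-memory Python sketch. It illustrates the rules listed above and is not GENCAT's implementation: the catalog is represented as a hypothetical dictionary of sets, the function name apply_source is invented for the example, and $quote handling, escape sequences, line continuation, and ordering checks for $delset are left out.

```python
def apply_source(catalog, lines):
    # catalog: {set_number: {message_number: text}}; it is modified in place.
    # lines: source lines without their trailing newline characters.
    current_set = 1              # default set when no $set command is used
    last_msg = {}                # highest message number seen so far, per set
    for line in lines:
        # Ignored: blank lines, lines that start with a space or tab, and
        # "$" followed by a space, a tab, or the end of the line.
        if not line.strip() or line[0] in " \t":
            continue
        if line[0] == "$" and (len(line) == 1 or line[1] in " \t"):
            continue
        parts = line.split(None, 2)          # keyword, number, trailing comment
        if parts[0] in ("$set", "$delset"):
            if len(parts) < 2 or not parts[1].isdigit():
                raise ValueError(f"{parts[0]} needs a set number: {line!r}")
            number = int(parts[1])
            if parts[0] == "$delset":
                catalog.pop(number, None)
            elif number < current_set:
                raise ValueError(f"set {number} is lower than set {current_set}")
            else:
                current_set = number
            continue
        if parts[0] == "$quote":
            continue                         # quoting is not modeled here
        digits = len(line) - len(line.lstrip("0123456789"))
        if digits == 0:
            raise ValueError(f"unrecognized line: {line!r}")
        number, rest = int(line[:digits]), line[digits:]
        previous = last_msg.get(current_set)
        if previous is not None and number <= previous:
            raise ValueError(f"message {number} is not in ascending order")
        last_msg[current_set] = number
        if rest == "":
            # A number with no separator and no text deletes the message.
            catalog.get(current_set, {}).pop(number, None)
        elif rest[0] in " \t":
            # One separator is consumed; the text may be an empty string.
            catalog.setdefault(current_set, {})[number] = rest[1:]
        else:
            raise ValueError(f"missing separator after message number: {line!r}")
    return catalog
```

Each call models one GENCAT session, so the ascending-order check restarts with every call while the catalog contents carry over.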

Each initial keyword or number must be followed by white space. The GENCAT utility ignores any line that begins with a space, a tab, or a dollar sign ($) character followed by a space, a tab, or a newline character. Therefore, you can use these sequences to start comments in your message text source file. Blank lines are also ignored. Finally, you can place comments on the same line after the $delset, $quote, or $set commands because GENCAT ignores anything that follows these commands.

A line beginning with a digit marks a message to be included in the catalog. You can specify any amount of white space between the message ID number and the message text; however, when the message text is not delimited by quotation marks, one space or tab character is recommended. When message text is not in quotation marks, GENCAT treats additional white space as part of the message. When message text is enclosed in quotation marks, GENCAT ignores all spaces or tabs between the message ID and the first quotation character.
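
The white-space rule can be illustrated with a short Python sketch. The function name split_message and its quote parameter are hypothetical, escape sequences are not processed, and an escaped quote character inside the text is not handled; the sketch only shows how leading white space is treated in the quoted and unquoted cases.

```python
def split_message(line, quote=None):
    # Split "ID text" into (message_id, text) following the white-space rule.
    i = 0
    while i < len(line) and line[i].isdigit():
        i += 1
    if i == 0:
        raise ValueError("a message line must begin with a digit")
    msg_id, rest = int(line[:i]), line[i:]
    stripped = rest.lstrip(" \t")
    if quote and stripped.startswith(quote):
        # Quoted text: white space before the opening quote is ignored, and
        # the text ends at the next occurrence of the quote character.
        body = stripped[len(quote):]
        end = body.find(quote)
        return msg_id, (body if end < 0 else body[:end])
    # Unquoted text: one separator is consumed; any further white space
    # belongs to the message.
    return msg_id, rest[1:]
```

With no quote character in effect, a line with four spaces after the ID yields a message that starts with three spaces. That is why a single separator is recommended for unquoted text.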

Message text can contain escape sequences like those recognized by the C language. Each sequence begins with the escape character, a backslash (\), and inserts a special character in the message text. See Table 4-1.

Table 4-1 GENCAT Command: Special Characters
Escape Sequence Character
\n New Line
\t Horizontal Tab
\v Vertical Tab
\b Backspace
\r Carriage Return
\f Form Feed
\\ Backslash Character (\). By contrast, a single backslash as the last character on a line continues the message text on the following line.
\ddd The single-byte character associated with the octal value ddd. You can specify one, two, or three octal digits. However, you must include leading zeros if the characters following the octal digits are also valid octal digits; for example, the octal value for $ (dollar sign) is 44. To insert $5.00 into a message, use \0445.00, not \445.00; otherwise the 5 is parsed as part of the octal value.
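
The octal rule in Table 4-1 is easy to get wrong, so the following Python sketch decodes the sequences in the table. It is illustrative only: the function name decode_escapes is hypothetical, a backslash before any character not in the table simply yields that character (an assumption that covers \" in quoted text), and rejecting octal values above 377 is this sketch's choice, because the table does not say what GENCAT does in that case.

```python
SIMPLE = {"n": "\n", "t": "\t", "v": "\v", "b": "\b",
          "r": "\r", "f": "\f", "\\": "\\"}
OCTAL_DIGITS = "01234567"


def decode_escapes(text):
    # Replace the escape sequences of Table 4-1 with the characters they name.
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\" or i + 1 == len(text):
            out.append(ch)                 # ordinary character or lone trailing \
            i += 1
            continue
        nxt = text[i + 1]
        if nxt in SIMPLE:
            out.append(SIMPLE[nxt])
            i += 2
        elif nxt in OCTAL_DIGITS:
            j = i + 1
            # Consume at most three octal digits after the backslash.
            while j < len(text) and j < i + 4 and text[j] in OCTAL_DIGITS:
                j += 1
            value = int(text[i + 1:j], 8)
            if value > 0o377:
                raise ValueError(f"octal value out of range: {text[i:j]}")
            out.append(chr(value))
            i = j
        else:
            out.append(nxt)                # assumption: drop the backslash
            i += 2
    return "".join(out)
```

Under these assumptions, decode_escapes(r"\0445.00") returns "$5.00", while r"\445.00" is rejected because the 5 is absorbed into the octal value.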

Errors

When GENCAT reports an error, no action is taken on any commands and an existing catalog is left unchanged.

Examples

#1

$set 10 Communication Error Messages

      

This example uses the $set command in a source file to assign a set number to a group of messages.

The message set number is 10. All messages after the $set command and up to the next $set command are assigned a message set number of 10. (Set numbers must be assigned in ascending order but they need not be contiguous.)

You can include a comment in the $set command.

#2

$delset 10 Communication Error Messages
      

This example uses the $delset command to remove from a catalog all messages belonging to the specified message set (10, in this case).

The $delset command must be placed in the proper set number order with respect to any $set commands in the same source file. You can include a comment in the $delset command.

#3

12 "file removed"
      

This example shows how to enter the message text and assign a message ID number to it. In this case, a message ID of 12 is assigned to the text that follows it.

Leave at least one space or tab character between the message ID number and the message text, but you can include more spaces or tabs if you prefer. If you do include more spaces or tabs, they are ignored when the message text is in quotation marks and they are considered part of the text when the message text is not in quotation marks.

Message numbers must be in ascending order within a single message set but they need not be contiguous.

All text following the message number and up to the end of the line is included as message text. If you place the escape character (\), a backslash, as the last character on the line, the message text continues on the following line. Consider the following example:


5 This is the text associated with \
message number 5.

The two lines in the example define the following single-line message:

This is the text associated with message number 5.
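
Joining continued lines can be sketched in Python as follows. The function name join_continuations is hypothetical, and the sketch treats every trailing backslash as a continuation; it does not try to recognize a line that ends with an escaped backslash (\\).

```python
def join_continuations(lines):
    # Merge each line that ends with a backslash with the line that follows.
    joined, pending = [], ""
    for line in lines:
        if line.endswith("\\"):
            pending += line[:-1]          # drop the backslash, keep the text
        else:
            joined.append(pending + line)
            pending = ""
    if pending:
        joined.append(pending)            # input ended on a continued line
    return joined
```

The space before the backslash is kept, which is why the joined message has exactly one space between "with" and "message".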

#4

$quote "   Use a double quote to delimit message text
$set 10            Message Facility - Quote command messages
1 "Use the $quote command to define a character \
\n for delimiting message text" \n
2 "You can include the \"quote\" character in a message \n \
by placing a \\ (backslash) in front of it" \n
3 You can include the "quote" character in a message \n \
by having another character as the first nonspace \
\n character after the message ID number \n
$quote
4 You can disable the quote mechanism by \n \
using the $quote command without \n a character \
after it \n

      

This example shows the effect of a quote character.

The $quote command defines the double quote (") as the quote character. The quote character must be the first nonspace character after the message number. Any text following the next occurrence of the quote character is ignored.

This example also shows two ways to include the quote character in the message text:

  • Place a backslash (\) in front of the quote character.
  • Use another character as the first nonspace character after the message number. This disables the quote character for that message only.

This example also shows the following:

  • A backslash (\) is still required to split a quoted message across lines.
  • To display a backslash (\) in a message, you must place another backslash (\) in front of it.
  • You can format your message with a new-line character by using \n.
  • If you use the $quote command with no character argument, you disable the quote mechanism.

