HP OpenVMS Systems Documentation
Compaq C Run-Time Library Utilities Reference Manual
1.2.3 Link Lines
A link line has the following form:
An example is as follows:
In the OpenVMS implementation, Link is interpreted as a copy. Thus, the previous line copies the information from US/Eastern to EST5EDT. The LINK-FROM field should appear as the NAME field in some zone line. The LINK-TO field is used as an alternate name for that zone. Except for continuation lines, lines may appear in any order in the input.
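The copy semantics described above can be modeled in a few lines of Python. This is only a sketch: the `apply_link` helper and the contents of the zone table are hypothetical, chosen to show that the LINK-TO name receives its own copy of the data rather than an alias to it.

```python
import copy

def apply_link(zones, link_from, link_to):
    """Model a Link line as a copy: link_to gets its own copy of link_from's data."""
    if link_from not in zones:
        raise KeyError(f"LINK-FROM {link_from!r} is not the NAME of any zone line")
    zones[link_to] = copy.deepcopy(zones[link_from])
    return zones

# Hypothetical zone data, for illustration only.
zones = {"US/Eastern": {"gmtoff": "-5:00", "format": "E%sT"}}
apply_link(zones, "US/Eastern", "EST5EDT")
```

After the call, `zones["EST5EDT"]` compares equal to `zones["US/Eastern"]` but is a separate object, which is the difference between a copy and a link in this model.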
Chapter 2
    escape_char <char_symbol>
    comment_char <char_symbol>
In the preceding formats, <char_symbol> is the character's symbolic name as defined in the charmap file used to build the locale's codeset. One or more blank characters (spaces or tabs) must separate escape_char or comment_char from <char_symbol>.
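As a rough illustration of the separation rule, the following Python sketch splits such a statement into its keyword and symbol. The function name `parse_char_directive` and the symbolic names `<backslash>` and `<number-sign>` used with it are assumptions made for the example; this is not how the LOCALE COMPILE command is implemented.

```python
def parse_char_directive(line):
    """Split 'escape_char <sym>' or 'comment_char <sym>' into (keyword, symbol)."""
    parts = line.split()  # no argument: split on any run of spaces or tabs
    if len(parts) != 2 or parts[0] not in ("escape_char", "comment_char"):
        raise ValueError(f"not an escape_char/comment_char statement: {line!r}")
    keyword, symbol = parts
    if not (symbol.startswith("<") and symbol.endswith(">") and len(symbol) > 2):
        raise ValueError(f"expected a symbolic name such as <name>, got {symbol!r}")
    return keyword, symbol

# Assumed symbolic names, for illustration only.
print(parse_char_directive("escape_char \t<backslash>"))
print(parse_char_directive("comment_char <number-sign>"))
```

A statement with no blank between the keyword and the symbol is rejected, because the split then yields a single token.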
2.1.2 Category Source Definitions
Each category source definition consists of the following:
For example:
    LC_CTYPE
    <source for LC_CTYPE category>
    END LC_CTYPE
The source for all of the categories is specified using keywords, strings, character literals, and character symbols. Each keyword identifies either a definition or a rule. The remainder of the statement containing the keyword contains the operands to the keyword. Operands are separated from the keyword by one or more blank characters (spaces or tabs). A statement may be continued on the next line by placing a backslash (\) as the last character before the new-line character that terminates the line. Lines containing the comment character (#) in the first column are treated as comment lines.
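The continuation and comment rules can be sketched as a small preprocessing step. The Python function below is a minimal model under stated assumptions: it uses the default comment character (#) and backslash, treats a comment line only when it does not follow a continued line, and the name `logical_lines` is hypothetical.

```python
def logical_lines(text, comment_char="#", escape_char="\\"):
    """Return statements with continuation lines joined and comment lines dropped."""
    statements = []
    pending = None  # text collected from continued lines, or None if not continuing
    for raw in text.splitlines():
        if pending is None and raw.startswith(comment_char):
            continue  # comment character in the first column
        if raw.endswith(escape_char):
            pending = (pending or "") + raw[:-1]  # drop the backslash, keep reading
            continue
        line = (pending or "") + raw
        pending = None
        if line.strip():
            statements.append(line)
    if pending is not None and pending.strip():
        statements.append(pending)  # input ended on a continued line
    return statements

source = "# a comment\nupper <A>;\\\n<B>\n\nlower <a>"
print(logical_lines(source))
```

Here the second and third physical lines are joined into the single statement `upper <A>;<B>`, and the comment line and the blank line are discarded.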
A symbolic name begins with the left angle-bracket character (<) and ends with the right angle-bracket character (>). The characters between the < and the > can be any characters from the Portable Character Set, except for the control and space characters. For example, <A-diaeresis> could be a symbolic name for a character. Any symbolic name referenced in the locale source file must be defined via the Portable Character Set or in the character set description (charmap) file for that locale.
A character literal is the character itself, or a decimal, hexadecimal, or octal constant. A decimal constant contains two or three decimal digits and has the following form, where n is any decimal digit:
    \dnn or \dnnn
A hexadecimal constant contains two hexadecimal digits and has the following form, where n is any hexadecimal digit:
    \xnn
An octal constant contains two or three octal digits and has the following form, where n is any octal digit:
    \nn or \nnn
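The three constant forms can be checked and converted with one regular expression. The following Python sketch assumes the default backslash as the escape character; the names `_CONSTANT` and `constant_value` are hypothetical.

```python
import re

# \dnn or \dnnn (decimal), \xnn (hexadecimal), \nn or \nnn (octal)
_CONSTANT = re.compile(
    r"\\(?:d(?P<dec>[0-9]{2,3})|x(?P<hex>[0-9A-Fa-f]{2})|(?P<oct>[0-7]{2,3}))"
)

def constant_value(token):
    """Return the integer value of a decimal, hexadecimal, or octal constant."""
    m = _CONSTANT.fullmatch(token)
    if m is None:
        raise ValueError(f"not a decimal, hexadecimal, or octal constant: {token!r}")
    if m.group("dec") is not None:
        return int(m.group("dec"), 10)
    if m.group("hex") is not None:
        return int(m.group("hex"), 16)
    return int(m.group("oct"), 8)

# Three spellings of the same value.
print(constant_value(r"\d65"), constant_value(r"\x41"), constant_value(r"\101"))
```

All three calls yield 65. A token with the wrong number of digits, such as `\x4`, is rejected.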
The explicit definition of each category in a locale definition source
file is not required. When a category is undefined in a locale
definition source file, the LOCALE COMPILE command will not store any
data value for this category in the resulting locale file.
2.2 LC_COLLATE Category
The LC_COLLATE category defines the relative order between collation items. This category begins with the LC_COLLATE header and ends with the END LC_COLLATE trailer.
A collation item is the unit of comparison for collation. A collation item may be a character or a sequence of characters. Every collation item in the locale has a set of weights, which determine whether the collation item collates before, equal to, or after the other collation items in the locale. Each collation item is assigned collation weights by the LOCALE COMPILE command when the locale definition source file is compiled. These collation weights are then used by application programs that compare strings.
String comparison is performed by comparing the collation weights of each character in the string until either a difference is found or the strings are determined to be equal. This comparison may be performed several times if the locale defines multiple collation orders. For example, in the French locale, strings are first compared using a primary set of collation weights. If they are equal on the basis of this comparison, they are compared again using a secondary set of collation weights. The number of collation weights associated with a collation item is equal to the number of collation sort rules defined for the locale.
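The level-by-level comparison can be modeled as follows. This Python sketch uses a hypothetical two-level weight table (`WEIGHTS`) and compares every level from the beginning of the string to the end; it is not the algorithm used by the run-time library, only an illustration of the idea that a later level is consulted when all earlier levels compare equal.

```python
# Hypothetical weights: (primary, secondary) for each collation item.
WEIGHTS = {
    "a": (1, 1), "á": (1, 2), "A": (1, 3),
    "b": (2, 1), "B": (2, 2),
}

def compare(s1, s2, weights=WEIGHTS):
    """Return -1, 0, or 1, comparing one weight level at a time."""
    levels = len(next(iter(weights.values())))
    for level in range(levels):
        k1 = [weights[ch][level] for ch in s1]
        k2 = [weights[ch][level] for ch in s2]
        if k1 != k2:
            return -1 if k1 < k2 else 1
    return 0  # equal at every level

print(compare("ab", "Ab"))  # primary weights tie; secondary weights decide
print(compare("Ab", "b"))   # decided by primary weights alone
```

In the first call both strings have the primary weights 1, 2, so the secondary weights break the tie. In the second call the primary weights already differ and the secondary weights are never consulted.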
Every character defined in the charmap file (or every character in the Portable Character Set if no charmap file is specified) is itself a collation item. Additional collation items can be defined using the collating-element statement (see the description that follows).
Table 2-1 lists the statement keywords recognized in the LC_COLLATE category.
Keyword | Description |
---|---|
copy | Specifies the name of an existing locale to be used as the definition of this category. If you specify a copy statement, you need not specify any other keywords in this category. |
collating-element | Specifies multicharacter collation items. |
collating-symbol | Specifies collation symbols for use in collation sequence statements. |
order_start | Specifies collation order statements that assign collation weights to collation items. |
The collating-element, collating-symbol, and order_start statements are further described in the following sections.
2.2.1 The collating-element Statement
The collating-element statement specifies multicharacter collation items.
Syntax:
    collating-element <character_symbol> from <string>
The character_symbol argument names a collation item: a string of one or more characters that is treated as a single collation item. The character_symbol cannot duplicate any symbolic name in the current charmap file or any other symbolic name defined in this collation definition.
The string argument specifies a string of two or more characters that define the character_symbol argument. The following are examples of the syntax for the collating-element statement:
    collating-element <ch> from "<c><h>"
    collating-element <e-acute> from "<acute><e>"
    collating-element <11> from "<1><1>"
A character_symbol argument defined by the collating-element statement is recognized only within the LC_COLLATE category.
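To show what a multicharacter collation item means for string comparison, the Python sketch below splits a string into collation items. The matching strategy (left to right, longest element first) is an assumption made for the example and is not stated in this manual; the function name `split_collation_items` is hypothetical.

```python
def split_collation_items(text, elements):
    """Split text into collation items, preferring the longest multicharacter element."""
    by_length = sorted(elements, key=len, reverse=True)
    items, i = [], 0
    while i < len(text):
        for element in by_length:
            if text.startswith(element, i):
                items.append(element)  # multicharacter collation item
                i += len(element)
                break
        else:
            items.append(text[i])      # ordinary single-character item
            i += 1
    return items

print(split_collation_items("chico", ["ch"]))
print(split_collation_items("1112", ["11"]))
```

With `<ch>` defined as in the examples above, "chico" is seen as four collation items (ch, i, c, o) rather than five characters.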
2.2.2 The collating-symbol Statement
The collating-symbol statement specifies collation symbols for use in collation sequence statements.
Syntax:
    collating-symbol <collating_symbol>
The collating_symbol argument cannot duplicate any symbolic name in the current charmap file or any other symbolic name defined in this collation definition. The following are examples of collating-symbol statements:
    collating-symbol <UPPER_CASE>
    collating-symbol <HIGH>
An argument defined by the collating-symbol statement is recognized only within the LC_COLLATE category.
2.2.3 The order_start Statement
The order_start statement is followed by one or more collation order statements, which assign collation weights to collation items, and then by the order_end keyword. The order_start statement is required.
Syntax:
    order_start sort_rules;sort_rules;...;sort_rules
    collation_order_statements
    order_end
The sort_rules directives have the following syntax:
    keyword, keyword,...,keyword
where keyword is FORWARD, BACKWARD, or POSITION.
The sort_rules directives are optional. If specified, they define the rules to apply during string comparison. The number of specified sort_rules directives defines the number of weights each collation item is assigned (that is, the directives define the number of collation orders in the locale). If no sort_rules directives are specified, one forward directive is assumed and comparisons are made on a character basis rather than a string basis.
If sort_rules directives are present, the first one applies when comparing strings that use the primary weight, the second when comparing strings that use the secondary weight, and so on. Each set of sort_rules directives is separated by a semicolon (;). A sort_rules directive consists of one or more keywords separated by commas. The following keywords are supported:
FORWARD --- Specifies that collation weight comparisons proceed from the beginning of a string to the end of the string.
BACKWARD --- Specifies that collation weight comparisons proceed from the end of a string to the beginning of the string.
POSITION --- Specifies that collation weight comparisons consider the relative position of nonignored elements in the string (that is, if strings compare as equal, the element with the shortest distance from the starting point of the comparison collates first).
The forward and backward keywords are mutually exclusive.
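The effect of the comparison direction can be seen with a small Python model. The weight lists below are hypothetical secondary weights (1 for an unaccented letter, 2 for an accented one) for the French words "côte" and "coté"; the function `compare_level` is an illustrative helper, not part of any library.

```python
def compare_level(weights1, weights2, direction="forward"):
    """Compare two weight lists for one level, front to back or back to front."""
    if direction == "backward":
        weights1, weights2 = weights1[::-1], weights2[::-1]
    return (weights1 > weights2) - (weights1 < weights2)  # -1, 0, or 1

# Hypothetical secondary weights: 1 = unaccented, 2 = accented.
cote_circumflex = [1, 2, 1, 1]  # c ô t e
cote_acute = [1, 1, 1, 2]       # c o t é

print(compare_level(cote_circumflex, cote_acute, "forward"))
print(compare_level(cote_circumflex, cote_acute, "backward"))
```

Scanning forward, the accent on the second letter is found first, so "côte" collates after "coté". Scanning backward, the accent on the last letter is found first, so "côte" collates before "coté".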
The following is an example of a sort_rules directive:
    order_start forward;backward
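The structure of the sort_rules directives (sets separated by semicolons, keywords separated by commas) can be sketched as a parser. The Python function below is a minimal model: `parse_order_start` is a hypothetical name, and treating keywords as case-insensitive is an assumption based on the mixed spelling used in this section.

```python
def parse_order_start(statement):
    """Return one set of keywords for each collation level."""
    parts = statement.split(None, 1)
    if not parts or parts[0] != "order_start":
        raise ValueError("statement must begin with order_start")
    if len(parts) == 1:
        return [{"forward"}]  # no directives: one forward directive is assumed
    levels = []
    for directive in parts[1].split(";"):
        keywords = {kw.strip().lower() for kw in directive.split(",")}
        unknown = keywords - {"forward", "backward", "position"}
        if unknown:
            raise ValueError(f"unknown sort rule keyword(s): {sorted(unknown)}")
        if {"forward", "backward"} <= keywords:
            raise ValueError("forward and backward are mutually exclusive")
        levels.append(keywords)
    return levels

print(parse_order_start("order_start forward;backward"))
```

The length of the returned list corresponds to the number of weights each collation item is assigned.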
The following syntax rules apply to the collation order statements:
The optional operands for each collation item are used to define the primary, secondary, or subsequent weights for the collation item. The special symbol IGNORE is used to indicate a collation item that is to be ignored when strings are compared.
An ellipsis keyword appearing in place of a collating_element_list indicates the weights are to be assigned, for the characters in the identified range, in numerically increasing order from the weight for the character symbol on the left side of the preceding statement.
The use of the ellipsis keyword results in a locale that may collate differently when compiled with different character set description (charmap) source files.
The UNDEFINED special symbol includes all coded character set values not specified explicitly or with an ellipsis symbol. These characters are inserted in the character collation order at the point indicated by the UNDEFINED special symbol and are all assigned the same weight. If no UNDEFINED special symbol exists and the collation order does not specify all collation items from the coded character set, a warning is issued and all undefined characters are placed at the end of the character collation order.
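The placement of unlisted characters can be modeled with a short Python sketch. It is a simplification under stated assumptions: each listed item gets a single weight equal to its position, the warning issued when UNDEFINED is missing is not modeled, and in that case the unlisted characters are simply given one shared weight after all listed items. The name `collation_sequence` is hypothetical.

```python
def collation_sequence(ordered, codeset):
    """Assign one weight per position; all unlisted characters share one weight."""
    undefined = [ch for ch in codeset if ch not in ordered]
    if "UNDEFINED" not in ordered:
        # Default placement: after every explicitly listed collation item.
        ordered = list(ordered) + ["UNDEFINED"]
    weights = {}
    for position, item in enumerate(ordered):
        if item == "UNDEFINED":
            for ch in undefined:
                weights[ch] = position  # same weight for every unlisted character
        else:
            weights[item] = position
    return weights

print(collation_sequence(["a", "UNDEFINED", "b"], "abxy"))
print(collation_sequence(["a", "b"], "abx"))
```

In the first call the unlisted characters x and y are inserted between a and b with equal weight; in the second call, with no UNDEFINED symbol, x is placed at the end.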
The following is an example of a collation order statement section in the LC_COLLATE locale definition source file category:
    order_start forward;backward
    UNDEFINED IGNORE;IGNORE
    <LOW>
    <space> <LOW>;<space>
    ... <LOW>;...
    <a> <a>;<a>
    <a-acute> <a>;<a-acute>
    <a-grave> <a>;<a-grave>
    <A> <a>;<A>
    <A-acute> <a>;<A-acute>
    <A-grave> <a>;<A-grave>
    <ch> <ch>;<ch>
    <Ch> <ch>;<Ch>
    <s> <s>;<s>
    <ss> <s><s>;<s><s>
    <eszet> <s><s>;<eszet><eszet>
    ... <HIGH>;...
    <HIGH>
    order_end
This example is interpreted as follows:
2.3 LC_CTYPE Category
The LC_CTYPE category defines character classification, case conversion, and other character attributes. This category begins with the LC_CTYPE header and ends with the END LC_CTYPE trailer.
All operands for LC_CTYPE category statements are defined as lists of characters. Each list consists of one or more characters or symbolic character names separated by semicolons. An ellipsis (...) can represent a series of characters; for example, <a>;...;<z> represents the characters in the range a through z.
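The list syntax, including the ellipsis, can be expanded mechanically once the code values of the symbols are known. The Python sketch below assumes that an ellipsis stands for every charmap symbol whose code value lies between the characters on either side of it; the `charmap` dictionary and the function name `expand_char_list` are hypothetical.

```python
def expand_char_list(operand, charmap):
    """Expand 'x;...;y' ranges using the code values in charmap (symbol -> code)."""
    by_code = {code: sym for sym, code in charmap.items()}
    tokens = operand.split(";")
    result = []
    for i, tok in enumerate(tokens):
        if tok != "...":
            result.append(tok)
            continue
        if i == 0 or i == len(tokens) - 1:
            raise ValueError("an ellipsis needs a character on each side")
        lo, hi = charmap[tokens[i - 1]], charmap[tokens[i + 1]]
        # Fill in the symbols strictly between the two end points.
        result.extend(by_code[c] for c in range(lo + 1, hi) if c in by_code)
    return result

# Hypothetical charmap fragment, for illustration only.
charmap = {"<a>": 0x61, "<b>": 0x62, "<c>": 0x63, "<d>": 0x64, "<z>": 0x7A}
print(expand_char_list("<a>;...;<d>;<z>", charmap))
```

With this charmap fragment, the operand expands to the symbols a, b, c, d, and z.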
Table 2-2 lists the statement keywords recognized in the LC_CTYPE category. In the keyword descriptions, the phrase "automatically included" means that an error does not occur if the referenced characters are included or omitted; the characters are provided if they are missing, and are accepted if they are present.
Keyword | Description |
---|---|
copy | Specifies the name of an existing locale to be used as the definition for this category. If you specify a copy statement, you cannot specify any other keyword. |
upper | Defines uppercase letter characters. Do not specify any character defined by the cntrl, digit, punct, or space keyword. The uppercase letters A through Z are automatically included in this set. |
lower | Defines lowercase letter characters. Do not specify any character defined by the cntrl, digit, punct, or space keyword. The lowercase letters a through z are automatically included in this set. |
alpha | Defines all letter characters. Do not specify any character defined by the cntrl, digit, punct, or space keyword. Characters defined by the upper and lower keywords are automatically included in this character class. |
digit | Defines numeric digit characters. Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be specified. The digits 0 through 9 are automatically included in this set. |
space | Defines white-space characters. Do not specify any character defined by the upper, lower, alpha, digit, graph, or xdigit keyword. The space, form-feed, new-line, carriage-return, tab, and vertical tab characters are automatically included in this set. |
cntrl | Defines control characters. Do not specify any character defined by the upper, lower, alpha, digit, punct, graph, print, or xdigit keyword. |
punct | Defines punctuation characters. Do not specify the space character or any character defined by the upper, lower, alpha, digit, cntrl, or xdigit keyword. |
graph | Defines printable characters, excluding the space character. Do not specify any character defined by the cntrl keyword. The characters defined by the upper, lower, alpha, digit, xdigit, and punct keywords are automatically included in this character class. |
print | Defines printable characters, including the space character. Do not specify any character defined by the cntrl keyword. The space character and characters defined by the upper, lower, alpha, digit, xdigit, and punct keywords are automatically included in this character class. |
xdigit | Defines hexadecimal digit characters. Only the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9 can be specified. Any character, however, can be specified for the hexadecimal values 10 through 15. These alternate hexadecimal digits are not used by standard conversion routines when converting digit strings from hexadecimal to numeric quantities. The numbers 0 through 9 and the letters A through F and a through f are automatically included in this set. |
blank | Defines blank characters. The space and horizontal tab characters are included in this character class. Any characters defined by this statement are automatically included in the space class. |
toupper | Defines the mapping of lowercase characters to uppercase characters. Operands for this keyword consist of character pairs; the two characters in a pair are separated by a comma. Each character pair is enclosed in parentheses () and separated from the next pair by a semicolon (;). The first character in each pair is considered a lowercase character; the second character is considered an uppercase character. Only characters defined by the lower and upper keywords can be specified. If toupper is not specified, a through z are mapped to A through Z by default. |
tolower | Defines the mapping of uppercase characters to lowercase characters. Operands for this keyword consist of character pairs; the two characters in a pair are separated by a comma. Each character pair is enclosed in parentheses () and separated from the next pair by a semicolon (;). The first character in each pair is considered an uppercase character; the second character is considered a lowercase character. Only characters defined by the lower and upper keywords can be specified. If tolower is not specified, the mapping defaults to the reverse mapping of the toupper keyword, if specified. If the toupper and tolower keywords are both omitted, the mapping for each defaults to that of the C locale. |
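The operand format of the toupper and tolower keywords, and the rule that tolower defaults to the reverse of toupper, can be sketched in Python. The helpers `parse_case_pairs` and `default_tolower` are hypothetical, and the sketch handles symbolic names only, not character literals that are themselves a comma, semicolon, or parenthesis.

```python
import re

# One pair: two operands inside parentheses, separated by a comma.
_PAIR = re.compile(r"\(\s*([^,()\s]+)\s*,\s*([^,()\s]+)\s*\)")

def parse_case_pairs(operand):
    """Parse '(<a>,<A>);(<b>,<B>)' into a dict mapping first -> second."""
    mapping = {}
    for chunk in operand.split(";"):
        m = _PAIR.fullmatch(chunk.strip())
        if m is None:
            raise ValueError(f"malformed character pair: {chunk!r}")
        mapping[m.group(1)] = m.group(2)
    return mapping

def default_tolower(toupper):
    """Model the default: tolower is the reverse mapping of toupper."""
    return {upper: lower for lower, upper in toupper.items()}

toupper = parse_case_pairs("(<a>,<A>);(<b>,<B>)")
print(toupper)
print(default_tolower(toupper))
```

The second mapping is what the table describes for a locale source that specifies toupper but omits tolower.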
Additional keywords can be provided to define new character classifications. For example:
    charclass vowel
    vowel <a>;<e>;<i>;<o>;<u>;<y>
The LC_CTYPE category does not support multicharacter elements. For example, the German Eszet character is traditionally classified as a lowercase letter, but it has no corresponding uppercase letter: in proper capitalization of German text, Eszet is replaced by the two characters SS. This kind of conversion is outside the scope of the toupper and tolower keywords.
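The limitation can be illustrated by contrasting a one-to-one mapping table with a conversion that is allowed to change the length of the string. In the Python sketch below, `TOUPPER` is a hypothetical table in the spirit of the toupper keyword (a through z only), and Python's built-in `str.upper()` is used purely as an example of a full case conversion; neither reflects the C Run-Time Library implementation.

```python
# Hypothetical one-to-one table: each lowercase letter maps to one uppercase letter.
TOUPPER = {chr(c): chr(c - 32) for c in range(ord("a"), ord("z") + 1)}

def toupper_string(text, table=TOUPPER):
    """Map each character independently; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)

word = "straße"
print(toupper_string(word))  # the Eszet is left unchanged
print(word.upper())          # a full case conversion replaces it with SS
```

A per-character table cannot produce the second result, because it would have to map one character to two.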