HP OpenVMS Systems

C Programming Language

Compaq C
Run-Time Library Reference Manual for OpenVMS Systems

Chapter 10
Developing International Software

This chapter describes typical features of international software and the features provided with the Compaq C Run-Time Library (RTL) that enable you to design and implement international software.

See the Reference Section for more detailed information on the functions described in this chapter.

10.1 Internationalization Support

The Compaq C RTL has added capabilities to allow application developers to create international software. The Compaq C RTL obtains information about a language and a culture by reading this information from locale files.

10.1.1 Installation

If you are using these Compaq C RTL capabilities, you must install a separate kit to provide these files to your system.

The save set, VMSI18N0nn, is provided on the same media as the OpenVMS operating system.

To install this save set, follow the standard OpenVMS installation procedures using this save set name as the name of the kit. There are several categories of locales that you can select to install. You can select as many locales as you need by answering the following prompts:

   Do you want European and US support?
   Do you want Chinese support?
   Do you want Japanese support?
   Do you want Korean support?
   Do you want Thai support?
   Do you want the Unicode converters?

This kit also has an Installation Verification Procedure that Compaq recommends you run to verify the correct installation of the kit.

10.1.2 Unicode Support

In OpenVMS Version 7.2, the Compaq C Run-Time Library added the Universal Unicode locale, which is distributed with the OpenVMS system, not with the VMSI18N0nn kit. The name of the Unicode locale is:

   UTF8-20

Like those locales shipped with the VMSI18N0nn kit, the Unicode locale is located at the standard location referred to by the SYS$I18N_LOCALE logical name.

The UTF8-20 Unicode locale is based on Version 2.0 of the Unicode standard. It uses UCS-4 as its wide-character encoding and UTF-8 as its multibyte character encoding.

Compaq C RTL also includes converters that perform conversions between Unicode and any other supported character sets. The expanded set of converters includes converters for UCS-2, UCS-4, and UTF-8 Unicode encoding. The Unicode converters can be used by the ICONV CONVERT utility and by the iconv family of functions in the Compaq C Run-Time Library.

In OpenVMS Version 7.2, the Compaq C Run-Time Library added Unicode character set converters for Microsoft Code Page 437.

10.2 Features of International Software

International software is software that can support multiple languages and cultures. An international program should be able to:

Display messages in the user's own language. This includes screen displays, error messages and prompts.
Handle culture-specific information such as:
- Date and time formatting
  The conventions for representing dates and times vary from one country to another. For example, in the U.S.A. the month is given first; in the U.K. the day is given first. Therefore, the date 12/5/1993 is interpreted as December 5, 1993 in the U.S.A., and as May 12, 1993 in the U.K.
- Numeric formatting
  The character that represents the decimal point (the radix character) and the thousands separator character vary from one country to another. For example, in the U.K. the period (.) is used to represent the radix character, and the comma is used as a separator. However, in Germany, the comma is used as the radix character and the period is the separator character. Therefore, the number 2,345.67 in the U.K. is the same as 2.345,67 in Germany.
- Monetary formatting
  Currency values are represented by different symbols and can be formatted using a variety of separator characters, depending on the currency.
Handle different coded character sets (not just ASCII).
Handle a mixture of single and multibyte characters.
Provide multipass string comparisons.
String comparison functions such as strcmp compare strings by comparing the codepoint values of the characters in the strings. However, some languages require more complex comparisons to correctly sort strings.

To meet the above requirements, an application should not make any assumptions about the language, local customs or the coded character set used. All this localization data should be defined separately from the program, and only bound to it at run-time.
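
As a small illustration of binding localization data at run time, the contrast between codepoint comparison and locale-aware comparison can be sketched with the standard strcmp and strcoll functions. The helper names compare_for_sorting and compare_by_codepoint are hypothetical, chosen only for this sketch; the result of strcoll depends on whichever LC_COLLATE locale is in effect when it is called.

```c
#include <locale.h>
#include <string.h>

/*
 * Hypothetical helper: compare two strings using the collation rules
 * of the current LC_COLLATE locale. Returns -1, 0, or 1.
 */
int compare_for_sorting(const char *a, const char *b)
{
    int result = strcoll(a, b);
    return (result > 0) - (result < 0);
}

/*
 * Hypothetical helper: the codepoint-based comparison, for contrast.
 * Returns -1, 0, or 1.
 */
int compare_by_codepoint(const char *a, const char *b)
{
    int result = strcmp(a, b);
    return (result > 0) - (result < 0);
}
```

In the C locale the two helpers agree. After a call to setlocale that selects a locale with its own collating sequence, only compare_for_sorting reflects that sequence, and the program source does not change.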

The rest of the chapter describes how you can create international software using Compaq C.

10.3 Developing International Software Using Compaq C

The Compaq C environment provides the following facilities for creating international software:

A method for separating localization data from a program.
Localization data is held in a database known as a locale. This stores all the language and culture information required by a program. See Section 10.4 for details of the structure of locales.
A program specifies what locales to use by calling the setlocale function. See Section 10.5 for more information.
A method of separating message text from the program source.
This is achieved using message catalogs that store all the messages for an application. The message catalog is linked to the application at run-time. This means that the messages can be translated into different languages and then the required language version is selected at run-time. See Section 10.6.
Compaq C RTL functions that are sensitive to localization data.
The Compaq C RTL includes functions for:
- Converting between different codesets. See Section 10.7.
- Handling culture-specific information. See Section 10.8.
- Multipass string collation. See Section 10.10.
A special wide-character data type defined in the Compaq C RTL makes it easier to handle codesets that have a mixture of single and multibyte characters. A set of functions is also defined to support this wide character data type. See Section 10.9.

10.4 Locales

A locale consists of different categories, each of which determines one aspect of the international environment. Table 10-1 lists the categories in a locale and describes the information in each.

**Table 10-1 Locale Categories**
Category	Description
LC_COLLATE	Contains information about collating sequences.
LC_CTYPE	Contains information about character classification.
LC_MESSAGES	Defines the answers that are expected in response to yes/no prompts.
LC_MONETARY	Contains monetary formatting information.
LC_NUMERIC	Contains information about formatting numbers.
LC_TIME	Contains time and date information.

The locales provided by Compaq reside in the directory defined by the SYS$I18N_LOCALE logical name. The file naming convention for locales is:

language_country_codeset.locale

Where:

language is the mnemonic for the language. For example, EN indicates an English locale.
country is the mnemonic for the country. For example, GB indicates a British locale.
codeset is the name of the ISO standard codeset for the locale. For example, ISO8859-1 is the ISO 8859 codeset for the Western European languages. See Section 10.7 for more information about the codesets supported.
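
The naming convention can be sketched as a small formatting routine. The function name build_locale_filename is hypothetical and is not part of the Compaq C RTL; the sketch only assembles the file name and does not check that such a locale is installed.

```c
#include <stdio.h>

/*
 * Hypothetical helper: build "language_country_codeset.locale" in buf.
 * Returns 0 on success, or -1 if the name does not fit in size bytes.
 */
int build_locale_filename(char *buf, size_t size, const char *language,
                          const char *country, const char *codeset)
{
    int n = snprintf(buf, size, "%s_%s_%s.locale",
                     language, country, codeset);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

For example, the components EN, GB, and ISO8859-1 yield the file name EN_GB_ISO8859-1.locale.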

10.5 Using the setlocale Function to Set Up an International Environment

An application sets up its international environment at run-time by calling the setlocale function. The international environment is set up in one of two ways:

The environment is defined by one locale. In this case, each of the locale categories is defined by the same locale.
Categories are defined separately. This lets you define a mixed environment that uses different locales depending on the operation performed. For example, if an English user has some Spanish files that are to be processed by an application, the LC_COLLATE category could be defined by a Spanish locale while the other categories are defined by an English locale. To do this you would call setlocale once for each category.

The syntax for the setlocale function is:

char *setlocale(int category, const char *locale)

Where:

category is either the name of a category, or LC_ALL. Specifying LC_ALL means that all the categories are defined by the same locale. Specify a category name to set up a mixed environment.
locale is one of the following:
- The name of the locale to use.
  If you want users to specify the locale interactively, your application could prompt the user for a locale name, and then pass the name as an argument to the setlocale function. A locale name has the following format:
  language_country.codeset[@modifier]
  For example, setlocale(LC_COLLATE, "en_US.ISO8859-1") selects the locale en_US.ISO8859-1 for the LC_COLLATE category.
- ""
  This causes the function to use logical names to determine the locale for the category specified. See Specifying the Locale Using Logical Names for details.

If an application does not call the setlocale function, the default locale is the C locale. This allows such applications to call those functions that use information in the current locale.
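
The two ways of setting up the environment can be sketched as follows. The helper names set_mixed_locale and use_environment_locale are hypothetical. The sketch assumes only the documented behavior that setlocale returns a null pointer when a request cannot be honored; falling back to the C locale in that case is a design choice of this example, not a requirement.

```c
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical helper: define every category from base_locale, then
 * override LC_COLLATE with collate_locale (a mixed environment).
 * Returns 0 on success, or -1 if either locale name is rejected.
 */
int set_mixed_locale(const char *base_locale, const char *collate_locale)
{
    if (setlocale(LC_ALL, base_locale) == NULL)
        return -1;
    if (setlocale(LC_COLLATE, collate_locale) == NULL)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: let the user's environment select the locale
 * (the "" argument). If that fails, fall back to the C locale.
 * Returns the name reported by setlocale.
 */
const char *use_environment_locale(void)
{
    const char *name = setlocale(LC_ALL, "");
    if (name == NULL)
        name = setlocale(LC_ALL, "C");
    return name;
}
```

A call such as set_mixed_locale("en_GB.ISO8859-1", "es_ES.ISO8859-1") would correspond to the English user processing Spanish files described above, provided both locales are installed on the system.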

Specifying the Locale Using Logical Names

If the setlocale function is called with "" as the locale argument, the function checks for a number of logical names to determine the locale name for the category specified.

There are a number of logical names that users can set up to define their international environment:

Logical name corresponding to a category
For example, the LC_NUMERIC logical name defines the locale associated with the LC_NUMERIC category within the user's environment.
LC_ALL
LANG
The LANG logical name defines the user's language.

In addition to the logical names defined by a user, there are a number of system-wide logical names, set up during system startup, that define the default international environment for all users on a system:

SYS$category
Where category is the name of a category. This specifies the system default for that category.
SYS$LC_ALL
SYS$LANG

The setlocale function checks for user-defined logical names first, and if these are not defined, it checks the system logical names.

10.6 Using Message Catalogs

An important requirement for international software is that it should be able to communicate with the user in the user's own language. The messaging system enables program messages to be created separately from the program source, and linked to the program at run-time.

Messages are defined in a message text source file, and compiled into a message catalog using the GENCAT command. The message catalog is accessed by a program using the functions provided in the Compaq C RTL.

The functions provided to access the messages in a catalog are:

The catopen function, which opens a specified catalog ready for use.
The catgets function, which enables the program to read a specific message from a catalog.
The catclose function, which closes a specified catalog. Open message catalogs are also closed by the exit function.

For information on generating message catalogs, see the GENCAT command description in the OpenVMS system documentation.
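
A minimal sketch of the three calls working together follows. The helper name lookup_message and its fallback behavior are assumptions made for illustration; the catalog name, set number, and message number are supplied by the caller. The sketch relies on catopen returning (nl_catd)-1 on failure and on catgets returning its default-string argument when the message cannot be found.

```c
#include <nl_types.h>
#include <stdio.h>

/*
 * Hypothetical helper: copy message msg_id of set set_id from the
 * named catalog into buf. If the catalog cannot be opened, or the
 * message is missing, the fallback text is copied instead.
 * Returns 1 if the catalog was opened, 0 if it was not.
 */
int lookup_message(const char *catalog, int set_id, int msg_id,
                   const char *fallback, char *buf, size_t size)
{
    nl_catd catd = catopen(catalog, 0);

    if (catd == (nl_catd)-1) {
        snprintf(buf, size, "%s", fallback);
        return 0;
    }

    /* catgets hands back the fallback pointer if the message is absent. */
    snprintf(buf, size, "%s", catgets(catd, set_id, msg_id, fallback));
    catclose(catd);
    return 1;
}
```

Keeping a default string in the source, as this sketch does, lets the program still produce readable output when no translated catalog is available.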

10.7 Handling Different Character Sets

The Compaq C RTL supports a number of state-independent codesets and codeset encoding schemes that contain the ASCII encoded Portable Character Set. It does not support state-dependent codesets. The codesets supported are:

ISO8859-n
where n = 1, 2, 5, 7, 8, or 9. This covers codesets for North America, Europe (West and East), Israel, and Turkey.
eucJP, SJIS, DECKANJI, SDECKANJI: Codesets used in Japan.
eucTW, DECHANYU, BIG5, DECHANZI: Chinese codesets used in China (PRC), Hong Kong, and Taiwan.
DECKOREAN: Codeset used in Korea.

10.7.1 Charmap File

The characters in a codeset are defined in a charmap file. The charmap files supplied by Compaq are located in the directory defined by the SYS$I18N_LOCALE logical name. The file type for a charmap file is .CMAP.

10.7.2 Converter Functions

As well as supporting different coded character sets, the Compaq C RTL provides the following converter functions that enable you to convert characters from one codeset to another:

iconv_open: specifies the type of conversion. It allocates a conversion descriptor required by the iconv function.
iconv: converts a sequence of characters in an input buffer to the equivalent characters in a different codeset. The converted characters are stored in a separate output buffer.
iconv_close: deallocates a conversion descriptor and the resources allocated to the descriptor.
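
The sequence of calls can be sketched as a single buffer-to-buffer routine. The helper name convert_buffer is hypothetical, and the codeset names are passed in by the caller because the names accepted depend on the converters installed. The sketch assumes the common prototype in which the input pointer is declared char **; where <iconv.h> declares it as const char **, the local pointer type needs adjusting.

```c
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper: convert inlen bytes at in from codeset
 * fromcode to codeset tocode, writing at most outsize bytes to out.
 * Returns the number of bytes written, or -1 on any failure.
 */
long convert_buffer(const char *fromcode, const char *tocode,
                    const char *in, size_t inlen,
                    char *out, size_t outsize)
{
    /* Note the argument order: target codeset first, then source. */
    iconv_t cd = iconv_open(tocode, fromcode);
    char *inptr = (char *)in;   /* iconv advances both pointers */
    char *outptr = out;
    size_t inleft = inlen;
    size_t outleft = outsize;
    size_t rc;

    if (cd == (iconv_t)-1)
        return -1;              /* no converter for this pair */

    rc = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    iconv_close(cd);

    if (rc == (size_t)-1 || inleft != 0)
        return -1;              /* invalid input or output too small */
    return (long)(outsize - outleft);
}
```

A production routine would normally inspect errno after a failed iconv call to distinguish an invalid input sequence from an output buffer that is too small; this sketch treats both as a single failure.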

10.7.3 Using Codeset Converter Files

The file naming convention for codeset converters is:

fromcode_tocode.iconv

Where fromcode is the name of the source codeset, and tocode is the name of the codeset to which characters are converted.

You can add codeset converters to a given system by installing the converter files in the directory pointed to by the logical name SYS$I18N_ICONV.

Codeset converter files can be implemented either as table-based conversion files or as algorithm-based converter files created as OpenVMS shareable images.

Creating a Table-based Conversion File

The following summarizes the necessary steps to create a table-based codeset converter file:

Create a text file that describes the mapping from each character in the source codeset to the corresponding character in the target codeset. For the format of this file, see the description of the DCL command ICONV COMPILE in the OpenVMS New Features Manual; that command processes such a file and creates a codeset converter table file.
Copy the file produced in the previous step to the directory pointed to by the logical name SYS$I18N_ICONV, assuming you have the privilege to do so.

Creating an Algorithm-based Conversion File

Use the following steps to create an algorithm-based codeset converter file implemented as a shareable image:

Create C source files that implement the codeset converter. The API is documented in the public header file <iconv.h> as follows:
- The universal entry point _u_iconv_open is called by the Compaq C RTL routine iconv_open to initialize a conversion.
- _u_iconv_open returns to iconv_open a pointer to the structure __iconv_extern_obj_t.
- Within this structure, the converter exports its own conversion entry point and conversion close routine, which are called by the Compaq C RTL routines iconv and iconv_close, respectively.
- The major and minor identifier fields are required by iconv_open to test for a possible mismatch between the library and the converter. The converter usually assigns the constants __ICONV_MAJOR and __ICONV_MINOR, defined in the <iconv.h> header file.
- The field tcs_mb_cur_max is used only by the DCL command ICONV CONVERT to optimize its buffer usage. This field reflects the maximum number of bytes that comprise a single character in the target codeset, including the shift sequence (if any).
Compile and link the modules that comprise the codeset converter as an OpenVMS shareable image, making sure that the file name adheres to the preceding conventions.
Copy the file produced in the previous step to the directory pointed to by the logical name SYS$I18N_ICONV, assuming you have the privilege to do so.

Some Final Notes

SYS$I18N_ICONV is by default a search list in which the first directory, SYS$SYSROOT:[SYS$I18N.ICONV.USER], is intended as a site-specific repository for iconv codeset converters.

The number of codesets and locales installed varies from system to system. Check the SYS$I18N directory tree for the codesets, converters, and locales installed on your system.

10.8 Handling Culture-Specific Information

Each locale contains the following cultural information:

Date and time information
The LC_TIME category defines the conventions for writing date and time, the names of the days of the week, and the names of months of the year.
Numeric information
The LC_NUMERIC category defines the conventions for formatting non-monetary values.
Monetary information
The LC_MONETARY category defines currency symbols and the conventions used to format monetary values.
Yes and no responses
The LC_MESSAGES category defines the strings expected in response to yes/no questions.

You can extract some of this cultural information using the nl_langinfo function and the localeconv function. See Section 10.8.1.

10.8.1 Extracting Cultural Information From a Locale

The nl_langinfo function returns a pointer to a string that contains an item of information obtained from the program's current locale. The information you can extract from the locale is:

Date and time formats
The names of the days of the week, and months of the year in the local language
The radix character
The character used to separate groups of digits in non-monetary values
The currency symbol
The name of the codeset for the locale
The strings defined for responses to yes/no questions

The localeconv function returns a pointer to a data structure that contains numeric formatting and monetary formatting data from the LC_NUMERIC and LC_MONETARY categories.
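
The two functions can be combined as in the following sketch. The helper name summarize_numeric_conventions and the output layout are hypothetical. Each nl_langinfo result is copied before the next call, on the assumption that the returned string may be overwritten by a later call.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the radix character, the thousands
 * separator, and the currency symbol of the current locale into buf.
 * Returns 0 on success, or -1 if the text does not fit.
 */
int summarize_numeric_conventions(char *buf, size_t size)
{
    char radix[16];
    char thousands[16];
    const struct lconv *conv;
    int n;

    /* Copy each item before asking for the next one. */
    snprintf(radix, sizeof radix, "%s", nl_langinfo(RADIXCHAR));
    snprintf(thousands, sizeof thousands, "%s", nl_langinfo(THOUSEP));

    /* localeconv exposes the LC_NUMERIC and LC_MONETARY data. */
    conv = localeconv();

    n = snprintf(buf, size, "radix=[%s] thousands=[%s] currency=[%s]",
                 radix, thousands, conv->currency_symbol);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

In the C locale the radix character is a period, and the thousands separator and currency symbol are empty strings.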

10.8.2 Date and Time Formatting Functions

The functions that use the date and time information are:

strftime: takes date and time values stored in a data structure and formats them into an output string. The format of the output string is controlled by a format string.
strptime: converts a character string (of type char *) into date and time values. A format string defines how the string is interpreted.
wcsftime: does the same as strftime except that it creates a wide-character string.
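
As a brief sketch of locale-dependent date formatting, the following routine formats a calendar date with the %x conversion, which requests the date representation of the current locale. The helper name format_local_date is hypothetical.

```c
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: format the given calendar date using the
 * date representation of the current LC_TIME locale (%x).
 * Returns the number of characters written, or 0 if buf is too small.
 */
size_t format_local_date(char *buf, size_t size,
                         int year, int month, int day)
{
    struct tm when;

    memset(&when, 0, sizeof when);
    when.tm_year = year - 1900;   /* struct tm counts years from 1900 */
    when.tm_mon = month - 1;      /* and months from 0 */
    when.tm_mday = day;

    return strftime(buf, size, "%x", &when);
}
```

In the C locale, %x is equivalent to %m/%d/%y, so December 5, 1993 is formatted as 12/05/93. A locale that puts the day first would produce a different string from the same call.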

10.8.3 Monetary Formatting Function

The strfmon function uses the monetary information in a locale to convert a number of values into a string. The format of the string is controlled by a format string.
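
A minimal sketch of a call to strfmon follows. The helper name format_amount is hypothetical. The %n conversion requests the national currency format of the current LC_MONETARY locale, so the exact output depends on the locale in effect.

```c
#include <monetary.h>
#include <stddef.h>

/*
 * Hypothetical helper: format amount as a monetary value using the
 * national format of the current LC_MONETARY locale.
 * Returns the number of bytes written, or -1 on failure.
 */
long format_amount(char *buf, size_t size, double amount)
{
    return (long)strfmon(buf, size, "%n", amount);
}
```

A caller might write format_amount(buf, sizeof buf, 1234.56) after selecting a locale with setlocale; the currency symbol, separators, and number of fraction digits then come from that locale rather than from the program.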

10.8.4 Numeric Formatting

The information in LC_NUMERIC is used by various functions. For example, strtod, wcstod, and the print and scan functions determine the radix character from the LC_NUMERIC category.
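
The effect on strtod can be sketched with a small parsing routine. The helper name parse_localized_number is hypothetical; it accepts a string only if strtod consumes all of it, so a radix character that does not match the current locale causes the input to be rejected.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: parse text as a floating-point number using
 * the radix character of the current LC_NUMERIC locale.
 * Returns 0 and stores the result in *value on success, or -1 if the
 * text is empty, malformed, or followed by unparsed characters.
 */
int parse_localized_number(const char *text, double *value)
{
    char *end;
    double parsed = strtod(text, &end);

    if (end == text || *end != '\0')
        return -1;
    *value = parsed;
    return 0;
}
```

In the C locale the string 2345.67 is accepted and 2345,67 is rejected, because parsing stops at the comma. In a locale whose radix character is a comma, the outcome is reversed.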

10.9 Functions for Handling Wide Characters

A character can be represented by single-byte or multibyte values depending on the codeset. To make it easier to handle both single-byte and multibyte characters in the same way, the Compaq C RTL defines a wide-character data type, wchar_t. This data type can store characters that are represented by 1-, 2-, 3-, or 4-byte values.

The functions provided to support wide characters are:

Character classification functions. See Section 10.9.1.
Case conversion functions. See Section 10.9.2.
Input and output functions. See Section 10.9.3.
Multibyte to wide-character conversion functions. See Section 10.9.4.
Wide-character to multibyte conversion functions. See Section 10.9.4.
Wide-character string manipulation functions. See Section 10.9.5.
Wide-character string collation and comparison functions. See Section 10.10.
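
As a brief sketch of moving from multibyte to wide-character form, the following routine wraps the standard mbstowcs function. The helper name to_wide is hypothetical. How the bytes are interpreted depends on the LC_CTYPE category of the current locale.

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert the multibyte string mbs to wide
 * characters in out, which has room for capacity elements including
 * the terminating null. Returns the number of wide characters
 * stored (excluding the terminator), or -1 on failure.
 */
long to_wide(const char *mbs, wchar_t *out, size_t capacity)
{
    size_t n;

    if (capacity == 0)
        return -1;

    n = mbstowcs(out, mbs, capacity - 1);
    if (n == (size_t)-1)
        return -1;      /* invalid sequence for the current LC_CTYPE */

    out[n] = L'\0';     /* terminate even if the input was truncated */
    return (long)n;
}
```

Once the text is held as wchar_t elements, each element is one character regardless of how many bytes it occupied in the multibyte string, which is what lets the wide-character functions treat single-byte and multibyte codesets alike.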