Compaq C
Run-Time Library Reference Manual for
OpenVMS Systems
Chapter 10
Developing International Software
This chapter describes typical features of international software and
the features provided with the Compaq C Run-Time Library (RTL) that enable you to design
and implement international software.
See the Reference Section for more detailed information on the
functions described in this chapter.
10.1 Internationalization Support
The Compaq C RTL has added capabilities to allow application
developers to create international software. The Compaq C RTL
obtains information about a language and a culture by reading this
information from locale files.
10.1.1 Installation
If you are using these Compaq C RTL capabilities, you must install
a separate kit to provide these files to your system.
The save set, VMSI18N0nn, is provided on the same media as the
OpenVMS operating system.
To install this save set, follow the standard OpenVMS
installation procedures using this save set name as the name of the
kit. There are several categories of locales that you can select to
install. You can select as many locales as you need by answering the
following prompts:

   Do you want European and US support?
   Do you want Chinese support?
   Do you want Japanese support?
   Do you want Korean support?
   Do you want Thai support?
   Do you want the Unicode converters?

This kit also has an Installation Verification Procedure that Compaq
recommends you run to verify the correct installation of the kit.
10.1.2 Unicode Support
In OpenVMS Version 7.2, the Compaq C Run-Time Library
added the Universal Unicode locale, which is distributed with the
OpenVMS system, not with the VMSI18N0nn kit. The name of the
Unicode locale is:
Like those locales shipped with the VMSI18N0nn kit, the Unicode locale
is located at the standard location referred to by the SYS$I18N_LOCALE
logical name.
The UTF8-20 Unicode locale is based on Version 2.0 of the Unicode
standard. The Unicode locale uses UCS-4 as its wide-character encoding
and UTF-8 as its multibyte character encoding.
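The relationship between the two encodings can be illustrated with a short sketch. The function name ucs4_to_utf8 is hypothetical and is not part of the Compaq C RTL; a real program would normally rely on the RTL's wide-character conversion functions under the Unicode locale. The sketch assumes code points no greater than U+10FFFF and rejects surrogate values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one UCS-4 code point as UTF-8.
 * Writes up to 4 bytes to out and returns the number of bytes written,
 * or 0 if the code point is a surrogate or lies above U+10FFFF. */
size_t ucs4_to_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                       /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                      /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                  /* 4 bytes: 11110xxx then 3 x 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, the UCS-4 value 0x000000E9 (Latin small letter e with acute) occupies one wide character but two bytes, C3 A9, in its UTF-8 multibyte form.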
The Compaq C RTL also includes converters that perform conversions
between Unicode and any other supported character set. The expanded
set of converters includes converters for the UCS-2, UCS-4, and UTF-8
Unicode encodings. The Unicode converters can be used by the ICONV
CONVERT utility and by the iconv family of functions in the
Compaq C Run-Time Library.
In OpenVMS Version 7.2, the Compaq C Run-Time Library
added Unicode character set converters for Microsoft Code Page 437.
10.2 Features of International Software
International software is software that can support multiple languages
and cultures. An international program should be able to:
  - Display messages in the user's own language. This includes screen
    displays, error messages, and prompts.
  - Handle culture-specific information such as:
    - Date and time formatting
      The conventions for representing dates and times vary from one
      country to another. For example, in the U.S.A., the month is given
      first; in the U.K., the day is specified first. Therefore, the date
      12/5/1993 is interpreted as December 5, 1993 in the U.S.A., and as
      May 12, 1993 in the U.K.
    - Numeric formatting
      The character that represents the decimal point (the radix
      character) and the thousands separator character vary from one
      country to another. For example, in the U.K. the period (.) is
      used to represent the radix character, and the comma is used as a
      separator. However, in Germany, the comma is used as the radix
      character and the period is the separator character. Therefore,
      the number 2,345.67 in the U.K. is the same as 2.345,67 in Germany.
    - Monetary formatting
      Currency values are represented by different symbols and can be
      formatted using a variety of separator characters, depending on
      the currency.
  - Handle different coded character sets (not just ASCII).
  - Handle a mixture of single and multibyte characters.
  - Provide multipass string comparisons.
    String comparison functions such as strcmp compare strings by
    comparing the codepoint values of the characters in the strings.
    However, some languages require more complex comparisons to sort
    strings correctly.

To meet the above requirements, an application should not make any
assumptions about the language, local customs or the coded character
set used. All this localization data should be defined separately from
the program, and only bound to it at run-time.
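The string comparison requirement can be sketched with the standard strcoll function, which compares strings according to the LC_COLLATE category of the current locale. The two wrapper names below are hypothetical and exist only to put the two approaches side by side; in the C locale both give the same ordering.

```c
#include <string.h>

/* Hypothetical wrapper: compare using the collating rules of the
 * current locale. Returns -1, 0, or 1. */
int compare_for_sorting(const char *a, const char *b)
{
    int r = strcoll(a, b);
    return (r > 0) - (r < 0);
}

/* Hypothetical wrapper: compare by codepoint value only, ignoring the
 * locale. Returns -1, 0, or 1. */
int compare_by_codepoint(const char *a, const char *b)
{
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

A program that sorts text for display would use the first form, so that the ordering follows whichever locale is in effect at run-time rather than an ordering fixed in the program.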
The rest of the chapter describes how you can create international
software using Compaq C.
10.3 Developing International Software Using Compaq C
The Compaq C environment provides the following facilities for
creating international software:
  - A method for separating localization data from a program.
  
Localization data is held in a database known as a locale.
  This stores all the language and culture information required by a
  program. See Section 10.4 for details of the structure of locales.
    
  A program specifies which locales to use by calling the setlocale
  function. See Section 10.5 for more information.
   - A method of separating message text from the program source.
  
This is achieved using message catalogs that store all the
  messages for an application. The message catalog is linked to the
  application at run-time. This means that the messages can be translated
  into different languages and then the required language version is
  selected at run-time. See Section 10.6.
   - Compaq C RTL functions that are sensitive to localization data.
  
The Compaq C RTL includes functions for:
  
   - A special wide-character data type defined in the Compaq C RTL
  makes it easier to handle codesets that have a mixture of single and
  multibyte characters. A set of functions is also defined to support
  this wide character data type. See Section 10.9.
 
10.4 Locales
A locale consists of different categories, each of which determines one
aspect of the international environment. Table 10-1 lists the
categories in a locale and describes the information in each.
  Table 10-1 Locale Categories

  | Category    | Description                                                          |
  | LC_COLLATE  | Contains information about collating sequences.                      |
  | LC_CTYPE    | Contains information about character classification.                 |
  | LC_MESSAGES | Defines the answers that are expected in response to yes/no prompts. |
  | LC_MONETARY | Contains monetary formatting information.                            |
  | LC_NUMERIC  | Contains information about formatting numbers.                       |
  | LC_TIME     | Contains time and date information.                                  |

The locales provided by Compaq reside in the directory defined by the
SYS$I18N_LOCALE logical name. The file naming convention for locales is:

   language_country_codeset.locale

Where:
  - language is the mnemonic for the language. For example, EN
  indicates an English locale.
  
 - country is the mnemonic for the country. For example, GB
  indicates a British locale.
  
 - codeset is the name of the ISO standard codeset for the
  locale. For example, ISO8859-1 is the ISO 8859 codeset for the Western
  European languages. See Section 10.7 for more information about the
  codesets supported.
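As a small illustration of the naming convention, the following sketch builds a locale file name from its three parts. The function name locale_file_name is hypothetical, and the sketch only formats the name; it does not check that such a file exists in SYS$I18N_LOCALE.

```c
#include <stdio.h>

/* Hypothetical helper: build "language_country_codeset.locale".
 * Returns 0 on success, or -1 if the result does not fit in out. */
int locale_file_name(char *out, size_t size, const char *language,
                     const char *country, const char *codeset)
{
    int n = snprintf(out, size, "%s_%s_%s.locale",
                     language, country, codeset);
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}
```

With the example mnemonics above, the arguments "EN", "GB", and "ISO8859-1" produce the name EN_GB_ISO8859-1.locale.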
 
10.5 Using the setlocale Function to Set Up an International Environment
An application sets up its international environment at run-time by
calling the setlocale function. The international environment is set up
in one of two ways:
  - The environment is defined by one locale. In this case, each of the
    locale categories is defined by the same locale.
  - Categories are defined separately. This lets you define a mixed
    environment that uses different locales depending on the operation
    performed. For example, if an English user has some Spanish files that
    are to be processed by an application, the LC_COLLATE category could be
    defined by a Spanish locale while the other categories are defined by
    an English locale. To do this you would call setlocale once for each
    category.

The syntax for the setlocale function is:

   char *setlocale(int category, const char *locale)

Where:
  - category is either the name of a category, or LC_ALL.
  Specifying LC_ALL means that all the categories are defined by the same
  locale. Specify a category name to set up a mixed environment.
  
  - locale is one of the following:
    - The name of the locale to use.
      If you want users to specify the locale interactively, your
      application could prompt the user for a locale name, and then pass
      the name as an argument to the setlocale function. A locale name
      has the following format:

         language_country.codeset[@modifier]

      For example, setlocale(LC_COLLATE, "en_US.ISO8859-1") selects the
      locale en_US.ISO8859-1 for the LC_COLLATE category.
    - ""
      This causes the function to use logical names to determine the
      locale for the category specified. See Specifying the Locale Using
      Logical Names for details.

If an application does not call the setlocale function, the default
locale is the C locale. This allows such applications to call those
functions that use information in the current locale.
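The mixed environment described above can be sketched as follows. The function name setup_mixed_environment is hypothetical; it sets every category from one locale and then overrides LC_COLLATE with another. Whether a particular locale name can be loaded depends on which locales are installed on the system.

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: define all categories from default_locale, then
 * override collation with collate_locale. Returns 0 on success, or -1
 * if setlocale could not honor either request. */
int setup_mixed_environment(const char *default_locale,
                            const char *collate_locale)
{
    if (setlocale(LC_ALL, default_locale) == NULL)
        return -1;
    if (setlocale(LC_COLLATE, collate_locale) == NULL)
        return -1;
    return 0;
}
```

For the English user with Spanish files, a call might look like setup_mixed_environment("en_GB.ISO8859-1", "es_ES.ISO8859-1"); both names are illustrative. Passing "" for either argument defers the choice to the user's logical names.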
Specifying the Locale Using Logical Names
If the setlocale function is called with "" as the locale argument, the
function checks for a number of logical names to determine the locale
name for the category specified.
There are a number of logical names that users can set up to define
their international environment:
  - Logical name corresponding to a category 
For example, the
  LC_NUMERIC logical name defines the locale associated with the
  LC_NUMERIC category within the user's environment.
   - LC_ALL
  
 - LANG 
The LANG logical name defines the user's language.
 
In addition to the logical names defined by a user, there are a number
of system-wide logical names, set up during system startup, that define
the default international environment for all users on a system:
  - SYS$category 
Where category is the name of a
  category. This specifies the system default for that category.
   -  SYS$LC_ALL
  
 - SYS$LANG
 
The setlocale function checks for user-defined logical names first, and
if these are not defined, it checks the system logical names.
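The search can be modeled with a short sketch. Everything in it is hypothetical: resolve_locale_name and the lookup callback are illustrative names, sample_lookup stands in for real logical name translation, and the order within each group (LC_ALL, then the category name, then LANG) is an assumption borrowed from the usual POSIX precedence. The only ordering taken from the text above is that user-defined names are checked before the system-wide names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: returns the translation of a logical name,
 * or NULL if the name is not defined. */
typedef const char *(*logical_lookup_fn)(const char *name);

/* Model of the search: user-defined names first, then system names.
 * The order inside each group is an assumption, not documented here. */
const char *resolve_locale_name(const char *category,
                                logical_lookup_fn lookup)
{
    char sys_category[64];
    const char *candidates[6];
    size_t i;

    snprintf(sys_category, sizeof sys_category, "SYS$%s", category);

    candidates[0] = "LC_ALL";        /* user-defined names */
    candidates[1] = category;
    candidates[2] = "LANG";
    candidates[3] = "SYS$LC_ALL";    /* system-wide defaults */
    candidates[4] = sys_category;
    candidates[5] = "SYS$LANG";

    for (i = 0; i < 6; i++) {
        const char *value = lookup(candidates[i]);
        if (value != NULL && value[0] != '\0')
            return value;
    }
    return NULL;                     /* nothing defined */
}

/* Sample table standing in for the user's and the system's definitions. */
const char *sample_lookup(const char *name)
{
    if (strcmp(name, "LANG") == 0)       return "en_GB.ISO8859-1";
    if (strcmp(name, "LC_NUMERIC") == 0) return "de_DE.ISO8859-1";
    if (strcmp(name, "SYS$LC_ALL") == 0) return "en_US.ISO8859-1";
    return NULL;
}
```

With the sample table, LC_NUMERIC resolves to the user's de_DE.ISO8859-1 definition, and LC_TIME falls back to the user's LANG value rather than to SYS$LC_ALL.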
10.6 Using Message Catalogs
An important requirement for international software is that it should
be able to communicate with the user in the user's own language. The
messaging system enables program messages to be created separately from
the program source, and linked to the program at run-time.
Messages are defined in a message text source file, and compiled into a
message catalog using the GENCAT command. The message catalog is
accessed by a program using the functions provided in the Compaq C RTL.
The functions provided to access the messages in a catalog are:
  - The catopen function, which opens a specified catalog ready for use.
  - The catgets function, which enables the program to read a specific
    message from a catalog.
  - The catclose function, which closes a specified catalog. Open message
    catalogs are also closed by the exit function.

For information on generating message catalogs, see the GENCAT command
description in the OpenVMS system documentation.
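A minimal sketch of the three calls working together is shown below. The function name fetch_message and its fallback handling are hypothetical choices made for this example; the catalog name and the set and message numbers are supplied by the caller. The message is copied into a caller-owned buffer because the string returned by catgets should not be used after the catalog is closed.

```c
#include <nl_types.h>
#include <stdio.h>

/* Hypothetical helper: copy message msg_id of set set_id from the named
 * catalog into out. If the catalog or the message is unavailable, the
 * fallback text is copied instead. Returns 1 if the text came from the
 * catalog, 0 if the fallback was used. */
int fetch_message(const char *catalog_name, int set_id, int msg_id,
                  const char *fallback, char *out, size_t out_size)
{
    nl_catd catalog;
    const char *text;
    int from_catalog;

    catalog = catopen(catalog_name, NL_CAT_LOCALE);
    if (catalog == (nl_catd)-1) {
        snprintf(out, out_size, "%s", fallback);
        return 0;
    }

    /* catgets returns its last argument when the message is not found. */
    text = catgets(catalog, set_id, msg_id, fallback);
    from_catalog = (text != fallback);
    snprintf(out, out_size, "%s", text);

    catclose(catalog);
    return from_catalog;
}
```

Supplying the untranslated text as the fallback keeps the program usable even when no catalog has been installed for the user's language.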
10.7 Handling Different Character Sets
The Compaq C RTL supports a number of state-independent codesets and
codeset encoding schemes that contain the ASCII encoded Portable
Character Set. It does not support state-dependent codesets. The
codesets supported are:
  - ISO8859-n, where n = 1, 2, 5, 7, 8, or 9. This covers codesets for
    North America, Europe (West and East), Israel, and Turkey.
  - eucJP, SJIS, DECKANJI, SDECKANJI: Codesets used in Japan.
  - eucTW, DECHANYU, BIG5, DECHANZI: Chinese codesets used in China
    (PRC), Hong Kong, and Taiwan.
  - DECKOREAN: Codeset used in Korea.

10.7.1 Charmap File
The characters in a codeset are defined in a charmap file. The charmap
files supplied by Compaq are located in the directory defined by the
SYS$I18N_LOCALE logical name. The file type for a charmap file is .CMAP.
10.7.2 Converter Functions
As well as supporting different coded character sets, the Compaq C RTL
provides the following converter functions that enable you to convert
characters from one codeset to another:
  - iconv_open---specifies the type of conversion. It allocates a
    conversion descriptor required by the iconv function.
  - iconv---converts characters in an input buffer to the equivalent
    characters in a different codeset. The converted characters are
    stored in a separate output buffer.
  - iconv_close---deallocates a conversion descriptor and the resources
    allocated to the descriptor.
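The three functions are typically used together, as in the following sketch. The function name convert_buffer is hypothetical, and the sketch assumes that the whole input fits in one call and that both codesets are state-independent. The iconv function itself operates on buffers in memory; reading and writing files is left to the caller. The codeset names accepted by iconv_open depend on the converters installed on the system.

```c
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper: convert in_len bytes from fromcode to tocode.
 * Returns the number of bytes written to out, or (size_t)-1 if the
 * converter is unavailable or the conversion fails. */
size_t convert_buffer(const char *tocode, const char *fromcode,
                      const char *in, size_t in_len,
                      char *out, size_t out_size)
{
    iconv_t cd;
    char *in_ptr = (char *)in;     /* iconv takes char **; input is only read */
    char *out_ptr = out;
    size_t in_left = in_len;
    size_t out_left = out_size;
    size_t rc;

    cd = iconv_open(tocode, fromcode);       /* allocate the descriptor */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);                         /* release the descriptor */

    if (rc == (size_t)-1)
        return (size_t)-1;
    return out_size - out_left;
}
```

Note that the target codeset is the first argument to iconv_open and the source codeset is the second.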
 
10.7.3 Using Codeset Converter Files
The file naming convention for codeset converters is:
Where fromcode is the name of the source codeset, and
tocode is the name of the codeset to which characters are
converted.
You can add codeset converters to a given system by installing the
converter files in the directory pointed to by the logical name
SYS$I18N_ICONV.
Codeset converter files can be implemented either as table-based
conversion files or as algorithm-based converter files created as
OpenVMS shareable images.
Creating a Table-based Conversion File
The following summarizes the necessary steps to create a table-based
codeset converter file:
  - Create a text file that describes the mapping of characters from the
    source codeset to the target codeset. The DCL command ICONV COMPILE
    processes such a file and creates a codeset converter table file.
    For the format of the text file, see the description of ICONV
    COMPILE in the OpenVMS New Features Manual.
  - Copy the resulting file from the previous step to the directory
    pointed to by the logical name SYS$I18N_ICONV, assuming you have the
    privilege to do so.

Creating an Algorithm-based Conversion File
Use the following steps to create an algorithm-based codeset converter
file implemented as a shareable image:
  - Create C source files that implement the codeset converter. The API
    is documented in the public header file <iconv.h> as follows:
    - The universal entry point _u_iconv_open is called by the Compaq C
      RTL routine iconv_open to initialize a conversion.
    - _u_iconv_open returns to iconv_open a pointer to the structure
      __iconv_extern_obj_t.
    - Within this structure, the converter exports its own conversion
      entry point and conversion close routine, which are called by the
      Compaq C RTL routines iconv and iconv_close, respectively.
    - The major and minor identifier fields are required by iconv_open
      to test for a possible mismatch between the library and the
      converter. The converter usually assigns the constants
      __ICONV_MAJOR and __ICONV_MINOR, defined in the <iconv.h> header
      file.
    - The field tcs_mb_cur_max is used only by the DCL command ICONV
      CONVERT to optimize its buffer usage. This field reflects the
      maximum number of bytes that make up a single character in the
      target codeset, including the shift sequence (if any).
  - Compile and link the modules that make up the codeset converter as
    an OpenVMS shareable image, making sure that the file name adheres
    to the preceding conventions.
  - Copy the resulting file from the previous step to the directory
    pointed to by the logical name SYS$I18N_ICONV, assuming you have the
    privilege to do so.

Some Final Notes
SYS$I18N_ICONV is by default a search list in which the first directory,
SYS$SYSROOT:[SYS$I18N.ICONV.USER], is meant for use as a site-specific
repository for iconv codeset converters.
The number of codesets and locales installed varies from system to
system. Check the SYS$I18N directory tree for the codesets, converters,
and locales installed on your system.
10.8 Handling Culture-Specific Information
Each locale contains the following cultural information:
  -  Date and time information 
The LC_TIME category defines the
  conventions for writing date and time, the names of the days of the
  week, and the names of months of the year.
   -  Numeric information 
The LC_NUMERIC category defines the
  conventions for formatting non-monetary values.
   -  Monetary information 
The LC_MONETARY category defines currency
  symbols and the conventions used to format monetary values.
   - Yes and no responses 
The LC_MESSAGES category defines the
  strings expected in response to yes/no questions.
 
You can extract some of this cultural information using the nl_langinfo
function and the localeconv function. See Section 10.8.1.
10.8.1 Extracting Cultural Information From a Locale
The nl_langinfo function returns a pointer to a string that contains an
item of information obtained from the program's current locale. The
information you can extract from the locale is:
  - Date and time formats
  
 - The names of the days of the week, and months of the year in the
  local language
  
 - The radix character
  
 - The character used to separate groups of digits in non-monetary
  values
  
 - The currency symbol
  
 - The name of the codeset for the locale
  
 - The strings defined for responses to yes/no questions
 
The localeconv function returns a pointer to a data structure that
contains numeric formatting and monetary formatting data from the
LC_NUMERIC and LC_MONETARY categories.
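Both functions can be exercised with a short sketch. The names copy_langinfo and numeric_radix are hypothetical helpers written for this example. The item constants DAY_1 and RADIXCHAR come from the standard <langinfo.h> header. The helper copies the result into a caller-owned buffer because the string returned by nl_langinfo may be overwritten by a later call.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: copy one nl_langinfo item into out.
 * Returns 0 on success, or -1 if the item does not fit. */
int copy_langinfo(nl_item item, char *out, size_t size)
{
    int n = snprintf(out, size, "%s", nl_langinfo(item));
    return (n >= 0 && (size_t)n < size) ? 0 : -1;
}

/* Hypothetical helper: the radix character string of the current
 * locale, taken from the structure returned by localeconv. */
const char *numeric_radix(void)
{
    return localeconv()->decimal_point;
}
```

For example, copy_langinfo(DAY_1, buf, sizeof buf) stores the name of the first day of the week in the current locale, which is "Sunday" in the C locale.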
10.8.2 Date and Time Formatting Functions
The functions that use the date and time information are:
  - strftime---takes date and time values stored in a data structure and
    formats them into an output string. The format of the output string
    is controlled by a format string.
  - strptime---converts a string (of type char) into date and time
    values. A format string defines how the string is interpreted.
  - wcsftime---does the same as strftime except that it creates a
    wide-character string.
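As a sketch of locale-sensitive date formatting, the following hypothetical helper, format_date, fills in a struct tm for a calendar date and passes it to strftime. The helper name and its parameters are assumptions made for this example. The output of locale-dependent conversions such as %x and %A changes with the LC_TIME category in effect.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: format the given calendar date with strftime.
 * Returns the number of characters written, or 0 on failure. */
size_t format_date(char *out, size_t size, const char *format,
                   int year, int month, int day)
{
    struct tm when = {0};

    when.tm_year = year - 1900;   /* years since 1900 */
    when.tm_mon = month - 1;      /* months are numbered from 0 */
    when.tm_mday = day;
    when.tm_hour = 12;            /* midday avoids daylight-saving edge cases */
    when.tm_isdst = -1;           /* let mktime decide */

    /* mktime normalizes the structure and fills in the day of the week. */
    if (mktime(&when) == (time_t)-1)
        return 0;
    return strftime(out, size, format, &when);
}
```

In the C locale, formatting December 5, 1993 with "%x" gives 12/05/93; under a locale that uses day-first dates, the same call gives the day before the month.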
 
10.8.3 Monetary Formatting Function
The strfmon function uses the monetary information in a locale to
convert a number of values into a string. The format of the string is
controlled by a format string.
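A minimal sketch of a strfmon call follows. The wrapper name format_amount is hypothetical. The %n conversion requests the national currency format of the current locale, so the exact output depends on the LC_MONETARY category in effect.

```c
#include <monetary.h>
#include <sys/types.h>

/* Hypothetical helper: format amount in the national currency format of
 * the current locale. Returns the number of bytes written, or -1 if the
 * result does not fit in out. */
long format_amount(char *out, size_t size, double amount)
{
    ssize_t n = strfmon(out, size, "%n", amount);
    return (long)n;
}
```

An application would normally call setlocale before this helper so that the currency symbol, separators, and sign conventions match the user's locale.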
10.8.4 Numeric Formatting
The information in LC_NUMERIC is used by various functions. For example,
strtod, wcstod, and the print and scan functions determine the radix
character from the LC_NUMERIC category.
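The effect of the radix character on input can be sketched with strtod. The helper name parse_number is hypothetical; it accepts a string only if strtod consumes all of it, which makes the dependence on the radix character visible.

```c
#include <stdlib.h>

/* Hypothetical helper: parse text as a number using the radix character
 * of the current locale. Returns 1 and stores the value if the whole
 * string was consumed, otherwise returns 0. */
int parse_number(const char *text, double *value)
{
    char *end;
    double parsed = strtod(text, &end);

    if (end == text || *end != '\0')
        return 0;                 /* nothing parsed, or trailing characters */
    *value = parsed;
    return 1;
}
```

In the C locale, where the radix character is the period, "2345.67" is accepted and "2345,67" is rejected because parsing stops at the comma. Under a locale whose radix character is the comma, the situation is reversed.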
10.9 Functions for Handling Wide Characters
A character can be represented by single-byte or multibyte values
depending on the codeset. To make it easier to handle both single-byte
and multibyte characters in the same way, the Compaq C RTL defines a
wide-character data type, wchar_t. This data type can store
characters that are represented by 1-, 2-, 3-, or 4-byte values.
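As a sketch of the conversion between the two representations, the following hypothetical helpers use the standard mbstowcs function. The names count_characters and to_wide are assumptions made for this example. Calling mbstowcs with a null destination to obtain a character count is behavior specified by POSIX, and is assumed to be available here.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: number of characters (not bytes) in a multibyte
 * string under the current locale, or (size_t)-1 if it is invalid. */
size_t count_characters(const char *text)
{
    return mbstowcs(NULL, text, 0);
}

/* Hypothetical helper: convert text to a null-terminated wide-character
 * string holding at most capacity - 1 characters. Returns the number of
 * wide characters stored, or (size_t)-1 on error. */
size_t to_wide(wchar_t *out, size_t capacity, const char *text)
{
    size_t n;

    if (capacity == 0)
        return (size_t)-1;
    n = mbstowcs(out, text, capacity - 1);
    if (n == (size_t)-1)
        return n;
    out[n] = L'\0';               /* terminate even when truncated */
    return n;
}
```

Once the text is held as wchar_t values, each array element is one character regardless of how many bytes the character occupied in the multibyte string.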
The functions provided to support wide characters are: