The wide-character I/O functions provide operations analogous to most of the byte I/O functions, except that the fundamental units internal to the wide-character functions are wide characters.
However, the external representation (in files) is a sequence of multibyte characters, not wide characters. For the wide-character formatted input and output functions:
Byte I/O functions cannot handle state-dependent encodings. Wide-character I/O functions can. They accomplish this by associating each wide-character stream with a conversion-state object of type mbstate_t .
The wide-character I/O functions are:
fgetwc fputwc fwscanf fwprintf ungetwc fgetws fputws wscanf wprintf getwc putwc vfwprintf getwchar putwchar vwprintf
The byte I/O functions are:
fgetc fputc fscanf fprintf ungetc fgets fputs scanf printf fread getc putc vfprintf fwrite gets puts vprintf getchar putchar
The wide-character input functions read multibyte characters from the stream and convert them to wide characters as if they were read by successive calls to the fgetwc function. Each conversion occurs as if a call were made to the mbrtowc function, with the conversion state described by the stream's own mbstate_t object.
The wide-character output functions convert wide characters to multibyte characters and write them to the stream as if they were written by successive calls to the fputwc function. Each conversion occurs as if a call were made to the wcrtomb function, with the conversion state described by the I/O stream's own mbstate_t object.
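The two conversion directions described above can be sketched with direct calls to mbrtowc and wcrtomb. This is a minimal illustration, not the library's implementation: the helper names mb_to_wide and wide_to_mb are hypothetical, each helper keeps its own mbstate_t object in place of the one a stream would own, and the output helper does not emit any final shift sequence.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a multibyte string to wide characters,
 * one character per mbrtowc call, as a wide-character input function
 * conceptually does. Returns the number of wide characters stored, or
 * (size_t)-1 on an invalid or incomplete sequence. */
size_t mb_to_wide(const char *src, wchar_t *dst, size_t cap)
{
    mbstate_t state;
    size_t remaining = strlen(src);
    size_t count = 0;

    memset(&state, 0, sizeof state);   /* start in the initial state */

    while (remaining > 0 && count < cap) {
        size_t used = mbrtowc(&dst[count], src, remaining, &state);
        if (used == (size_t)-1 || used == (size_t)-2)
            return (size_t)-1;         /* invalid or incomplete input */
        if (used == 0)
            break;                     /* null character reached */
        src += used;
        remaining -= used;
        count++;
    }
    return count;
}

/* Hypothetical helper: convert a wide string to multibyte characters,
 * one character per wcrtomb call, as a wide-character output function
 * conceptually does. Returns the number of bytes stored (excluding the
 * terminating null), or (size_t)-1 on error or insufficient space. */
size_t wide_to_mb(const wchar_t *src, char *dst, size_t cap)
{
    mbstate_t state;
    char buf[MB_LEN_MAX];
    size_t total = 0;

    if (cap == 0)
        return (size_t)-1;
    memset(&state, 0, sizeof state);

    for (; *src != L'\0'; src++) {
        size_t used = wcrtomb(buf, *src, &state);
        if (used == (size_t)-1 || total + used >= cap)
            return (size_t)-1;
        memcpy(dst + total, buf, used);
        total += used;
    }
    dst[total] = '\0';
    return total;
}
```

In the default "C" locale each ASCII byte converts to one wide character, so mb_to_wide("abc", ...) stores three wide characters. Results for other input depend on the locale in effect.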
If a wide-character I/O function encounters an invalid multibyte character, the function sets errno to the value EILSEQ.
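The following sketch shows one way a caller might check for this condition. It writes a wide string to a temporary file with fputwc, reads it back with fgetwc, and reports whether errno was EILSEQ when a call failed. The helper name wide_roundtrip, its parameters, and its return convention are assumptions made for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: write text to a temporary file, then read it back.
 * Returns the number of wide characters read, or -1 on failure.
 * *bad_sequence is set to 1 if the failure left errno equal to EILSEQ. */
long wide_roundtrip(const wchar_t *text, wchar_t *out, size_t cap,
                    int *bad_sequence)
{
    FILE *fp = tmpfile();
    long count = 0;
    wint_t wc;

    *bad_sequence = 0;
    if (fp == NULL)
        return -1;

    for (; *text != L'\0'; text++) {
        errno = 0;
        if (fputwc(*text, fp) == WEOF) {
            /* The wide character has no multibyte representation. */
            *bad_sequence = (errno == EILSEQ);
            fclose(fp);
            return -1;
        }
    }

    rewind(fp);                        /* switch from writing to reading */
    errno = 0;
    while ((size_t)count < cap && (wc = fgetwc(fp)) != WEOF)
        out[count++] = (wchar_t)wc;

    if (ferror(fp)) {
        /* WEOF with the error indicator set: check for a bad sequence. */
        *bad_sequence = (errno == EILSEQ);
        fclose(fp);
        return -1;
    }

    fclose(fp);
    return count;
}
```

Because fgetwc returns WEOF for both end-of-file and errors, the sketch uses ferror to tell the two cases apart before inspecting errno.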
2.4 Conversion Specifications
Several of the Standard I/O functions (including the Terminal I/O functions) use conversion specifications to specify data formats for I/O. These functions are the formatted-input and formatted-output functions. Consider the following example:
    int x = 5;
    FILE *outfile;
    . . .
    fprintf(outfile, "The answer is %d.\n", x);
The decimal value of the variable x replaces the conversion specification %d in the string to be written to the file associated with the identifier outfile.
Each conversion specification begins with a percent sign (%) and ends with a conversion specifier, which is a character that specifies the type of conversion to be performed. Optional characters can appear between the percent sign and the conversion specifier.
For the wide-character formatted I/O functions, the conversion specification is a string of wide characters. For the byte I/O equivalent functions, it is a string of bytes.
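As a brief illustration of the difference, the sketch below formats the same value with a byte format string and with a wide-character format string. It uses the standard snprintf and swprintf functions, which write to a buffer instead of a stream, only so that the result is easy to inspect. The helper names format_bytes and format_wide are hypothetical.

```c
#include <stdio.h>
#include <wchar.h>

/* Byte version: the format is a string of bytes (char). */
int format_bytes(char *buf, size_t cap, int x)
{
    return snprintf(buf, cap, "The answer is %d.", x);
}

/* Wide-character version: the format is a string of wide characters,
 * written with the L prefix. cap is counted in wide characters. */
int format_wide(wchar_t *buf, size_t cap, int x)
{
    return swprintf(buf, cap, L"The answer is %d.", x);
}
```

Both helpers return the number of characters produced, counted in bytes for the first and in wide characters for the second.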
Sections 2.4.1 and 2.4.2 describe these optional characters and conversion specifiers.
2.4.1 Converting Input Information
The format specification string for the input of information can include three kinds of items:
Each input pointer is an address expression indicating an object whose type matches that of a corresponding conversion specification. Conversion specifications form part of the format string. The indicated object is the target that receives the input value. There must be as many input pointers as there are conversion specifications, and the addressed objects must match the types of the conversion specifications.
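The pairing of conversion specifications and input pointers can be sketched as follows. The example uses sscanf so that the input comes from a string; the helper name parse_record and the record layout are assumptions made for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: parse "<integer> <floating-point> <word>".
 * Each conversion specification has one pointer of the matching type:
 *   %d    -> int *
 *   %lf   -> double *
 *   %15s  -> char * (array of at least 16 bytes, including the null)
 * Returns the number of values assigned. */
int parse_record(const char *line, int *id, double *value, char name[16])
{
    return sscanf(line, "%d %lf %15s", id, value, name);
}
```

A call such as parse_record("42 3.5 gauge", &id, &value, name) returns 3. If the input stops matching partway through, the return value is the number of assignments completed before the mismatch.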
A conversion specification consists of the following characters, in the order listed:
Table 2-2 shows the characters you can use between the percent sign (%) (or the sequence %n$), and the conversion specifier. These characters are optional but, if specified, must occur in the order shown in Table 2-2.
Character | Meaning |
---|---|
* | An assignment-suppressing character. |
field width |
A nonzero decimal integer that specifies the maximum field width.
For the wide-character input functions, the field width is measured in wide characters. For the byte input functions, the field width is measured in bytes,
unless the directive is one of the following:
In these cases, the field width is measured in multibyte character units. |
h, l, or L (or ll) | Precede a conversion specifier of d, i, or n with an h if the corresponding argument is a pointer to short int rather than a pointer to int; with an l (lowercase ell) if it is a pointer to long int; or, for OpenVMS Alpha systems only, with an L or ll (two lowercase ells) if it is a pointer to __int64.
Precede a conversion specifier of o, u, or x with an h if the corresponding argument is a pointer to unsigned short int rather than a pointer to unsigned int; with an l (lowercase ell) if it is a pointer to unsigned long int; or, for OpenVMS Alpha systems only, with an L or ll if it is a pointer to unsigned __int64.
Precede a conversion specifier of c, s, or [ with an l (lowercase ell) if the corresponding argument is a pointer to a wchar_t.
Finally, precede a conversion specifier of e, f, or g with an l (lowercase ell) if the corresponding argument is a pointer to double rather than a pointer to float, or with an L if it is a pointer to long double.
If an h, l, L, or ll appears with any other conversion specifier, the behavior is undefined. |
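The optional characters in Table 2-2 can be sketched in use as follows. The helper names parse_date and parse_sizes and the input layouts are hypothetical, and the example uses only the standard h and l size characters, not the OpenVMS Alpha __int64 forms.

```c
#include <stdio.h>

/* Hypothetical helper: read the year and day from "YYYY-MM-DD".
 * %4d   reads at most four characters into *year.
 * %*2d  reads the month but assigns nothing (assignment suppression),
 *       so it has no corresponding pointer.
 * %2d   reads at most two characters into *day.
 * Returns the number of values assigned (2 on success). */
int parse_date(const char *text, int *year, int *day)
{
    return sscanf(text, "%4d-%*2d-%2d", year, day);
}

/* Hypothetical helper: the size character must match the pointer type.
 *   %hd -> short *     %ld -> long *     %lf -> double *
 * Returns the number of values assigned (3 on success). */
int parse_sizes(const char *text, short *s, long *l, double *d)
{
    return sscanf(text, "%hd %ld %lf", s, l, d);
}
```

Suppressed fields are not counted in the return value, which is why parse_date returns 2 even though three fields are read.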
Table 2-3 describes the conversion specifiers for formatted input.
Specifier | Input Type | Description |
---|---|---|
d | | Expects a decimal integer in the input whose format is the same as expected for the subject sequence of the strtol function with the value 10 for the base argument. The corresponding argument must be a pointer to int. |
i | | Expects an integer whose base is determined by the leading input characters. A leading 0 indicates octal, a leading 0X or 0x indicates hexadecimal, and all other forms are treated as decimal. The corresponding argument must be a pointer to int. |
o | | Expects an octal integer in the input (with or without a leading 0). The corresponding argument must be a pointer to unsigned int. |
u | | Expects a decimal integer in the input whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument. The corresponding argument must be a pointer to unsigned int. |
x | | Expects a hexadecimal integer in the input (with or without a leading 0x). The corresponding argument must be a pointer to unsigned int. |
c | Byte | Expects a single byte in the input. The corresponding argument must be a pointer to char.
If a field width precedes the c conversion specifier, the number of characters specified by the field width is read. In this case, the corresponding argument must be a pointer to an array of char.
If the optional character l (lowercase ell) precedes this conversion specifier, then the specifier expects a multibyte character in the input, which is converted into a wide-character code. The corresponding argument must be a pointer to type wchar_t.
If a field width also precedes the c conversion specifier, the number of characters specified by the field width is read. In this case, the corresponding argument must be a pointer to an array of wchar_t. |
 | Wide-character | Expects a sequence of the number of characters specified in the optional field width; this is 1 if not specified.
If no l (lowercase ell) precedes the c specifier, then the corresponding argument must be a pointer to an array of char.
If an l (lowercase ell) precedes the c specifier, the corresponding argument must be a pointer to an array of wchar_t. |
C | Byte | The specifier expects a multibyte character in the input, which is converted into a wide-character code. The corresponding argument must be a pointer to type wchar_t.
If a field width also precedes the C conversion specifier, the number of characters specified by the field width is read. In this case, the corresponding argument must be a pointer to an array of wchar_t. |
 | Wide-character | Expects a sequence of the number of characters specified in the optional field width; this is 1 if not specified. The corresponding argument must be a pointer to an array of wchar_t. |
s | Byte | Expects a sequence of bytes in the input. The corresponding argument must be a pointer to an array of characters that is large enough to contain the sequence and a terminating null character (\0) that is automatically added. The input field is terminated by a space, tab, or new-line character.
If the optional character l (lowercase ell) precedes this conversion specifier, the specifier expects a sequence of multibyte characters in the input, which are converted to wide-character codes. The corresponding argument must be a pointer to an array of wide characters (type wchar_t) that is large enough to contain the sequence plus the terminating null wide-character code that is automatically added. The input field is terminated by a space, tab, or new-line character. |
 | Wide-character | Expects (conceptually) a sequence of nonwhite-space characters in the input.
If no l (lowercase ell) precedes the s specifier, then the corresponding argument must be a pointer to an array of char large enough to contain the sequence plus the terminating null byte that is automatically added.
If an l (lowercase ell) precedes the s specifier, then the corresponding argument must be a pointer to an array of wchar_t large enough to contain the sequence plus the terminating null wide character that is automatically added. |
S | Byte | The specifier expects a sequence of multibyte characters in the input, which are converted to wide-character codes. The corresponding argument must be a pointer to an array of wide characters (type wchar_t) that is large enough to contain the sequence plus a terminating null wide-character code that is added automatically. The input field is terminated by a space, tab, or new-line character. |
 | Wide-character | Expects a sequence of nonwhite-space characters in the input. The corresponding argument must be a pointer to an array of wchar_t large enough to contain the sequence plus the terminating null wide character that is automatically added. |
e, f, g | | Expects a floating-point number in the input. The corresponding argument must be a pointer to float. The input format for floating-point numbers is: [±]nnn[radix][ddd][{E|e}[±]nn]. The n's and d's are decimal digits (as many as indicated by the field width minus the signs and the letter E). The radix character is defined in the current locale. |
[...] | | Expects a nonempty sequence of characters that is not delimited by a white-space character. The brackets enclose a set of characters (the scanset) expected in the input sequence. Any character in the input sequence that does not match a character in the scanset terminates the character sequence.
All characters between the brackets comprise the scanset, unless the first character after the left bracket is a circumflex (^). In this case, the scanset contains all characters other than those that appear between the circumflex and the right bracket. Any character that does appear between the circumflex and the right bracket will terminate the input character sequence.
If the conversion specifier begins with [] or [^], the right bracket character is in the scanset and the next right bracket character is the matching right bracket that ends the specification; otherwise, the first right bracket character ends the specification. |
 | Byte | If an l (lowercase ell) does not precede the [ specifier, then the characters in the scanset must be single-byte characters only. In this case, the corresponding argument must be a pointer to an array of char large enough to accept the sequence and the terminating null byte that is automatically added.
If an l (lowercase ell) does precede the [ specifier, the characters in the input sequence are considered to be multibyte characters, which are then converted to a wide-character sequence for further processing. If character ranges are specified in the scanset, then the processing is done according to the LC_COLLATE category of the current program's locale. In this case, the corresponding argument must be a pointer to an array of wchar_t large enough to accept the sequence and the terminating null wide character that is automatically added. |
 | Wide-character | If no l (lowercase ell) precedes the [ conversion specifier, then processing is the same as described for the Byte-input type of the %l[ specifier, except that the corresponding argument must be an array of char large enough to accept the multibyte sequence plus the terminating null byte that is automatically added.
If an l (lowercase ell) precedes the [ conversion specifier, then processing is the same as the preceding paragraph except that the corresponding argument must be an array of wchar_t large enough to accept the wide-character sequence plus the terminating null wide character that is automatically added. |
p | | Requires an argument that is a pointer to void. The input value is interpreted as a hexadecimal value. |
n | | No input is consumed. The corresponding argument is a pointer to an integer. The integer is assigned the number of characters read from the input stream so far by this call to the formatted input function. Execution of a %n directive does not increment the assignment count returned when the formatted input function completes execution. |
% | | Matches a single percent symbol. No conversion or assignment takes place. The complete conversion specification would be %%. |
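A few of the conversion specifiers in Table 2-3 can be sketched in use as follows. The helper names parse_auto_base and parse_pair and the key=value input layout are assumptions made for illustration. The behavior shown is that of standard C sscanf.

```c
#include <stdio.h>

/* Hypothetical helper: %i chooses the base from the leading characters
 * (0x or 0X for hexadecimal, 0 for octal, otherwise decimal).
 * Returns the number of values assigned. */
int parse_auto_base(const char *text, int *value)
{
    return sscanf(text, "%i", value);
}

/* Hypothetical helper: parse "key=hexvalue".
 *   %15[^=]  scanset with a circumflex: reads up to 15 characters that
 *            are not '=', and adds the terminating null byte
 *   =        ordinary character that must match the input
 *   %x       hexadecimal integer into an unsigned int
 *   %n       stores the count of characters consumed so far; it is not
 *            included in the return value
 * Returns the number of values assigned (2 on success). */
int parse_pair(const char *text, char key[16], unsigned int *value,
               int *consumed)
{
    *consumed = 0;
    return sscanf(text, "%15[^=]=%x%n", key, value, consumed);
}
```

For the input "mask=ff rest", parse_pair returns 2, stores "mask" and the value 255, and sets *consumed to 7, the length of "mask=ff".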
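The arguments to the input functions must be pointers. To read a number in decimal format and assign its value to n, use the following form:
    scanf("%d", &n)
The following form is incorrect because it passes the value of n rather than its address:
    scanf("%d", n)
White space in a format specification matches optional white space in the input field. Consider the following format specification:
    field = %x
This format specification matches the following forms:
    field = 5218
    field=5218
    field= 5218
    field =5218
It does not match the following form:
    fiel d=5218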
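The matching behavior of the `field = %x` format specification can be checked with a short sketch. It uses sscanf so that each input form can be supplied as a string; the helper name match_field is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: apply the format "field = %x" to one input line.
 * White space in the format matches zero or more white-space characters
 * in the input, while the ordinary characters f, i, e, l, d, and = must
 * match exactly. Returns 1 if the hexadecimal value was assigned and 0
 * if the input stopped matching before the conversion. */
int match_field(const char *input, unsigned int *value)
{
    return sscanf(input, "field = %x", value);
}
```

With this helper, "field = 5218", "field=5218", "field= 5218", and "field =5218" each return 1 and store the hexadecimal value 5218. The input "fiel d=5218" returns 0, because the space in the input appears where the format requires the character d.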
2.4.2 Converting Output Information
The format specification string for the output of information can contain:
A conversion specification consists of the following, in the order listed:
For examples of conversion specifications, see the sample programs in Section 2.6.
Table 2-4 shows the characters you can use between the percent sign (%) (or the sequence %n$) and the conversion specifier. These characters are optional, but if specified, they must occur in the order shown in Table 2-4.
Character | Meaning |
---|---|
flags | You can use the following flag characters, alone or in any combined order, to modify the conversion specification: |
field width | The minimum field width can be designated by a decimal integer constant, or by an output source. To specify an output source, use an asterisk (*) or the sequence *n$, where n refers to the nth output source listed after the format specification.
The minimum field width is considered after the conversion is done according to all other components of the format directive. This component affects padding the result of the conversion as follows:
If the result of the conversion is wider than the minimum field, write it out.
If the result of the conversion is narrower than the minimum width, pad it to make up the field width.
Pad with spaces by default.
Pad with zeros if the 0 flag is specified; this does not mean that the width is an octal number.
Padding is on the left by default, and on the right if a minus sign is specified.
For the wide-character output functions, the field width is measured in wide characters; for the byte output functions, it is measured in bytes. |
. (period) | Separates the field width from the precision. |
precision |
The precision defines any of the following:
If a precision appears with any other conversion specifier, the behavior is undefined.
Precision can be designated by a decimal integer constant, or by an output source. To specify an output source, use an asterisk (*) or the sequence *n$, where n refers to the nth output source listed after the format specification.
If only the period is specified, the precision is taken as 0. |
h, l, or L (or ll) | An h specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument; an h can also specify that a following n conversion specifier applies to a pointer to a short int argument.
An l (lowercase ell) specifies that a following d, i, o, u, x, or X conversion specifier applies to a long int or unsigned long int argument; an l can also specify that a following n conversion specifier applies to a pointer to a long int argument.
On OpenVMS Alpha systems, an L or ll (two lowercase ells) specifies that a following d, i, o, u, x, or X conversion specifier applies to an __int64 or unsigned __int64 argument. (ALPHA ONLY)
An L specifies that a following e, E, f, g, or G conversion specifier applies to a long double argument.
An l specifies that a following c or s conversion specifier applies to a wchar_t argument.
If an h, l, or L appears with any other conversion specifier, the behavior is undefined.
On OpenVMS VAX and OpenVMS Alpha systems, Compaq C int values are equivalent to long values. |
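The field width, precision, and size characters in Table 2-4 can be sketched in use as follows. The example uses the standard snprintf function so that the formatted text lands in a buffer; the helper names show_padding, show_precision, and show_sizes are hypothetical. The precision behavior shown (digits after the radix character for f, maximum characters for s) is that of standard C.

```c
#include <stdio.h>

/* Hypothetical helper: minimum field width and padding.
 *   %5d   pad on the left with spaces to five characters
 *   %-5d  pad on the right (minus flag)
 *   %05d  pad on the left with zeros (0 flag)
 *   %*d   take the width from the argument list (here, 4) */
void show_padding(char *buf, size_t cap, int value)
{
    snprintf(buf, cap, "[%5d][%-5d][%05d][%*d]",
             value, value, value, 4, value);
}

/* Hypothetical helper: precision.
 *   %.2f   two digits after the radix character
 *   %.3s   at most three characters of the string
 *   %8.3f  width 8 combined with precision 3 */
void show_precision(char *buf, size_t cap, double x, const char *s)
{
    snprintf(buf, cap, "[%.2f][%.3s][%8.3f]", x, s, x);
}

/* Hypothetical helper: size characters must match the argument types.
 *   %hd -> short     %ld -> long     %Lf -> long double */
void show_sizes(char *buf, size_t cap, short h, long l, long double ld)
{
    snprintf(buf, cap, "%hd %ld %.1Lf", h, l, ld);
}
```

For example, show_padding with the value 42 produces "[   42][42   ][00042][  42]". In the default "C" locale, show_precision with 3.14159 and "abcdef" produces "[3.14][abc][   3.142]".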