HSYSHR provides the following features and capabilities:
1.3 Linking with HSYSHR
Routines in HSYSHR execute entirely in the mode of the caller and
are intended to be called in user mode. To link an application
that contains explicit calls to HSYSHR, use the following link command:
$ LINK program, SYS$LIBRARY:HSYIMGLIB.OLB/LIBRARY
This chapter describes some important concepts of multi-byte characters that are used throughout the documentation.
2.1 What is a Multi-byte Character?
The DEC Hanyu character set is implemented as a multi-byte character set
containing Chinese characters, punctuation marks and various kinds of
symbols. Each multi-byte character is either a two-byte character
or a four-byte character. OpenVMS/Hanyu adopts the DEC Hanyu character
set, and Chinese characters are represented as multi-byte
characters from that character set. For a detailed discussion of the DEC
Hanyu character set, refer to the OpenVMS/Hanyu User Guide.
2.2 Proper Character Boundary
In HSYSHR, most routines process text character by character rather
than byte by byte, as conventional routines do. Some routines require
the input character pointer to point at a proper character boundary
in the user buffer. "Pointing at a proper character boundary" means
that the character pointer points to the first byte of a character,
never to the second or a later byte of a multi-byte character.
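
The boundary rule above can be sketched in portable C. This is only an illustration, not HSYSHR code: the helper names char_len and on_char_boundary are hypothetical, and the lead-byte rule is an assumption made for the example (a byte below 80 hex is a one-byte character, the byte pair C2 CB introduces a four-byte character, and any other byte with the high bit set starts a two-byte character). The actual classification rules are those of the DEC Hanyu character set as described in the OpenVMS/Hanyu User Guide.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length in bytes of the character starting at p.
 * ASSUMED rule, for illustration only (not taken from the HSYSHR manual):
 *   - byte < 0x80, or a lone 8-bit byte at the end of the buffer: 1 byte
 *   - 8-bit byte followed by a 7-bit control character: 1 byte
 *   - byte pair C2 CB with at least 4 bytes remaining: 4 bytes
 *   - any other byte with the high bit set: 2 bytes */
size_t char_len(const unsigned char *p, size_t remaining)
{
    assert(p != NULL && remaining > 0);
    if (p[0] < 0x80 || remaining < 2)
        return 1;
    if (p[1] < 0x20 || p[1] == 0x7F)
        return 1;
    if (remaining >= 4 && p[0] == 0xC2 && p[1] == 0xCB)
        return 4;
    return 2;
}

/* Returns 1 if offset addresses the first byte of a character
 * (or the position just past the last character), otherwise 0.
 * The scan always starts at the beginning of the buffer, because a
 * byte in the middle of a buffer cannot be classified on its own. */
int on_char_boundary(const unsigned char *buf, size_t len, size_t offset)
{
    size_t pos = 0;
    assert(buf != NULL);
    while (pos < offset && pos < len)
        pos += char_len(buf + pos, len - pos);
    return pos == offset;
}
```

For a buffer holding the bytes 41 B0 A1 42 (hex), offsets 0, 1 and 3 are proper boundaries under these assumptions, while offset 2 points into the middle of the two-byte character.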
2.3 Full Form and Half Form Character
The DEC Hanyu character set includes a set of two-byte ASCII
characters. To distinguish them from the conventional one-byte 7-bit
ASCII characters, the terms "full form" and "half form" characters are
used. Full form characters are the two-byte ASCII characters, whereas
half form characters are the one-byte 7-bit ASCII characters.
Conversion services between full form and half form characters are
provided by the conversion routines in HSYSHR. In applications
where character matching must treat full form and half form
characters as equivalent, the user can call the searching routines in
HSYSHR and specify the conversion flag argument. Note that uppercasing
and lowercasing can both be applied to full form characters.
2.4 Multi-byte Character Unsigned Longword Representation
In HSYSHR, the representation of a multi-byte character in a single
character argument differs from its representation in a character string
argument. A single character argument uses the unsigned longword integer
representation, whereas characters in a string argument use the normal
character string representation. The following are two examples.
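Two-byte character, unsigned longword representation:

    +--+--+--+--+
    |00|00|B0|A1|
    +--+--+--+--+
     H          L

Two-byte character, character string representation:

      --+--+--+-   +--+
    ....|A1|B0|....|  |  start of string
      --+--+--+-   +--+
     H               L

Four-byte character, unsigned longword representation:

    +--+--+--+--+
    |C2|CB|B0|A2|
    +--+--+--+--+
     H          L

Four-byte character, character string representation:

      --+--+--+--+--+-   +--+
    ....|A2|B0|CB|C2|....|  |  start of string
      --+--+--+--+--+-   +--+
     H                     L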
The read routines in HSYSHR read the buffer in character string format and return the character read in unsigned longword format. The write routines take a character in unsigned longword format and write it to the buffer in character string format.
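
The relationship between the two representations can be sketched in portable C. The function names pack_char and unpack_char are hypothetical and are not HSYSHR routines. The sketch assumes, based on a reading of the examples above, that the first byte of the character in the string becomes the most significant non-zero byte of the longword (string bytes B0 A1 correspond to the longword 0000B0A1 hex).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack the nbytes (1..4) bytes of one character,
 * taken in string order, into an unsigned longword. The first string
 * byte ends up in the most significant used position (ASSUMED mapping). */
uint32_t pack_char(const unsigned char *s, size_t nbytes)
{
    uint32_t value = 0;
    assert(s != NULL && nbytes >= 1 && nbytes <= 4);
    for (size_t i = 0; i < nbytes; i++)
        value = (value << 8) | s[i];
    return value;
}

/* Hypothetical helper: write the significant bytes of value to out in
 * string order and return how many bytes were written (1..4).
 * out must have room for 4 bytes. */
size_t unpack_char(uint32_t value, unsigned char *out)
{
    size_t nbytes = 1;
    assert(out != NULL);
    /* Count how many bytes are needed to hold the value. */
    for (uint32_t rest = value >> 8; rest != 0; rest >>= 8)
        nbytes++;
    for (size_t i = 0; i < nbytes; i++)
        out[i] = (unsigned char)(value >> (8 * (nbytes - 1 - i)));
    return nbytes;
}
```

Under this assumption, packing the string bytes C2 CB B0 A2 gives the longword C2CBB0A2 (hex), and unpacking 0000B0A1 (hex) writes the two bytes B0 A1 to the buffer.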
HSY$CH_MOVE moves a substring from a specified source buffer to a specified destination buffer.
HSY$CH_MOVE len,src,dst
len
VMS usage: longword_signed
type: longword integer (signed)
access: read only
mechanism: by value
The length in bytes of the substring to be moved.
src
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value
The address of the starting position of the source buffer.
dst
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value
The address of the starting position of the destination buffer.
This routine is multi-byte insensitive: it moves bytes without regard to character boundaries. If len does not fall on a proper multi-byte character boundary, e.g. it indicates the second byte of a two-byte character, then only half of that multi-byte character is moved, leaving a partial character at the end of the destination string.
HSY$DX_TRIM trims trailing one-byte and multi-byte spaces and TAB characters.
HSY$DX_TRIM dst,src,[len]
VMS usage: cond_value
type: longword (unsigned)
access: write only
mechanism: by value
dst
VMS usage: char_string
type: character string
access: write only
mechanism: by descriptor
The destination string to store the trimmed string.
src
VMS usage: char_string
type: character string
access: read only
mechanism: by descriptor
The source string that is to be trimmed.
len
VMS usage: word_signed
type: word integer (signed)
access: write only
mechanism: by reference
The length in bytes of the trimmed string. If this optional argument is not supplied, the length of the trimmed string is not returned to the caller.
dst and src can contain one-byte and multi-byte characters.
LIB$_INVSTRDES   Invalid string descriptor. A string descriptor has an invalid value in its DSC$B_CLASS field.
LIB$_STRTRU      Procedure successfully completed. String truncated.
LIB$_FATERRLIB   Fatal internal error. An internal consistency check has failed.
LIB$_INSVIRMEM   Insufficient virtual memory.
SS$_NORMAL       Procedure successfully completed.
HSY$DX_TRUNC truncates the input string to the specified length.
HSY$DX_TRUNC dst,src,offset,[len]
VMS usage: cond_value
type: longword (unsigned)
access: write only
mechanism: by value
dst
VMS usage: char_string
type: character string
access: write only
mechanism: by descriptor
The specified destination string to store the truncated string.
src
VMS usage: char_string
type: character string
access: read only
mechanism: by descriptor
The specified source string to be truncated.
offset
VMS usage: word_signed
type: word integer (signed)
access: read only
mechanism: by reference
The offset in bytes from the starting position of the source string which indicates the position of the first character just after the truncated string. Note that this offset may not be on the proper character boundary, e.g. it may point to the second byte of a two-byte character.
len
VMS usage: word_signed
type: word integer (signed)
access: write only
mechanism: by reference
The length in bytes of the truncated string. If this optional argument is not supplied, no length information of the truncated string will be returned to the caller.
The value returned in len may not necessarily be equal to the value specified in offset since offset may not be pointing at the first byte of a multi-byte character. In any case, the character indicated by offset will be treated as the first character that follows the truncated string.
LIB$_INVSTRDES   Invalid string descriptor. A string descriptor has an invalid value in its DSC$B_CLASS field.
LIB$_STRTRU      Procedure successfully completed. Truncated string is further truncated due to insufficient space allocated in the destination string buffer.
LIB$_FATERRLIB   Fatal internal error. An internal consistency check has failed.
LIB$_INSVIRMEM   Insufficient virtual memory.
SS$_NORMAL       Procedure successfully completed.
HSY$TRIM trims trailing one-byte and multi-byte spaces and TAB characters.
HSY$TRIM str,len
VMS usage: longword_signed
type: longword integer (signed)
access: write only
mechanism: by value
The offset in bytes from the starting position of the input string which indicates the position of the terminating character of the trimmed string. If the terminating character is a multi-byte character, the returned offset will be pointing to the first byte of the multi-byte character.
str
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value
The address of the starting position of the input string to be trimmed.
len
VMS usage: longword_signed
type: longword integer (signed)
access: read only
mechanism: by value
The length in bytes of the input string.
str can contain one-byte and multi-byte characters.
HSY$TRUNC returns the position of the first character that follows the truncated string.
HSY$TRUNC str,len,offset
VMS usage: longword_signed
type: longword integer (signed)
access: write only
mechanism: by value
The offset in bytes which indicates the position of the first character that follows the truncated string. If this character is a multi-byte character, the offset points at the first byte of the multi-byte character.
str
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value
The address of the starting position of the input string.
len
VMS usage: longword_signed
type: longword integer (signed)
access: read only
mechanism: by value
The length in bytes of the input string.
offset
VMS usage: longword_signed
type: longword integer (signed)
access: read only
mechanism: by value
The offset in bytes of the character that follows the truncated string. It may not be on the proper character boundary, e.g. it can point to the second byte of a two-byte character.
str can contain one-byte and multi-byte characters. This routine helps you position offset on the proper character boundary. Its function is similar to that of routine HSY$CH_CURR, but with a different parameter interface.
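
The boundary adjustment described for HSY$TRUNC can be approximated in portable C as follows. This is a sketch under stated assumptions, not the HSYSHR implementation: the names char_len and trunc_offset are hypothetical, the lead-byte rule in char_len is an assumption made for the example, and clamping an offset larger than len to len is also an assumption, since the behavior for that case is not described here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: length in bytes of the character starting at p.
 * ASSUMED rule, for illustration only: a 7-bit byte, a lone trailing
 * 8-bit byte, or an 8-bit byte followed by a 7-bit control character
 * is one byte; the pair C2 CB starts a four-byte character; any other
 * byte with the high bit set starts a two-byte character. */
size_t char_len(const unsigned char *p, size_t remaining)
{
    assert(p != NULL && remaining > 0);
    if (p[0] < 0x80 || remaining < 2)
        return 1;
    if (p[1] < 0x20 || p[1] == 0x7F)
        return 1;
    if (remaining >= 4 && p[0] == 0xC2 && p[1] == 0xCB)
        return 4;
    return 2;
}

/* Returns the largest proper character boundary that is <= offset.
 * If offset points into the middle of a multi-byte character, the
 * result is the offset of that character's first byte, so the whole
 * character is treated as following the truncated string. */
size_t trunc_offset(const unsigned char *str, size_t len, size_t offset)
{
    size_t pos = 0;
    assert(str != NULL);
    if (offset > len)
        offset = len;               /* ASSUMPTION: clamp to the string length */
    while (pos < offset) {
        size_t n = char_len(str + pos, len - pos);
        if (pos + n > offset)
            break;                  /* offset falls inside this character */
        pos += n;
    }
    return pos;
}
```

For the bytes 41 B0 A1 42 (hex), an offset of 2 points at the second byte of the two-byte character, so the sketch returns 1, the offset of that character's first byte. Offsets that already lie on a boundary are returned unchanged.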
HSY$CH_GCHAR reads the current character.
HSY$CH_GCHAR cur,end
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: write only
mechanism: by value
The current character.
cur
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value
The address of the current position of the specified current character. Note that this address must be on the proper character boundary, e.g. it should not point to the second byte of a two-byte character.
end
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value
The address of the string terminating position plus one as illustrated below:

       +---+---+---+---+
    .. |   |   |   |   |
       +---+---+---+---+
            string      ^
                        end
This routine reads a character with end-of-buffer checking. FFFF (hex) is returned on an attempt to read past the end of the buffer. If the current character is a one-byte 7-bit control character or a one-byte 8-bit character (e.g. an 8-bit character followed by a 7-bit control character), that one-byte 7-bit or 8-bit character is returned. The current pointer is not updated, since cur is passed by value.
HSY$CH_GNEXT reads the current character and advances the current pointer to the next character.
HSY$CH_GNEXT cur,end
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: write only
mechanism: by value
The current character.
cur
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: modify
mechanism: by reference
The address of the current position of the specified current character. Note that this address must be on the proper character boundary, e.g. it should not point to the second byte of a two-byte character.
end
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value
The address of the string terminating position plus one as illustrated below:

       +---+---+---+---+
    .. |   |   |   |   |
       +---+---+---+---+
            string      ^
                        end
This routine reads a character with end-of-buffer checking. FFFF (hex) is returned on an attempt to read past the end of the buffer. If the current character is a one-byte 7-bit control character or a one-byte 8-bit character (e.g. an 8-bit character followed by a 7-bit control character), that one-byte 7-bit or 8-bit character is returned. The current pointer is updated: after the read, cur points at the next character position, on the proper character boundary. This routine is useful for reading characters in succession.
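
The read-and-advance behavior described above can be imitated in portable C to show how successive reading works. This is a sketch under stated assumptions, not the HSYSHR implementation: the names char_len and get_next are hypothetical, the lead-byte rule in char_len is an assumption made for the example, and the returned longword is assumed to hold the first string byte in its most significant used position.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: length in bytes of the character starting at p.
 * ASSUMED rule, for illustration only: a 7-bit byte, a lone trailing
 * 8-bit byte, or an 8-bit byte followed by a 7-bit control character
 * is one byte; the pair C2 CB starts a four-byte character; any other
 * byte with the high bit set starts a two-byte character. */
size_t char_len(const unsigned char *p, size_t remaining)
{
    assert(p != NULL && remaining > 0);
    if (p[0] < 0x80 || remaining < 2)
        return 1;
    if (p[1] < 0x20 || p[1] == 0x7F)
        return 1;
    if (remaining >= 4 && p[0] == 0xC2 && p[1] == 0xCB)
        return 4;
    return 2;
}

/* Reads the character at *cur, returns it in longword form and moves
 * *cur to the next character. Returns 0xFFFF when *cur is at or past
 * end, where end is the address of the last string byte plus one. */
uint32_t get_next(const unsigned char **cur, const unsigned char *end)
{
    const unsigned char *p;
    uint32_t value = 0;
    size_t n;
    assert(cur != NULL && *cur != NULL && end != NULL);
    p = *cur;
    if (p >= end)
        return 0xFFFFu;             /* read past the end of the buffer */
    n = char_len(p, (size_t)(end - p));
    for (size_t i = 0; i < n; i++)
        value = (value << 8) | p[i];
    *cur = p + n;                   /* advance to the next boundary */
    return value;
}
```

A caller would typically loop until 0xFFFF (hex) is returned. Because cur is updated on each call, every read starts on a proper character boundary as long as the first one does.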
HSY$CH_NEXTG reads the next character, skipping the current character.
HSY$CH_NEXTG cur,end
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: write only
mechanism: by value
The next character.
cur
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: modify
mechanism: by reference
The address of the current position of the specified current character. Note that this address must be on the proper character boundary, e.g. it should not point to the second byte of a two-byte character.
end
VMS usage: longword_unsigned
type: longword integer (unsigned)
access: read only
mechanism: by value
The address of the string terminating position plus one as illustrated below:

       +---+---+---+---+
    .. |   |   |   |   |
       +---+---+---+---+
            string      ^
                        end
This routine reads the next character, skipping the current character. FFFF (hex) is returned on an attempt to read past the end of the buffer. If the next character is a one-byte 7-bit control character or a one-byte 8-bit character (e.g. an 8-bit character followed by a 7-bit control character), that one-byte 7-bit or 8-bit character is returned. The current pointer is updated: after the read, cur points at the next character position, on the proper character boundary.