GNUstep CoreBase Library 0.2
Unicode String Utilities

Detailed Description

Functions

CFIndex GSUnicodeFormatWithArguments (UniChar *__restrict__ s, CFIndex n, CFTypeRef locale, const UniChar *__restrict__ format, CFIndex fmtlen, va_list ap)
 
CFIndex GSUnicodeFormat (UniChar *__restrict__ s, CFIndex n, CFTypeRef locale, const UniChar *__restrict__ format, CFIndex fmtlen,...)
 

Convert to/from a Unicode String

CFIndex GSUnicodeFromEncoding (UniChar **d, const UniChar *const dLimit, CFStringEncoding enc, const UInt8 **s, const UInt8 *const sLimit, const UTF16Char loss)
 Convert a string in some external encoding to Unicode (UTF-16).
 
CFIndex GSUnicodeToEncoding (UInt8 **d, const UInt8 *const dLimit, CFStringEncoding enc, const UniChar **s, const UniChar *const sLimit, const char loss, Boolean addBOM)
 Convert a Unicode string (UTF-16) to some external encoding.
 

Function Documentation

◆ GSUnicodeFromEncoding()

CFIndex GSUnicodeFromEncoding ( UniChar ** d,
const UniChar *const dLimit,
CFStringEncoding enc,
const UInt8 ** s,
const UInt8 *const sLimit,
const UTF16Char loss )

This function is used internally to convert to Unicode from the various supported encodings.

The function checks both the input and the output to verify that the result of the conversion is valid UTF-16 data.

Note
This function always attempts to consume the source buffer s completely. It stops early only if it encounters an invalid character and no loss character was provided. Certain encodings, such as UTF-7, are stateful and cannot be converted piecewise over several calls. This differs from the behavior of GSUnicodeToEncoding(); keep the difference in mind.
Parameters
[in,out] d  Pointer to the address of the start of the destination buffer. If d is NULL or points to NULL, the function performs the conversion but writes no data. On return, *d points to the memory immediately after the last character written.
[in] dLimit  Pointer to the memory immediately after the end of the destination buffer.
[in] enc  Encoding of the data in the source buffer.
[in,out] s  Pointer to the address of the start of the source buffer. Neither s nor *s may be NULL.
[in] sLimit  Pointer to the memory immediately after the end of the source buffer.
[in] loss  Substitute character for invalid input, such as a UTF-8 sequence that encodes an unpaired surrogate. A typical choice is U+FFFD (REPLACEMENT CHARACTER). Specify 0 to disable lossy conversion.
Returns
The number of UniChar values required to complete the conversion. Returns -1 if an error is encountered, such as an invalid character when no loss character was provided. If an error occurs, *d and *s are still updated and reflect where the error occurred.
See also
GSUnicodeToEncoding()
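To make the calling convention concrete, here is a minimal, self-contained sketch of the documented contract for a single input encoding. `sketch_utf8_to_utf16()` is a hypothetical function, not part of CoreBase: it accepts only UTF-8, drops the enc parameter, uses uint16_t in place of UniChar and long in place of CFIndex, and its policy of skipping exactly one byte per invalid sequence is an assumption. It illustrates the limit-pointer buffers, the NULL destination used for sizing, the loss character, and the -1 result with *s left at the offending byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UTF-8-only stand-in following the documented
   GSUnicodeFromEncoding() contract; not the CoreBase implementation. */
static long
sketch_utf8_to_utf16 (uint16_t **d, const uint16_t *const dLimit,
                      const uint8_t **s, const uint8_t *const sLimit,
                      const uint16_t loss)
{
  /* Documented precondition: neither s nor *s may be NULL. */
  assert (s != NULL && *s != NULL);

  uint16_t *out = (d != NULL) ? *d : NULL; /* NULL: count only, write nothing */
  const uint8_t *in = *s;
  long required = 0;
  int full = 0; /* set once a code point no longer fits in the destination */

  while (in < sLimit)
    {
      uint8_t b = in[0];
      uint32_t cp = 0, min = 0;
      int len = 0;

      /* The lead byte gives the sequence length and its smallest legal value. */
      if (b < 0x80)                { cp = b;        len = 1; }
      else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; len = 2; min = 0x80; }
      else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; len = 3; min = 0x800; }
      else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; len = 4; min = 0x10000; }

      int ok = len > 0 && sLimit - in >= len;
      for (int i = 1; ok && i < len; i++)
        {
          if ((in[i] & 0xC0) != 0x80)
            ok = 0;
          else
            cp = (cp << 6) | (in[i] & 0x3F);
        }
      /* Reject overlong forms, encoded surrogates and values past U+10FFFF. */
      if (ok && (cp < min || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF))
        ok = 0;

      if (!ok)
        {
          if (loss == 0)
            break;   /* strict mode: stop at the offending byte */
          cp = loss;
          len = 1;   /* assumption: skip a single byte per invalid sequence */
        }

      /* Encode as one UTF-16 unit or as a surrogate pair. */
      uint16_t units[2];
      int n = 0;
      if (cp >= 0x10000)
        {
          units[n++] = (uint16_t) (0xD800 | ((cp - 0x10000) >> 10));
          units[n++] = (uint16_t) (0xDC00 | ((cp - 0x10000) & 0x3FF));
        }
      else
        units[n++] = (uint16_t) cp;

      /* Write only while the whole code point fits; keep counting anyway. */
      if (out != NULL && !full && dLimit - out >= n)
        {
          for (int i = 0; i < n; i++)
            *out++ = units[i];
        }
      else
        full = 1;
      required += n;
      in += len;
    }

  int failed = in < sLimit; /* the loop only exits early on a strict-mode error */
  *s = in;                  /* on error, points at the invalid byte */
  if (out != NULL)
    *d = out;               /* just past the last unit written */
  return failed ? -1 : required;
}
```

Because the return value counts every unit the full conversion needs, even when d is NULL or the destination fills up, a caller can size a buffer with a first call that passes NULL for d and then convert with a second call. The source pointer must be reset between the two calls, since each call advances it.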

◆ GSUnicodeToEncoding()

CFIndex GSUnicodeToEncoding ( UInt8 ** d,
const UInt8 *const dLimit,
CFStringEncoding enc,
const UniChar ** s,
const UniChar *const sLimit,
const char loss,
Boolean addBOM )

This function is used internally to convert from Unicode to the various supported encodings.

The function performs minimal checks on the input data and will only fail if a code point cannot be converted to the specified encoding and a loss character was not provided.

Note
This function only attempts to fill the destination buffer d. Only if d or *d is NULL does it attempt to consume the entire source buffer. Additionally, when converting to UTF-16, this function does not check that the input and output are valid. This differs from the behavior of GSUnicodeFromEncoding(); keep the difference in mind.
Parameters
[in,out] d  Pointer to the address of the start of the destination buffer. If d is NULL or points to NULL, the function performs the conversion but writes no data. On return, *d points to the memory immediately after the last byte written.
[in] dLimit  Pointer to the memory immediately after the end of the destination buffer.
[in] enc  Encoding to convert to, that is, the encoding of the data written to the destination buffer.
[in,out] s  Pointer to the address of the start of the source buffer. Neither s nor *s may be NULL.
[in] sLimit  Pointer to the memory immediately after the end of the source buffer.
[in] loss  Substitute character for code points that cannot be represented in the target encoding, for example a non-ASCII character when converting to ASCII. A typical choice is '?'. Specify 0 to disable lossy conversion.
[in] addBOM  If true, adds a byte order mark to the start of the destination buffer.
Returns
The number of UTF-16 code points successfully converted. May return -1 if an error is encountered, such as a character that cannot be converted when no loss character was provided.
See also
GSUnicodeFromEncoding()
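A similar sketch for the opposite direction, again hypothetical and limited to one target encoding: `sketch_utf16_to_latin1()` is not part of CoreBase. It converts UTF-16 to ISO 8859-1, omits enc and addBOM (Latin-1 has no byte order mark), uses uint8_t/uint16_t in place of UInt8/UniChar, and treats unpaired surrogates as unrepresentable. Following the documentation above, it stops as soon as the destination is full, consumes the whole source only when d or *d is NULL, and returns the number of code points converted, which for Latin-1 equals the number of bytes produced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical UTF-16 -> ISO 8859-1 stand-in following the documented
   GSUnicodeToEncoding() contract; not the CoreBase implementation.
   addBOM is omitted because Latin-1 has no byte order mark. */
static long
sketch_utf16_to_latin1 (uint8_t **d, const uint8_t *const dLimit,
                        const uint16_t **s, const uint16_t *const sLimit,
                        const char loss)
{
  /* Documented precondition: neither s nor *s may be NULL. */
  assert (s != NULL && *s != NULL);

  uint8_t *out = (d != NULL) ? *d : NULL; /* NULL: consume all, write nothing */
  const uint16_t *in = *s;
  long converted = 0;

  while (in < sLimit)
    {
      /* Stop once the destination is full; the caller may call again. */
      if (out != NULL && out >= dLimit)
        break;

      /* A valid surrogate pair is one code point made of two units. */
      int len = (in[0] >= 0xD800 && in[0] <= 0xDBFF && sLimit - in >= 2
                 && in[1] >= 0xDC00 && in[1] <= 0xDFFF) ? 2 : 1;
      uint8_t byte;
      if (len == 1 && in[0] <= 0xFF)
        byte = (uint8_t) in[0];
      else if (loss != 0)
        byte = (uint8_t) loss;  /* code point not representable in Latin-1 */
      else
        {
          converted = -1;       /* strict mode: report failure */
          break;
        }

      if (out != NULL)
        *out++ = byte;
      converted++;
      in += len;
    }

  *s = in;          /* first unit not yet converted (the bad one on error) */
  if (out != NULL)
    *d = out;       /* just past the last byte written */
  return converted;
}
```

Since the sketch leaves *s at the first unconverted unit when the destination fills up, a caller can drain a long string through a small fixed buffer by calling it repeatedly until *s reaches sLimit.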