MBRTOC32(3)             NetBSD Library Functions Manual            MBRTOC32(3)


NAME
mbrtoc32 -- Restartable multibyte to UTF-32 conversion
LIBRARY
Standard C Library (libc, -lc)
SYNOPSIS
#include <uchar.h>

size_t
mbrtoc32(char32_t * restrict pc32, const char * restrict s, size_t n, mbstate_t * restrict ps);
DESCRIPTION
The mbrtoc32 function decodes multibyte characters in the current locale and converts them to Unicode scalar values (i.e., to UTF-32), keeping state so it can restart after incremental progress.

Each call to mbrtoc32:

      1.   examines up to n bytes starting at s,
      2.   yields a Unicode scalar value (i.e., a UTF-32 code unit) if available by storing it at *pc32,
      3.   saves state at ps, and
      4.   returns either the number of bytes consumed if any or a special return value.

Specifically:

      ·   If the multibyte sequence at s is invalid after any previous input saved at ps, or if an error occurs in decoding, mbrtoc32 returns (size_t)-1 and sets errno(2) to indicate the error.
      ·   If the multibyte sequence at s is still incomplete after n bytes, including any previous input saved in ps, mbrtoc32 saves its state in ps after all the input so far and returns (size_t)-2.
      ·   If mbrtoc32 decodes the null multibyte character, then it stores zero at *pc32 and returns zero.
      ·   Otherwise, mbrtoc32 decodes a single multibyte character, stores its Unicode scalar value at *pc32, and returns the number of bytes consumed to decode the first multibyte character.

If pc32 is a null pointer, nothing is stored, but the effects on ps and the return value are unchanged.

If s is a null pointer, the mbrtoc32 call is equivalent to:

      mbrtoc32(NULL, "", 1, ps)

This always returns zero, and has the effect of resetting ps to the initial conversion state, without writing to pc32, even if it is nonnull.

If ps is a null pointer, mbrtoc32 uses an internal mbstate_t object with static storage duration, distinct from all other mbstate_t objects (including those used by mbrtoc8(3), mbrtoc16(3), c8rtomb(3), c16rtomb(3), and c32rtomb(3)), which is initialized at program startup to the initial conversion state.
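The null pc32 case described above can be useful when only the length of each character matters. The following is a minimal sketch, not part of the library: count_chars is a hypothetical helper that counts the complete multibyte characters in a buffer without storing any scalar value. It assumes a zero-filled mbstate_t represents the initial conversion state and stops at the first null character.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/*
 * Hypothetical helper: count the multibyte characters in s[0..n),
 * stopping early at a null character.  Returns (size_t)-1 if the
 * input is invalid or ends in the middle of a character.
 */
size_t
count_chars(const char *s, size_t n)
{
	mbstate_t mbs;
	size_t count = 0;

	memset(&mbs, 0, sizeof(mbs));	/* initial conversion state */
	while (n > 0) {
		/* pc32 is NULL: nothing is stored, return value unchanged */
		size_t len = mbrtoc32(NULL, s, n, &mbs);

		if (len == (size_t)-1 || len == (size_t)-2)
			return (size_t)-1;	/* error or incomplete input */
		if (len == 0)
			break;			/* null character decoded */
		count++;
		s += len;
		n -= len;
	}
	return count;
}
```

In the default "C" locale, count_chars("abc", 3) yields 3, since each ASCII byte is one character. The result for non-ASCII input depends on the locale selected with setlocale(3).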
RETURN VALUES
The mbrtoc32 function returns:

      0            [null] if mbrtoc32 decoded a null multibyte character.
      i            [scalar value] where 0 <= i <= n, if mbrtoc32 consumed i bytes of input to decode the next multibyte character, yielding a Unicode scalar value.
      (size_t)-2   [incomplete] if mbrtoc32 found only an incomplete multibyte sequence after all n bytes of input and any previous input, and saved its state to restart in the next call with ps.
      (size_t)-1   [error] if any encoding error was detected; errno(2) is set to reflect the error.
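Because size_t is unsigned, (size_t)-1 and (size_t)-2 are very large positive values, so a caller has to test for them before treating the result as a byte count. A minimal sketch of that ordering follows; the enum decode_result and the function classify_result are hypothetical names introduced for illustration only.

```c
#include <stddef.h>

/* Hypothetical labels for the four outcomes listed above. */
enum decode_result {
	DEC_NULL,		/* returned 0 */
	DEC_SCALAR,		/* returned a byte count */
	DEC_INCOMPLETE,		/* returned (size_t)-2 */
	DEC_ERROR		/* returned (size_t)-1 */
};

/* Map a return value of mbrtoc32 to one of the four outcomes. */
enum decode_result
classify_result(size_t r)
{
	/* Check the special values first: they compare greater than any n. */
	if (r == (size_t)-1)
		return DEC_ERROR;
	if (r == (size_t)-2)
		return DEC_INCOMPLETE;
	if (r == 0)
		return DEC_NULL;
	return DEC_SCALAR;
}
```

A comparison such as r <= n would also separate byte counts from the special values, provided n is smaller than (size_t)-2.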
EXAMPLES
      char *s = ...;
      size_t n = ...;
      mbstate_t mbs = {0};    /* initial conversion state */

      while (n) {
              char32_t c32;
              size_t len;

              len = mbrtoc32(&c32, s, n, &mbs);
              switch (len) {
              case 0:         /* NUL terminator */
                      assert(c32 == 0);
                      goto out;
              default:        /* scalar value */
                      printf("U+%04"PRIx32"\n", (uint32_t)c32);
                      break;
              case (size_t)-2:        /* incomplete */
                      printf("incomplete\n");
                      goto readmore;
              case (size_t)-1:        /* error */
                      printf("error: %d\n", errno);
                      goto out;
              }
              s += len;
              n -= len;
      }
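The fragment above jumps to a readmore label that it does not show. One way that path might be organized is sketched below, under the assumption that input arrives in chunks and the same mbstate_t is kept across calls. The names struct decoder, decoder_init, and decoder_feed are hypothetical and are not part of libc.

```c
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical decoder that survives across input chunks. */
struct decoder {
	mbstate_t mbs;		/* conversion state carried between chunks */
	size_t nscalars;	/* scalar values decoded so far */
};

void
decoder_init(struct decoder *d)
{
	memset(d, 0, sizeof(*d));	/* zeroed state: initial conversion state */
}

/*
 * Feed one chunk.  Returns 1 if a null character ended the stream,
 * 0 if the chunk was fully absorbed (more input may follow), and
 * -1 on an encoding error (errno is set by mbrtoc32).
 */
int
decoder_feed(struct decoder *d, const char *buf, size_t n)
{
	while (n > 0) {
		char32_t c32;
		size_t len = mbrtoc32(&c32, buf, n, &d->mbs);

		if (len == (size_t)-1)
			return -1;
		if (len == (size_t)-2)
			return 0;	/* partial character saved in d->mbs */
		if (len == 0)
			return 1;	/* null character: treat as end of stream */
		d->nscalars++;
		buf += len;
		n -= len;
	}
	return 0;
}
```

When mbrtoc32 returns (size_t)-2, all n bytes have already been saved in the state, so the caller only needs to read the next chunk and call decoder_feed again with the same decoder. With ASCII input in the "C" locale, feeding "he" and then "llo" leaves nscalars at 5.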
ERRORS
      [EILSEQ]   The multibyte sequence cannot be decoded in the current locale as a Unicode scalar value.
      [EIO]      An error occurred in loading the locale's character conversions.
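Since errno can be overwritten by later library calls, it is usually safest to capture it immediately after mbrtoc32 returns (size_t)-1. The sketch below illustrates this; decode_first and describe_decode_error are hypothetical helpers, and the message strings are illustrative wording rather than output of any system function.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical: turn the two documented errno values into text. */
const char *
describe_decode_error(int err)
{
	switch (err) {
	case EILSEQ:
		return "invalid multibyte sequence";
	case EIO:
		return "cannot load locale conversions";
	default:
		return "unexpected error";
	}
}

/*
 * Hypothetical: decode the first character of s[0..n) from the
 * initial conversion state.  Returns 0 on success (including a null
 * character), the errno value on error, or -1 if the input is
 * incomplete.
 */
int
decode_first(const char *s, size_t n, char32_t *out)
{
	mbstate_t mbs;
	size_t len;

	memset(&mbs, 0, sizeof(mbs));
	len = mbrtoc32(out, s, n, &mbs);
	if (len == (size_t)-1)
		return errno;	/* capture before any other call changes it */
	if (len == (size_t)-2)
		return -1;	/* not an error: more input is needed */
	return 0;
}
```

A caller might print describe_decode_error(decode_first(s, n, &c32)) when the result is positive. Which byte sequences produce EILSEQ depends on the current locale.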
SEE ALSO
c16rtomb(3), c32rtomb(3), c8rtomb(3), mbrtoc16(3), mbrtoc8(3), uchar(3)

The Unicode Standard, https://www.unicode.org/versions/Unicode15.0.0/UnicodeStandard-15.0.pdf, The Unicode Consortium, September 2022, Version 15.0 -- Core Specification.
STANDARDS
The mbrtoc32 function conforms to ISO/IEC 9899:2011 (``ISO C11'').
HISTORY
The mbrtoc32 function first appeared in NetBSD 11.0.

NetBSD 10.99                    August 14, 2024                   NetBSD 10.99
