Message-ID: <4aed8a2e-22e5-708f-1275-9814a8ffc339@redhat.com>
Date: Thu, 5 Feb 2026 21:22:54 +0000 (UTC)
From: Joseph Myers <josmyers@...hat.com>
To: libc-coord@...ts.openwall.com
cc: Keith Packard <keithp@...thp.com>
Subject: Re: c8rtowc and wcrtoc8

On Thu, 5 Feb 2026, Florian Weimer wrote:

> * Keith Packard:
> 
> > Because the char8_t encoding is not stateless, something like mbstate_t
> > is required.
> 
> Is this accurate?  I thought that conversion from and to UTF-8 would
> only operate on complete multibyte sequences.

Indeed, the version of "Restartable Functions for Efficient Character 
Conversion" that was actually accepted into C2y (N3366 plus an editorial 
correction) is explicit that "For the UTF-8, UTF-16, and UTF-32 encodings, 
collectively referred to as the Unicode encodings, an indivisible unit of 
work for a read operation shall be the sequence of code units that 
corresponds to one Unicode code point.".
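
To illustrate what "one code point per unit of work" can mean for UTF-8 
input, here is a minimal sketch of a decoder that either consumes a whole 
sequence or consumes nothing. The name utf8_decode_one and its signature 
are hypothetical; this is not the interface from N3366 or from any libc, 
only an assumption-laden illustration of a reader that never stops in the 
middle of a sequence and so needs no saved partial state.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of any standard or libc API.
   Decodes one code point from s[0..n). Returns the number of code units
   consumed (1 to 4), or 0 if the input is ill-formed or ends mid-sequence.
   On a 0 return nothing is consumed and *out is left untouched. */
static size_t utf8_decode_one(const unsigned char *s, size_t n, uint32_t *out)
{
    if (n == 0)
        return 0;

    unsigned char b0 = s[0];
    size_t len;      /* total code units in this sequence */
    uint32_t cp;     /* code point being assembled */
    uint32_t min;    /* smallest value allowed for this length */

    if (b0 < 0x80)                { len = 1; cp = b0;        min = 0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else
        return 0;    /* stray continuation byte or invalid lead byte */

    if (n < len)
        return 0;    /* incomplete sequence: treat as indivisible */

    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                     /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    /* Reject overlong forms, surrogates, and values past U+10FFFF. */
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;

    *out = cp;
    return len;
}
```

In this sketch a caller that holds only the first two bytes of a 
three-byte sequence gets 0 back and has to retry once the third byte is 
available; the decoder itself carries nothing over between calls.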

-- 
Joseph S. Myers
josmyers@...hat.com
