Date: Wed, 25 Dec 2019 21:07:05 +0100
From: Florian Weimer <>
To: JeanHeyd Meneide <>
Subject: Re: [ Guidance ] Potential New Routines; Requesting Help

* JeanHeyd Meneide:

>      I hope this e-mail finds you doing well this Holiday Season! I am
> interested in developing a few fast routines for text encoding for
> musl after the positive reception of a paper for the C Standard
> related to fast conversion routines:

I'm somewhat concerned that the C multibyte functions are too broken
to be useful.  There is at least one widely implemented character
set (Big5 as specified for HTML5) which does not fit the model implied
by the standard.  Big5 does not have shift states, but a C
implementation using UTF-32 for wchar_t has to pretend it has because
correct conversion from Unicode to Big5 needs lookahead and cannot be
performed one character at a time.

This would at least affect the proposed c8rtomb function.
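
To make the lookahead problem concrete, here is a minimal sketch in C
of an encoder that is fed one code point at a time.  Everything in it
is hypothetical (the names big5_enc_state, big5_encode_step and
big5_encode_flush are invented for illustration and are not part of
musl, glibc, or the proposal).  It covers only ASCII and the four
base-plus-combining-mark sequences from Big5-HKSCS as I understand
them; the byte values should be checked against the real tables.  The
point is only that U+00CA or U+00EA cannot be emitted until the next
code point is known, so the encoder has to carry hidden state even
though Big5 has no shift sequences.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder state: a base letter held back until the next
   code point shows whether a combining mark follows.  0 means none. */
struct big5_enc_state { uint32_t pending; };

/* Two-byte code for a base letter standing alone (assumed values). */
static int encode_single(uint32_t c, unsigned char *out)
{
    switch (c) {
    case 0x00CA: out[0] = 0x88; out[1] = 0x66; return 2;
    case 0x00EA: out[0] = 0x88; out[1] = 0xA7; return 2;
    default:     return -1;   /* rest of the table omitted */
    }
}

/* Two-byte code for base + combining mark; returns 0 if not a pair. */
static int encode_pair(uint32_t base, uint32_t mark, unsigned char *out)
{
    unsigned char trail;
    if      (base == 0x00CA && mark == 0x0304) trail = 0x62;
    else if (base == 0x00CA && mark == 0x030C) trail = 0x64;
    else if (base == 0x00EA && mark == 0x0304) trail = 0xA3;
    else if (base == 0x00EA && mark == 0x030C) trail = 0xA5;
    else return 0;
    out[0] = 0x88;
    out[1] = trail;
    return 2;
}

/* Feed one code point.  Returns the number of bytes written to out
   (0 to 3 in this sketch), or -1 for a code point the sketch does not
   map; in that case any held-back letter is dropped. */
int big5_encode_step(struct big5_enc_state *st, uint32_t c,
                     unsigned char *out)
{
    int n = 0;
    assert(st != NULL && out != NULL);
    if (st->pending) {
        uint32_t base = st->pending;
        st->pending = 0;
        n = encode_pair(base, c, out);
        if (n)
            return n;                 /* base and mark consumed together */
        n = encode_single(base, out); /* flush the held-back base alone */
    }
    if (c == 0x00CA || c == 0x00EA) {
        st->pending = c;              /* cannot decide yet: need lookahead */
        return n;
    }
    if (c < 0x80) {
        out[n] = (unsigned char)c;    /* ASCII passes through unchanged */
        return n + 1;
    }
    return -1;
}

/* At end of input, emit whatever is still held back. */
int big5_encode_flush(struct big5_enc_state *st, unsigned char *out)
{
    uint32_t base = st->pending;
    assert(out != NULL);
    st->pending = 0;
    return base ? encode_single(base, out) : 0;
}
```

Feeding U+00CA returns 0 bytes; only the following code point decides
whether the output is 0x88 0x62 (with U+0304) or 0x88 0x66 (anything
else).  A one-character-at-a-time interface has to hide that pending
letter in its conversion state, which is exactly the pretend shift
state described above.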

I posted a brief review of the problematic charsets in glibc here:

