Date: Tue, 11 Nov 2014 14:53:02 +0100
From: Jens Gustedt <jens.gustedt@...ia.fr>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] implement a private state for the uchar.h
 functions

On Monday, 10.11.2014, at 22:21 -0500, Rich Felker wrote:
> On Sun, Nov 09, 2014 at 11:18:08AM +0100, Jens Gustedt wrote:
> > The C standard is imperative on that:
> > 
> >   7.28.1 ... If ps is a null pointer, each function uses its own internal
> >   mbstate_t object instead, which is initialized at program startup to
> >   the initial conversion state;
> 
> Thanks. Actually I originally had this functionality and removed it
> because it seemed to be unnecessary, due to the requirement being
> buried in that introductory text rather than the descriptions of the
> individual functions. I figured the committee had just intentionally
> decided not to copy this backwards functionality from the old
> multibyte functions into the new uchar ones, but sadly that's not the
> case...

Yes, these are bizarre additions. That makes almost a dozen different
static states, one for each of the restartable functions.
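To make the quoted requirement concrete, here is a minimal sketch of the
fallback for one function. The names sketch_mbrtoc32 and sketch_selfcheck
are made up for illustration; this is not the code from the patch, only
the general shape of "use a private static state when ps is null".

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>
#include <wchar.h>

/* Hypothetical wrapper, not musl's implementation: falls back to a
 * function-private state object when the caller passes ps == NULL. */
size_t sketch_mbrtoc32(char32_t *restrict pc32, const char *restrict s,
                       size_t n, mbstate_t *restrict ps)
{
    /* Static storage is zero-initialized at program startup, and a
     * zero-valued mbstate_t describes the initial conversion state. */
    static mbstate_t internal_state;

    if (!ps)
        ps = &internal_state;
    return mbrtoc32(pc32, s, n, ps);
}

/* Usage: convert one ASCII character without supplying a state. */
void sketch_selfcheck(void)
{
    char32_t c = 0;
    size_t r = sketch_mbrtoc32(&c, "A", 1, NULL);
    assert(r == 1);
    assert(c == 0x41);
}
```

Each restartable function needs its own such object, which is where the
count of static states comes from.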

Perhaps I misunderstood something, but isn't it the case that in the
direction mbs -> charXX_t these functions can handle surrogates, while
the other way around is not possible?

From the new Unicode support in C11 I get some of the ideas, but some
things remain quite mysterious:

 - having a standard way to specify Unicode characters inside a string
   of any kind through \u and \U is really a great achievement

 - introducing the types charXX_t and literals with the prefixes u and
   U is already less clear. The only thing that can be done with them
   is conversion; there are no auxiliary functions. In particular, the
   character counting and classification problems for surrogates are
   still not solved.

 - introducing a u8 prefix for strings that guarantees UTF-8 encoding
   for mbs sounds nice. But then there is nothing that relates these
   to "normal" string literals. What are we supposed to do with these?
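A small sketch of what the three prefixes produce for the same kind of
input may help to see the counting issue. It assumes an implementation
that defines __STDC_UTF_16__ and __STDC_UTF_32__ (otherwise the encoding
of u and U literals is implementation-defined); the identifiers below are
made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

/* U+1F600 lies outside the Basic Multilingual Plane. */
static const char16_t s16[] = u"\U0001F600";
static const char32_t s32[] = U"\U0001F600";
/* U+00E9 needs two bytes in UTF-8. */
static const unsigned char s8[] = u8"\u00e9";

/* Number of code units in each literal, excluding the terminator. */
size_t units16(void) { return sizeof s16 / sizeof s16[0] - 1; }
size_t units32(void) { return sizeof s32 / sizeof s32[0] - 1; }
size_t units8(void)  { return sizeof s8  / sizeof s8[0]  - 1; }

/* One character, but the number of array elements differs by type. */
void literal_selfcheck(void)
{
    assert(units32() == 1);                          /* one code point    */
    assert(units16() == 2);                          /* a surrogate pair  */
    assert(s16[0] == 0xD83D && s16[1] == 0xDE00);
    assert(units8() == 2);                           /* two UTF-8 bytes   */
    assert(s8[0] == 0xC3 && s8[1] == 0xA9);
}
```

So a single character gives one element with U, two with u, and nothing
in <uchar.h> tells us that the two char16_t elements belong together.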

Jens

-- 
:: INRIA Nancy Grand Est ::: AlGorille ::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::


