Date: Sun, 10 May 2015 20:19:46 +0800
From: 罗勇刚(Yonggang Luo)  <luoyonggang@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: John Sully <john@...uare.ca>, Karsten Blees <blees@...n.de>, musl@...ts.openwall.com, 
	dplakosh@...t.org, austin-group-l@...ngroup.org, hsutter@...rosoft.com, 
	Clang Dev <cfe-dev@...uiuc.edu>, James McNellis <james@...esmcnellis.com>
Subject: Re: Re: [cfe-dev] Is that getting wchar_t to be 32bit on win32
 a good idea for compatible with Unix world by implement posix layer on win32 API?

2015-05-10 4:05 GMT+08:00 Rich Felker <dalias@...c.org>:
> On Sat, May 09, 2015 at 07:19:14PM +0800, 罗勇刚(Yonggang Luo)  wrote:
>> 2015-05-09 18:36 GMT+08:00 Szabolcs Nagy <nsz@...t70.net>:
>> > * John Sully <john@...uare.ca> [2015-05-09 00:55:12 -0700]:
>> >> In my opinion you almost never want 32-bit wide characters once you learn
>> >> of their limitations.  Most people assume that if they use them they can
>> >> return to the one character -> one glyph idiom like ASCII.  But Unicode is
>> >
>> > wchar_t must be at least 21 bits on a system that supports unicode
>> > in any locale: it has to be able to represent all code points of the
>> > supported character set.
>> >
>> > in practice this means that the only conforming definition to iso c
>> > (and thus posix, c++ and other standards based on c) is a 32bit wchar_t
>> > (the signedness can be chosen freely).
>> >
>> > so the definition is not based on what "you almost never want" or what
>> > "most people assume".
>> >
>> > if the goal is to provide a posix implementation then 16bit wchar_t
>> > is not an option (assuming the system wants to be able to communicate
>> > with the external world that uses unicode text).
>> wchar_t is not the only way to communicate with the external world, and
>> it is not well suited to that purpose either.
>
> Of course it's not. UTF-8 is. But per both ISO C and POSIX, any
> character the locale supports has a representation as wchar_t. If
> wchar_t is only 16-bit, then you fundamentally can't support all of
> Unicode in the locale's encoding. mbrtowc has to fail with EILSEQ for
> 4-byte characters, regex functions cannot process 4-byte characters,
> etc. Such a system is conforming to the requirements for C and
> POSIX but does not support Unicode (in full) at the locale level.
>
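To make the 4-byte case concrete, here is a minimal sketch in C. It is
an illustration only, not code from musl or any other libc: the helper
names decode_utf8_4byte and fits_in_one_16bit_unit are hypothetical,
and the decoder checks only the byte patterns (no overlong-form or
range checks). It shows that a 4-byte UTF-8 sequence decodes to a code
point above 0xFFFF, which a single 16-bit unit cannot hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one 4-byte UTF-8 sequence into a code point.
 * Simplified for illustration: only the lead/continuation bit patterns are
 * checked. Returns 0 on success, -1 on malformed or short input. */
static int decode_utf8_4byte(const unsigned char *s, size_t n, uint32_t *out)
{
    assert(s != NULL && out != NULL);
    if (n < 4 || (s[0] & 0xF8) != 0xF0)      /* lead byte must be 11110xxx */
        return -1;
    for (size_t i = 1; i < 4; i++)
        if ((s[i] & 0xC0) != 0x80)           /* continuation: 10xxxxxx */
            return -1;
    *out = ((uint32_t)(s[0] & 0x07) << 18) |
           ((uint32_t)(s[1] & 0x3F) << 12) |
           ((uint32_t)(s[2] & 0x3F) << 6)  |
            (uint32_t)(s[3] & 0x3F);
    return 0;
}

/* Hypothetical helper: can this code point be stored in one 16-bit unit? */
static int fits_in_one_16bit_unit(uint32_t cp)
{
    return cp <= 0xFFFFu;
}
```

For example, the bytes F0 9F 98 80 decode to U+1F600, for which
fits_in_one_16bit_unit returns 0.
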
>> The C11 standard never restricts the width of wchar_t, and most POSIX
>> APIs are implemented in terms of UTF-8. Indeed, Windows needs a POSIX
>> layer mainly for those UTF-8 APIs, not for the wchar_t APIs. So making
>> wchar_t 32-bit on Win32 for the sake of communication is not a good
>> idea.
>>
>> And portable text-processing apps or libs (including on Win32) should
>> never depend on wchar_t being 32 bits wide.
>
> If __STDC_ISO_10646__ is defined, wchar_t must have at least 21 value
> bits. Applications which are portable only to systems where this macro
> is defined, or which have some fallback (like dropping multilingual
> text support) for systems where it's not defined, CAN make such
> assumptions.
>
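As a rough sketch of how an application might test that assumption in
C: the function names below are hypothetical, and the check against
0x10FFFF (the highest Unicode code point) is my reading of the
"21 value bits" requirement, not wording taken from the standard.

```c
#include <assert.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: 1 if wchar_t can hold every code point up to U+10FFFF. */
static int wchar_holds_all_unicode(void)
{
    return (uintmax_t)WCHAR_MAX >= 0x10FFFFu;
}

/* Hypothetical helper: 1 if the implementation defines __STDC_ISO_10646__. */
static int has_stdc_iso_10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* An application that relies on the macro could verify the assumption once
 * at startup, and pick a fallback path when the macro is not defined. */
static void require_full_unicode_wchar(void)
{
    if (has_stdc_iso_10646())
        assert(wchar_holds_all_unicode());
}
```

On a system with a 16-bit wchar_t, wchar_holds_all_unicode returns 0,
which is the case where the fallback would be needed.
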
>> And C11/C++11 already provide uchar.h with the cross-platform types
>> char16_t and char32_t, so there is no reason to make wchar_t 32-bit
>> on win32 in order to support POSIX on win32.
>
> If wchar_t is 16-bit, you can't represent non-BMP characters in
> char32_t because they can't be part of the locale's character set. All
> char32_t buys you then is 16 wasted zero bits.
>
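For reference, this is what a non-BMP code point looks like when it
has to be stored in 16-bit units, as in UTF-16. The sketch below is
illustrative only; encode_utf16 is a hypothetical helper and uses
uint32_t/uint16_t rather than char32_t/char16_t so that it does not
depend on uchar.h being available.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16.
 * Returns the number of 16-bit units written to out (1 or 2). */
static int encode_utf16(uint32_t cp, uint16_t out[2])
{
    /* Must be a valid scalar value: at most U+10FFFF and not a surrogate. */
    assert(cp <= 0x10FFFFu && !(cp >= 0xD800u && cp <= 0xDFFFu));
    if (cp < 0x10000u) {                 /* BMP: fits in one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000u;                      /* 20 bits remain */
    out[0] = (uint16_t)(0xD800u | (cp >> 10));     /* high surrogate */
    out[1] = (uint16_t)(0xDC00u | (cp & 0x3FFu));  /* low surrogate */
    return 2;
}
```

U+1F600 comes out as the two units 0xD83D 0xDE00, so it cannot be a
single 16-bit wchar_t value, while U+20AC stays a single unit.
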
>> We intend to create a usable POSIX layer on win32, not a theoretical
>> POSIX layer that would be useless. On win32, we should take the
>> de facto conventions of win32 into account.
>
> Uselessness is a big assumption you're making that's not supported by
> data. If you actually provide a working POSIX layer, you'll have
> pretty much any application that's currently working on Linux, BSDs,
> etc. (with actual portable code, not system-specific #ifdefs) working
> on Windows with few or no changes. If you do that with 32-bit wchar_t,
> they'll support Unicode fully. If you do it with 16-bit wchar_t, then
> the ones that are using the locale system for character handling will
> have to be refitted with extra layers to support more than the BMP,
> and those patches probably (hopefully) won't be accepted upstream.
>
> The only applications that would benefit from having 16-bit wchar_t
> are existing Windows applications that are not going to have much use
> for a POSIX layer anyway, and they can be fixed very easily with
> search-and-replace (no new code layers).
Search-and-replace is not as easy as you say.

There are many incompatibilities between Windows and POSIX, and those
won't change. Otherwise we would just be implementing a virtual machine
running on Win32 that is compatible with everything in POSIX, and that
would be useless.

The intention of providing a POSIX layer is to reduce the burden on
developers who intend to create cross-platform software (including
Windows), not on developers who only intend to develop apps for
Linux/POSIX.

So such a layer should preserve the usable parts of POSIX and drop
the parts that only create inconvenience.
A 32-bit wchar_t is obviously not suitable for Win32.

My intention is not to develop a virtual-machine-like layer such as
cygwin, but a native Win32 layer that provides most POSIX functions
with UTF-8 support. That would solve most portability issues while
behaving like a Win32 app rather than a Unix/Linux app.
>
> Rich



-- 
Yours sincerely,
Yonggang Luo (罗勇刚)
