Date: Mon, 24 May 2021 17:50:22 -0400
From: Rich Felker <dalias@...c.org>
To: Konstantin Isakov <dragonroot@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: [BUG] swprintf() doesn't handle Unicode characters correctly

On Mon, May 24, 2021 at 12:39:35AM -0400, Konstantin Isakov wrote:
> Hi,
> 
> The following program:
> 
> ===================================
> #include <stdio.h>
> #include <wchar.h>
> 
> int main()
> {
>   wchar_t buf[ 32 ];
> 
>   swprintf( buf, sizeof( buf ) / sizeof( *buf ), L"ab\u00E1c" );
> 
>   for ( wchar_t * p = buf; *p; ++p )
>     printf( "%u\n", ( unsigned ) *p );
> 
>   return 0;
> }
> ===================================
> 
> With musl 1.2.2 produces the following output:
> 97
> 98
> 
> The expected output is:
> 97
> 98
> 225
> 99
> 
> With musl, only the first two characters ('a' and 'b') are processed, and
> the string ends on a Unicode character (U+00E1, which is an 'a' with acute
> accent), instead of outputting it and the last character, 'c'.
> 
> Please CC me when replying. Thanks!

You need to call setlocale(LC_CTYPE, "") before calling swprintf.
Until you call setlocale you're in the C locale, and POSIX requires
the C locale to be single-byte. The character \u00e1 is therefore
unrepresentable there, so the conversion stops at that character with
an encoding error (EILSEQ).

Rich
