Date: Fri, 17 Jul 2015 17:35:22 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Left-shift of negative number

On Fri, Jul 17, 2015 at 09:02:59PM +0200, Jens Gustedt wrote:
> Effectively, the C standard at the place that you cite doesn't define
> a behavior for such shifts of negative values. But this doesn't mean
> that a particular implementation of a C compiler or the C library
> (here musl) can't define a behavior for that.

musl does not assume GCC behavior like this, so the code indeed is
wrong and should be fixed.
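
For illustration only (this is not the actual musl code or patch), one
common way to avoid the undefined behavior is to convert the value to
an unsigned type before shifting. Unsigned arithmetic is defined modulo
2^N, so the result is the bit pattern one would expect on a two's
complement machine. The helper name shl_wrap32 is hypothetical, and the
shift count is assumed to be less than 32:

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of musl: shift a possibly negative
 * value left without undefined behavior. The conversion to uint32_t
 * is well-defined (reduction modulo 2^32), and so is the shift on the
 * converted value, provided n < 32.
 */
static uint32_t shl_wrap32(int32_t v, unsigned n)
{
    return (uint32_t)v << n;
}
```

With this, shl_wrap32(-0x40, 23) yields 0xE0000000, whereas writing
-0x40 << 23 directly on an int is undefined.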

> What worries me more than the shift of a negative value, is that this
> code is erroneous if `int` is only 16 bit wide. Whereas we can
> reasonably assume that a shift of a negative value in two's complement
> is the same as an unsigned shift, compilers tend to produce just crap
> if the shift exceeds the width.
> 
> So I would feel much more comfortable if we'd use UINT32_C(0x40)
> inside the R macro.

The entire internal API here uses the type unsigned for character
codes and state, so, as in the rest of musl, there is an assumption
(guaranteed by POSIX) that int is at least 32 bits wide. Since the
UTF-8/multibyte code is written to be largely self-contained and
independent of musl, we could look into enhancing the code to be
portable to systems with 16-bit int, but I suspect this would be
rather useless in practice. If we did that, we would need to use
something ugly like uint_least32_t rather than uint32_t to gain any
portability since the latter need not even exist.
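
As a rough sketch of what such a change might look like (the names
state_t and top_bits are hypothetical, not anything in musl), the
state would be held in uint_least32_t, which every conforming
implementation must provide, and constants would be written with
UINT32_C so they are wide enough even when int is 16-bit:

```c
#include <stdint.h>

/*
 * Hypothetical state type: guaranteed to exist and to be at least
 * 32 bits wide, unlike uint32_t, which is optional.
 */
typedef uint_least32_t state_t;

/*
 * Hypothetical helper: move a small value into the high bits of a
 * 32-bit field. The cast happens before the shift, so the shift is
 * done in a type of at least 32 bits even if int is 16-bit. The mask
 * discards anything above bit 31 in case state_t is wider than 32 bits.
 */
static state_t top_bits(unsigned v)
{
    return ((state_t)v << 23) & UINT32_C(0xFFFFFFFF);
}
```

The extra mask is part of the ugliness: with uint32_t the truncation
would be implicit, but with a "least" type it has to be spelled out.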

There are also aliasing issues with using a type other than
'unsigned' for the decoding state, since mbstate_t's members are
unsigned. So at least at this time I'd really rather not pursue this
further.
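
To make the aliasing concern concrete, here is a minimal sketch using
a hypothetical stand-in structure (fake_mbstate is an assumption for
illustration, not the real mbstate_t definition). Accessing the state
through an unsigned pointer is fine because the underlying object has
type unsigned; accessing it through a pointer to some other type, such
as a uint_least32_t that happens to be unsigned long, would run afoul
of the effective type rules:

```c
/* Hypothetical stand-in for an mbstate_t whose members are unsigned. */
struct fake_mbstate {
    unsigned a, b;
};

/*
 * A pointer to a structure, suitably converted, points to its first
 * member. That member is an unsigned, so storing through an
 * unsigned * is a valid access.
 */
static void set_state(struct fake_mbstate *st, unsigned c)
{
    unsigned *p = (void *)st;
    *p = c;
}

/* Read the state back through the member itself. */
static unsigned get_state(const struct fake_mbstate *st)
{
    return st->a;
}
```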

Rich
