Message-ID: <20260509192131.GC1827@brightrain.aerifal.cx>
Date: Sat, 9 May 2026 15:21:31 -0400
From: Rich Felker <dalias@...c.org>
To: Luca Kellermann <mailto.luca.kellermann@...il.com>
Cc: musl@...ts.openwall.com
Subject: Re: musl multi-level table format for binary locale images
On Sat, May 09, 2026 at 09:09:13PM +0200, Luca Kellermann wrote:
> On Fri, May 08, 2026 at 11:22:28PM -0400, Rich Felker wrote:
> > [...]
> >
> > Table structure:
> >
> > be32 start;
> > u8 shift;
> > u8 scale;
> > be16 size;
> > union {
> > u8 offsets8[size];
> > be16 offsets16[size/2];
> > be32 offsets32[size/4];
> > }
> > u8 data[];
> >
> > This represents a table of offsets for a range of integer key values
> > beginning at start. Keys are processed as unsigned 32-bit values, but
> > can represent a signed range crossing 0 as needed. Offsets may be
> > encoded as unsigned 8-, 16-, or 32-bit values.
>
> Does scale tell us the size of the offsets? Which values mean what?
Yes, it's 0, 1, or 2 for 8-, 16-, or 32-bit offsets.
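
For illustration only, here is a minimal sketch of how a reader might decode the header and fetch one offset entry, assuming the header is the 8 bytes shown above, that size counts bytes of the offsets array (as the offsets16[size/2] and offsets32[size/4] declarations suggest), and that the offsets follow the header directly. The names tab_hdr, get_be, parse_hdr, tab_count, and tab_offset are hypothetical and not taken from musl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the 8-byte table header. */
struct tab_hdr {
    uint32_t start;  /* first key covered by this table */
    uint8_t shift;   /* nonzero: entries lead to subtables */
    uint8_t scale;   /* 0, 1, 2 => 8-, 16-, 32-bit offsets */
    uint16_t size;   /* assumed: size of the offsets array in bytes */
};

/* Read an nbytes-wide big-endian unsigned value (nbytes is 1, 2 or 4). */
static uint32_t get_be(const unsigned char *p, int nbytes)
{
    uint32_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

static struct tab_hdr parse_hdr(const unsigned char *t)
{
    struct tab_hdr h;
    h.start = get_be(t, 4);
    h.shift = t[4];
    h.scale = t[5];
    h.size = (uint16_t)get_be(t + 6, 2);
    return h;
}

/* Number of offset entries: byte size divided by the entry width. */
static size_t tab_count(const struct tab_hdr *h)
{
    return (size_t)h->size >> h->scale;
}

/* Fetch entry idx of the offsets array, honoring scale. */
static uint32_t tab_offset(const unsigned char *t, size_t idx)
{
    struct tab_hdr h = parse_hdr(t);
    assert(h.scale <= 2);
    assert(idx < tab_count(&h));
    int width = 1 << h.scale;
    return get_be(t + 8 + idx * (size_t)width, width);
}
```

With scale = 1 and size = 8, tab_count yields 4 entries, each read as a be16.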
>
> > [...]
> >
> > If shift is nonzero, the offset obtained at index (key-start)>>shift
> > in the offsets array leads to a subtable of the same form that will
> > take the remainder (key-state)&((1<<shift)-1) as its input; [...]
>
> Is this a typo and you wanted to say (key-start)&((1<<shift)-1)? Or is
> state something else?
Yes, it's a typo. It should say start.
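
To make the corrected expression concrete, a small sketch of the index/remainder split follows. It assumes shift is less than 32 and does the subtraction in unsigned 32-bit arithmetic, so a start of -1 (0xffffffff) wraps as described for ranges crossing 0. The names split and split_key are hypothetical, and what the subtable then does with the remainder is not covered here.

```c
#include <stdint.h>

/* Hypothetical result type: index into this table's offsets array,
 * plus the remainder handed to the subtable when shift is nonzero. */
struct split {
    uint32_t index;
    uint32_t rem;
};

/* Assumes shift < 32; larger shifts would be undefined behavior in C. */
static struct split split_key(uint32_t key, uint32_t start, unsigned shift)
{
    uint32_t rel = key - start;  /* wraps modulo 2^32 */
    struct split s;
    s.index = rel >> shift;
    s.rem = rel & ((UINT32_C(1) << shift) - 1);
    return s;
}
```

With shift = 0 the remainder is always 0 and the index is simply key-start, which matches the single-level examples.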
> > [...]
> >
> > Path layout:
> >
> > localeconv/-1: binary data for the char fields of struct lconv, in the
> > order they appear in the ISO C specification and in musl locale.h.
> > [...]
> >
> > localeconv/0..9: string data for the first 10 fields of struct lconv,
> > likewise in the order they appear in the specification and in musl.
> > Items 2 and 7 consist of a pair of strings separated by a null
> > terminator byte, [...]
>
> The order of the fields of struct lconv in musl's locale.h doesn't
> match any of the C standard drafts I checked (C99, C11, C17, C23).
Indeed, I always thought it did but didn't check that. So we should
just scratch that text from here. It matches the musl layout.
> It does however match the order in POSIX.1-2024 XBD 7.3.4 LC_NUMERIC
> [1] and XBD 7.3.3.1 LC_MONETARY Category in the POSIX Locale [2].
>
> It doesn't match the order in XBD 7.3.3 LC_MONETARY [3] / XSH 3
> localeconv() [4] (int_n_cs_precedes is in a different position) or XBD
> 14 <locale.h> [5] (alphabetical order).
Lovely. I'll just update this to document it explicitly and mention
that it's the same as 7.3.3.1 and 7.3.4.1, and check that the code is
actually correct to match the order of our struct.
> > Examples of data encoding:
> >
> > langinfo/LC_TIME:
> >
> > start = 131072
> > shift = 0
> > scale = 0
> > size = .....
> > offsets8[] = {
> > 1, 5, 9, 13, 17, 21, 25, 29, 36, ...
> > }
> > data[] = "Sun\0Mon\0Tue\0Wed\0Thu\0Fri\0Sat\0Sunday\0..."
> >
> > errors/1:
>
> These look like strerror strings, so this should be errors/0, right?
Yes.
>
> > start = -1
> > shift = 0
> > scale = 0
> > size = .....
> > offsets16[] = {
> > 1, 15, 36, 60, ...
> > }
> > data[] = "Unknown error\0"
> > "No error information\0"
> > "Operation not permitted\0"
> > "No such file or directory\0"
> > ...
>
> Would both of the examples use scale = 0? Assuming scale says
> something about the size of the offsets, they should differ.
Thanks for catching that. Indeed scale should be 1 for errors because
they don't all fit in 8-bit offsets.
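
As a rough check of that reasoning, here is a sketch that lays strings out back to back, with the first one at offset 1 as in both examples, and reports the smallest scale whose offset width can hold every offset. The function name layout and its interface are hypothetical. It does not model how offsets are resolved against data[], only the arithmetic behind the numbers shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes the offset of each string to offs[] (if non-null) and returns
 * the smallest scale (0, 1, 2) able to represent all of them, or -1 if
 * even 32-bit offsets are too narrow. Assumes strings are packed
 * contiguously, each followed by its null terminator, starting at 1. */
static int layout(const char *const *strs, size_t n, uint32_t *offs)
{
    uint64_t off = 1, max = 0;
    for (size_t i = 0; i < n; i++) {
        if (offs) offs[i] = (uint32_t)off;
        max = off;                    /* offsets only grow */
        off += strlen(strs[i]) + 1;   /* +1 for the terminator */
    }
    if (max <= 0xff) return 0;
    if (max <= 0xffff) return 1;
    if (max <= 0xffffffff) return 2;
    return -1;
}
```

The four error strings shown above come out at 1, 15, 36, 60 and still fit in 8 bits on their own. The need for scale = 1 only appears once the full list pushes some offset past 255.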
Rich