Message-ID: <20260225150256.GY1827@brightrain.aerifal.cx>
Date: Wed, 25 Feb 2026 10:02:57 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: High-level binary format for new locale files

On Wed, Nov 19, 2025 at 10:30:43PM -0500, Rich Felker wrote:
> This kind of linear search elimination was already done for strerror
> by Timo Teräs in commit 8343334d7b. I've been building on the same
> concept (multiple inclusion of a header file defining the data, with
> different context each time to expand to different parts of the table)
> so that we don't need to "pre-compile" the built-in C locale data to
> binary blobs like the ctype data, iconv data, nfd decomposition data,
> etc. but can instead let the preprocessor do the work and keep the
> data itself in editable source form.
> 
> It's also desirable that the same data format used for locale strings
> (langinfo, strerror, etc.) also work for collation elements. This
> doesn't entirely preclude having a single flat integer namespace of
> keys (for example you could OR a code onto the upper bits of
> codepoints to mean "collation element") but it does suggest against
> it.
> 
> [...]
> 
> To demo simplified use of the table design and a potential specific
> binary format to use, I have a draft version of the include files to
> produce built-in C locale data described above. These need a little
> polishing still, so I'll include them in a follow-up to come soon.

This was supposed to be posted a long time ago, but better late than
never. The naming and parametrization might be a little bit clunky and
I expect to rework it for actual inclusion in musl with the locale
work, but it demonstrates all the concepts.

The attached __strerror.h is just from current musl. strerror2.c does
not contain any lookup code; it's just the top-level file to
instantiate the table. Compiling it to an object file lets you examine
the emitted binary format. Compiling it with -E to see preprocessed
output is also informative.

The magic is in mdecl2.h and mdata2.h. These define the binary format
and how the source data passed to the M() macro translates into it.
At the moment I don't have a presentable version that actually uses
the multi-level aspect of the table, which nl_langinfo needs and which
the strerror data also needs in a minimal form.
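
For readers without the attachments at hand, the general technique
(expanding the same M(key, string) entries several times, each time
under a different macro definition) can be sketched roughly as
follows. This is only an illustration and does not reproduce the
format defined by mdecl2.h and mdata2.h. Everything named demo_* or
DEMO_* is hypothetical, and a list macro stands in for the repeatedly
included data header so that the sketch fits in one file.

```c
#include <stddef.h>

/* Hypothetical keys for illustration; a real table would use errno values. */
enum { DEMO_OK = 0, DEMO_EPERM = 1, DEMO_ENOENT = 2, DEMO_EIO = 5 };

/* Stand-in for the data header: one M(key, string) entry per message.
 * The first entry lands at offset 0, so it doubles as the fallback. */
#define DEMO_TABLE(M) \
	M(DEMO_OK,     "No error information") \
	M(DEMO_EPERM,  "Operation not permitted") \
	M(DEMO_ENOENT, "No such file or directory") \
	M(DEMO_EIO,    "I/O error")

/* Expansion 1: one exactly-sized char array member per string. */
#define M_DECL(k, s) char str_##k[sizeof(s)];
/* Expansion 2: the string contents, in the same order as the members. */
#define M_DATA(k, s) s,
/* Expansion 3: key -> byte offset of that string inside the struct. */
#define M_IDX(k, s) [k] = offsetof(struct demo_strs, str_##k),

/* All strings packed back to back in a single read-only object. */
static const struct demo_strs {
	DEMO_TABLE(M_DECL)
} demo_strs = {
	DEMO_TABLE(M_DATA)
};

/* Index table; keys with no entry are zero-initialized and so point
 * at offset 0, the fallback string. */
static const unsigned short demo_idx[] = {
	DEMO_TABLE(M_IDX)
};

/* Constant-time lookup with no linear search over the strings. */
static const char *demo_lookup(unsigned k)
{
	if (k >= sizeof demo_idx / sizeof *demo_idx) k = 0;
	return (const char *)&demo_strs + demo_idx[k];
}
```

With this sketch, demo_lookup(DEMO_EIO) yields "I/O error", while an
unused key such as 3 or an out-of-range key falls back to the first
entry. Running the file through the preprocessor with -E shows the
three expansions side by side.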

Rich

View attachment "__strerror.h" of type "text/plain" (4063 bytes)

View attachment "sterror2.c" of type "text/plain" (192 bytes)

View attachment "mdata2.h" of type "text/plain" (575 bytes)

View attachment "mdecl2.h" of type "text/plain" (527 bytes)
