Message-ID: <20260120123622.GY3520958@port70.net>
Date: Tue, 20 Jan 2026 13:36:22 +0100
From: Szabolcs Nagy <nsz@...t70.net>
To: Xan Phung <xan.phung@...il.com>
Cc: Openwall musl <musl@...ts.openwall.com>
Subject: Re: [PATCH v1] wctype: reduce text size of iswalpha & iswpunct by 53%
* Xan Phung <xan.phung@...il.com> [2026-01-20 05:25:16 +0800]:
> On Tue, 20 Jan 2026, 3:55 am Szabolcs Nagy, <nsz@...t70.net> wrote:
> > note that decimal '255' is shorter than '0xff' so decimal
> > means a bit less data in the source code.
> >
>
>
> Ha, ha, ha :) Not sure if you are joking, but when I was referring to the
> 'text size' I mean the compiled code and constant data size reported by the
> 'size' tool, not the source code size.
>
> But if you are saying musl coding standards prefer decimal due to its more
> compact source code data size, then yes I can make decimal source files
> instead of hexadecimal.

this is a minor comment,
but the source code size matters too, as musl is distributed in
source form. decimal data likely compresses better with deflate,
so this affects git history size as well. obviously for actual code
readability is more important, but for data there is no such
constraint, so we are free to choose. so why use hex?
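
To put a rough number on the uncompressed part of that argument, here is a minimal sketch that counts the characters needed to write a byte table as decimal versus hex literals. The function names are hypothetical and not from the patch. It says nothing about how well either form compresses with deflate.

```c
#include <stdio.h>
#include <stddef.h>

/* Characters needed to write n byte values as decimal literals,
 * one trailing comma per value (e.g. "255,"). Hypothetical helper. */
size_t decimal_source_len(const unsigned char *v, size_t n)
{
	size_t len = 0;
	for (size_t i = 0; i < n; i++)
		len += (size_t)snprintf(NULL, 0, "%u,", (unsigned)v[i]);
	return len;
}

/* Same count for fixed-width hex literals (e.g. "0xff,"). */
size_t hex_source_len(const unsigned char *v, size_t n)
{
	size_t len = 0;
	for (size_t i = 0; i < n; i++)
		len += (size_t)snprintf(NULL, 0, "0x%02x,", (unsigned)v[i]);
	return len;
}
```

For the four bytes 0xec,0xc7,0x3d,0xd6 this gives 15 characters in decimal (236,199,61,214,) against 20 in hex.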
more importantly this code looks endian dependent:
> > > +const static unsigned char table[PAGEH + 227*sizeof(unsigned)] = {
> > > +#include "iswalpha_table.h"
> > > };
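byte order matters because the bytes within most 4-byte chunks of the
table differ, so each chunk reads as a different unsigned value on
little-endian and big-endian targets: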
...
+0xec,0xc7,0x3d,0xd6,0x18,0xc7,0xff,0xc3,0xc7,0x1d,0x81,0x00,0xc0,0xff,0x00,0x00,
+0x00,0x00,0x00,0x00,0x55,0x55,0x55,0x55,
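
A small sketch of why this matters, using hypothetical helper names: the same four table bytes assemble into different 32-bit values depending on byte order, while a chunk of identical bytes such as 0x55,0x55,0x55,0x55 does not.

```c
#include <stdint.h>
#include <string.h>

/* Value a little-endian target gets when reading 4 bytes as one word. */
uint32_t load_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Value a big-endian target gets for the same 4 bytes. */
uint32_t load_be32(const unsigned char *p)
{
	return (uint32_t)p[3] | (uint32_t)p[2] << 8 |
	       (uint32_t)p[1] << 16 | (uint32_t)p[0] << 24;
}

/* What the running machine actually sees; memcpy avoids the aliasing
 * problem of casting the byte pointer to a word pointer. */
uint32_t load_native32(const unsigned char *p)
{
	uint32_t v;
	memcpy(&v, p, sizeof v);
	return v;
}

/* Nonzero if the chunk's value depends on byte order. */
int chunk_is_order_dependent(const unsigned char *p)
{
	return load_le32(p) != load_be32(p);
}
```

The first chunk of the excerpt, 0xec,0xc7,0x3d,0xd6, reads as 0xd63dc7ec in little-endian order and 0xecc73dd6 in big-endian order.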
> > > +int iswalpha(wint_t wc) {
> > > + unsigned *huffm = (unsigned *)(table + PAGEH), target;
deref of huffm is technically an aliasing violation (the object is an
unsigned char array but is accessed through an unsigned lvalue) and thus ub.
> > > + huff = huffm[base];
> > > + base+= (rev = -(page & 1)) + 1;
> > > + type = (huff >> (2 * lane)) & 3;
> > > + popc = (huff << (31 - 2 * lane));
in such cases i think the cleanest solution is

	static const struct {
		unsigned char tab1[PAGEH];
		unsigned tab2[227];
	} data = {...};

the struct helps on targets where computing the
address of a global takes multiple instructions.
(otherwise you could just use two separate tabs)
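
A minimal sketch of that layout, with made-up sizes and contents (DEMO_PAGEH, DEMO_WORDS, the initializer values and the accessor names are all hypothetical; the real table would use PAGEH and 227). It assumes unsigned is at least 32 bits wide. The word table is written as integer constants rather than bytes, so each value is the same on any byte order, and it is read through its declared type, so no cast is needed.

```c
#include <stddef.h>

/* Hypothetical sizes, for illustration only. */
#define DEMO_PAGEH 4
#define DEMO_WORDS 2

/* One object holding both tables: a single base address serves both. */
static const struct {
	unsigned char tab1[DEMO_PAGEH];
	unsigned tab2[DEMO_WORDS];
} data = {
	{ 1, 2, 3, 4 },                 /* byte-indexed page table */
	{ 0xd63dc7ecu, 0x55555555u },   /* word table, as integer constants */
};

/* Accessors read each member through its own type: no aliasing issue. */
unsigned char demo_page(size_t i)
{
	return data.tab1[i];
}

unsigned demo_word(size_t i)
{
	return data.tab2[i];
}
```

With this shape the generated header would presumably have to emit the second table as words instead of bytes, which is an inference from the suggestion above and not something the patch already does.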