Message-ID: <20250529155930.GA1827@brightrain.aerifal.cx>
Date: Thu, 29 May 2025 11:59:30 -0400
From: Rich Felker <dalias@...c.org>
To: Thorsten Glaser <tg@...bsd.de>
Cc: musl@...ts.openwall.com
Subject: Re: Collation, IDN, and Unicode normalization

On Thu, May 29, 2025 at 02:15:59PM +0000, Thorsten Glaser wrote:
> Rich Felker dixit:
> 
> >Extending that further, there are only 135 64-codepoint blocks. But 84
> >of those just have 1-3 codepoints in them, the rest still being dead
> 
> Is it an option to do blocks on the higher level and once you reach
> the block of a certain size (64? 128? 256?) do a linear search, if
> most really have that few codepoints?
> 
> Hm probably would suck for the 0300 block though.
> 
> Just had that crazy idea, not sure if it’s viable.

Yeah, that's the problem with linear (or rather binary; there's no
reason to do linear) search options: the places where you take a
performance hit are exactly the ones where all the common characters lie.

Rich
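
To make the tradeoff discussed above concrete, here is a minimal sketch
in C contrasting a binary search over sorted codepoints with a two-level
table keyed on 64-codepoint blocks. Everything in it is an assumption
made for illustration: the names (sorted_cps, block_index, bitmaps,
lookup_bsearch, lookup_table), the 0x2000 range limit, and the sample
data, which is a made-up handful of codepoints and not musl's actual
tables or design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sample data, sorted ascending. Most entries sit in the
 * 0300 block to mimic a dense block of common combining characters. */
static const uint32_t sorted_cps[] = {
    0x00C0, 0x00E9, 0x0300, 0x0301, 0x0302, 0x0303, 0x0308, 0x0327, 0x1E00
};
#define NCPS (sizeof sorted_cps / sizeof sorted_cps[0])

/* Binary search: every lookup pays O(log n) probes, including lookups
 * for the common characters in the dense block. */
static int lookup_bsearch(uint32_t c, int *probes)
{
    size_t lo = 0, hi = NCPS;
    *probes = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        ++*probes;
        if (sorted_cps[mid] == c) return 1;
        if (sorted_cps[mid] < c) lo = mid + 1;
        else hi = mid;
    }
    return 0;
}

/* Two-level table: the first level maps a block number (c >> 6) to a
 * slot; each slot is a 64-bit bitmap. Slot 0 is the shared empty block. */
#define LIMIT    0x2000u          /* sketch covers only U+0000..U+1FFF */
#define NBLOCKS  (LIMIT >> 6)
#define MAXSLOTS 8

static uint8_t  block_index[NBLOCKS];
static uint64_t bitmaps[MAXSLOTS];

/* Built at run time here to keep the sketch short; a real table would
 * more likely be generated offline and stored as const data. */
static void build_table(void)
{
    unsigned next = 1;
    for (size_t i = 0; i < NCPS; i++) {
        uint32_t c = sorted_cps[i];
        assert(c < LIMIT);
        if (!block_index[c >> 6]) {
            assert(next < MAXSLOTS);
            block_index[c >> 6] = (uint8_t)next++;
        }
        bitmaps[block_index[c >> 6]] |= (uint64_t)1 << (c & 63);
    }
}

/* Constant cost: two array reads and a shift, whether the block is
 * dense or nearly empty. */
static int lookup_table(uint32_t c)
{
    if (c >= LIMIT) return 0;
    return (int)((bitmaps[block_index[c >> 6]] >> (c & 63)) & 1);
}
```

With this toy data, lookup_table answers any query with the same two
reads, while lookup_bsearch needs several probes even for codepoints in
the 0300 block. The cost of the table approach is the space spent on
blocks that hold only one to three codepoints, which is the sparsity
issue raised in the quoted text.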
