Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 28 Jan 2022 10:33:53 -0800
From: enh <enh@...gle.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: A journey of weird file sorting and desktop systems

On Fri, Jan 28, 2022 at 10:01 AM Rich Felker <dalias@...c.org> wrote:
>
> On Fri, Jan 28, 2022 at 08:58:30AM -0800, enh wrote:
> > (Android's libc maintainer here...)
> >
> > i'd argue this isn't a musl bug. on Android we make a clear distinction between:
> >
> > 1. libc's responsibilities which, to paraphrase rich, are basically
> > "be unsurprising because your audience is OS/app developers who don't
> > speak all the languages their users use anyway". that is: "code point
> > order".
>
> That's not what I said. I speculated that part of the difficulty with
> getting people to care is that a large number of users personally
> prefer LC_COLLATE=C. Not that we should punt because of that.
>
> > 2. icu's responsibilities which cover all the user-facing (as opposed
> > to developer-facing) stuff. i18n is *hard* and the C/POSIX APIs are,
> > to be blunt, not fit for *that* purpose. there's a reason why all of
> > Android/macOS/Windows (and all the browsers) ship copies of icu.
>
> ICU is really, *really* bad. I don't want to be encouraging people to
> use it because basic functionality is missing from libc.

human languages are really really messy. a lot of the complexity is inherent.

as for the non-inherent, https://github.com/unicode-org/icu4x seems
like a good start.

> > the bug here is that a desktop file manager is assuming "i just want
> > telephone book order --- how hard can it be?". the answer turns out to
> > be "hard". especially when you get into fun stuff like users who *do*
> > speak multiple languages and have strong expectations for how they
> > sort. or places where there are multiple sort orders in common use.
>
> Absolutely. That's why I don't want to treat the problem half-assedly,

but that's my point --- it's not the *implementation* that's the
issue, it's that the C/POSIX *interfaces* are insufficient. the bar on
how good a job you _can_ do within those constraints is horribly low.

> but make sure we design or choose a format for the collation tables
> that's simultaneously (1) efficient, (2) sufficiently expressive to
> give the behaviors users may want, and (3) easy enough to understand
> that users can customize it if needed. The POSIX localedef format (an
> option group musl intentionally does not support) does not have any of
> those properties except maybe #2. The standard Unicode format may
> translate directly into something that can meet all 3; I'm not sure.
>
> Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.