Date: Fri, 28 Jan 2022 08:58:30 -0800
From: enh <enh@...gle.com>
To: musl@...ts.openwall.com
Cc: Rich Felker <dalias@...c.org>
Subject: Re: A journey of weird file sorting and desktop systems

(Android's libc maintainer here...)

i'd argue this isn't a musl bug. on Android we make a clear distinction between:

1. libc's responsibilities which, to paraphrase rich, are basically
"be unsurprising because your audience is OS/app developers who don't
speak all the languages their users use anyway". that is: "code point
order".

2. icu's responsibilities which cover all the user-facing (as opposed
to developer-facing) stuff. i18n is *hard* and the C/POSIX APIs are,
to be blunt, not fit for *that* purpose. there's a reason why all of
Android/macOS/Windows (and all the browsers) ship copies of icu.

the bug here is that a desktop file manager is assuming "i just want
telephone book order --- how hard can it be?". the answer turns out to
be "hard". especially when you get into fun stuff like users who *do*
speak multiple languages and have strong expectations for how they
sort. or places where there are multiple sort orders in common use.
you don't even need to be in very "exotic" languages to start hitting
these things. German and Spanish will do fine. see
https://unicode-org.github.io/icu/userguide/collation/ for a handful
of specific examples.
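
to make "code point order" versus "how hard can it be?" concrete, here's a
minimal sketch in plain Python (standard library only, no icu). the file
names are made up for illustration, and the casefold key is a stand-in for
the naive fix a file manager might try, not anything icu or musl actually
does:

```python
# Python compares str values by code point, which is the order you get
# from a "C"-locale sort of directory entries.
names = ["elder1", "Elder2", "Makefile", "README", "apple"]

# Code point order: every ASCII uppercase letter sorts before every
# lowercase one, so "Elder2" and "elder1" land far apart.
by_code_point = sorted(names)
print(by_code_point)

# Naive fix: fold case before comparing. This helps for plain ASCII names.
by_casefold = sorted(names, key=str.casefold)
print(by_casefold)

# It does not help once non-ASCII letters show up: "Ä" (U+00C4) and its
# casefolded form "ä" (U+00E4) both compare greater than "z" (U+007A),
# so "Äpfel" sorts after "zebra" either way.
mixed = ["zebra", "Äpfel", "apple"]
mixed_by_code_point = sorted(mixed)
mixed_by_casefold = sorted(mixed, key=str.casefold)
print(mixed_by_code_point, mixed_by_casefold)
```

getting "Äpfel" next to "apple" (and deciding which German convention to
follow when you do) is where real collation data comes in.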

(i maintained Android's Java i18n stuff before i ended up owning
bionic, and you'd be surprised at the extent to which even Java --
which tried pretty hard by 1990s standards -- doesn't really cover
everything you need, not even for languages like Russian. so i don't
think C/POSIX could have done a great job in the 1990s, and one of
icu's main benefits is that it's been able to evolve to better support
existing languages/support more languages rather than being ossified
by an insufficient standard.)

"if you care about your users, you need icu/CLDR" is the easy side of
the argument. the flip side -- that libc *shouldn't* get involved --
is trickier. what convinced me was the amount of *breakage* you cause
if you try to be "good guy greg"... it turns out no-one wants dotless
i breaking their build just because their locale is a turkish/azeri
locale, for example. (dotted/dotless i is by far the most common
real-world issue i've seen.) but it's exactly the "text manipulation
tools used during builds" that are most likely to use libc
functionality, and although, sure, we can chase *everyone* making sure
they set their locale to "C" when building ... are we helping at that
point, or just making more work for everyone? (without actually
solving the real problem for the folks who just want to use their file
browser.)
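
a rough sketch of the dotted/dotless i problem, again in plain Python.
Python's own str.upper() ignores the locale, so turkish_upper() below is a
hypothetical helper that approximates Turkish/Azeri casing for the letter i
only; the keyword table is likewise invented, standing in for whatever a
build-time text tool might look up:

```python
DOTTED_CAPITAL_I = "\u0130"   # İ, LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_SMALL_I = "\u0131"    # ı, LATIN SMALL LETTER DOTLESS I


def turkish_upper(s: str) -> str:
    """Hypothetical helper: uppercase with Turkish/Azeri rules for i only."""
    # In these locales "i" uppercases to "İ" and "ı" uppercases to "I".
    s = s.replace("i", DOTTED_CAPITAL_I).replace(DOTLESS_SMALL_I, "I")
    return s.upper()


# Invented keyword table for an imaginary build-time tool.
KEYWORDS = {"TITLE", "INCLUDE"}

plain = "title".upper()           # locale-independent result
turkish = turkish_upper("title")  # what locale-aware casing would produce

print(plain in KEYWORDS)    # the locale-independent spelling is found
print(turkish in KEYWORDS)  # the Turkish-cased spelling is not
```

the same tool with the same input gives a different answer depending on
the user's locale, which is the kind of surprise that breaks builds.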

On Fri, Jan 28, 2022 at 7:06 AM ellie <el@...se64.org> wrote:
>
> I don't think nowadays the majority of users should be expected to be
> traditional *nix users with terminal knowledge anymore. And most modern
> desktop distros don't default to such a sorting as far as I can tell,
> and instead to en_US or alike - but all those which use musl are left
> stranded with "C" sorting. The type of users who are hit most by this
> are not going to be the type who know what a terminal is, what musl is,
> or how to voice their opinion on LC_COLLATE because their file manager
> looks so weird. So if you want them to show up here that probably won't
> happen. Beyond myself, I suppose.
>
> I think for a typical user-friendly desktop the need is kinda clear, so
> I'm not sure what other sort of setting would need to be introduced
> still. If musl is meant to be used on desktop distros, this just seems
> kind of mandatory, or I'm not really getting why it wouldn't be.
>
> My apologies however if I'm misunderstanding, but that was basically
> your question/what you're saying is delaying it, right? Sorry if you
> didn't want further input from me on this, I hope I read your e-mail right
>
> On 1/28/22 3:10 PM, Rich Felker wrote:
> > On Fri, Jan 28, 2022 at 02:41:38PM +0100, ellie wrote:
> >> After spending a bit wondering why files like "elder1" and "Elder2"
> >> end up at completely different spots in the file list on my
> >> postmarketOS (=Alpine-based) system, I filed a ticket with the Nemo
> >> file manager. Turns out Nemo just uses locale-dependent sorting, so
> >> I spent an hour trying to set LC_COLLATE to fix this, until I
> >> stumbled across the remark on musl's website that LC_COLLATE sorting
> >> is simply not supported. So I seem to be stuck with this, which I
> >> did not expect.
> >>
> >> This to me seems kind of disastrous on a desktop system. I just fail
> >> to see any average default user (who doesn't know ASCII in their
> >> head) expecting "elder1" and "Elder2" to be miles apart in a sorted
> >> listing even as a default US person, let alone in some other
> >> language that may be expected to use a different sorting for
> >> whatever reason. (This affects umlauts too, I assume? So that'd be
> >> most European languages having file lists entirely messed up, too.)
> >> The sorting shouldn't be stuck as something that just makes sense to
> >> programmers and balks at any special vowels, and it appears at least
> >> as of now there is just no way to fix this.
> >>
> >> Should desktop file managers like Nemo not be using this sorting
> >> function? Or is musl not intended for desktop use, and postmarketOS
> >> should switch? Otherwise, it seems like this omission in musl seems
> >> like kind of a big deal. Or is it really just me who is constantly
> >> confused as to where any file is at in any file lists...?
> >>
> >> Or in other words, would be kind of cool if this could be changed
> >
> > LC_COLLATE functionality is just not designed or implemented yet, due
> > to lack of interest/participation from folks who want it to happen. I
> > very much do want it to happen, but I don't want to design something
> > (data model for efficient collation tables & code to use them) only to
> > have it turn out not to meet everyone's/anyone's needs because there
> > was nobody to bounce questions/testing/what-if's off during the
> > design.
> >
> > A big part of this is probably that, historically, *nix users tend to
> > be happy with (or even prefer, which they can explicitly set via
> > exporting LC_COLLATE=C) codepoint-order sorting of directory entries,
> > like Makefile and README appearing at the top. So to get these folks
> > to care you have to have another setting where collation order
> > matters.
> >
> > I'm happy to restart the process for getting this done if ppl are
> > interested.
> >
> > Rich
