Message-ID: <3ae98a928a77f642021396442b4c24b1ac9d3a63.camel@postmarketos.org>
Date: Sun, 15 Jun 2025 00:38:46 +0200
From: Pablo Correa Gomez <pabloyoyoista@...tmarketos.org>
To: musl@...ts.openwall.com
Subject: On current (and future) use of LCTRANS

Hi all,

As a follow-up to https://www.openwall.com/lists/musl/2025/06/02/2 I
have been looking at the different places where we currently do
translations. Those are basically the places where the LCTRANS and
LCTRANS_CUR macros are used. There was a single place where the macros
were not used, addressed by
https://www.openwall.com/lists/musl/2025/06/02/1 so this discussion
assumes that patch will be merged.

We basically have two different kinds of things that use these macros:

* Static strings:
  * Hard-coded error messages in src/errno/__strerror.h
  * Hard-coded human-readable signal names in src/string/strsignal.c
  * Hard-coded error messages in src/regex/regerror.c
  * Hard-coded error messages in src/network/hstrerror.c
  * Hard-coded error messages in src/network/gai_strerror.c
* Actual translations. This basically happens in src/locale/langinfo.c.
  It is exposed as the public functions nl_langinfo{_l}, but is also used
  internally in several places:
  * strftime{_l}
  * wcsftime{_l}
  * asctime{_r} and in consequence ctime{_r}
  * strptime
  * catopen

Unfortunately, there are quite a few functions that currently
ignore the locale passed to them:

* is{w}alnum_l
* is{w}alpha_l
* is{w}blank_l
* is{w}cntrl_l
* is{w}digit_l
* is{w}graph_l
* is{w}lower_l
* is{w}print_l
* is{w}punct_l
* is{w}space_l
* is{w}upper_l
* is{w}xdigit_l
* iswctype_l
* strfmon_l
* to{w}lower_l
* to{w}upper_l
* towctrans_l

We might be able to go without using locales in some of them (like
isdigit), but we certainly cannot in others, which currently assume
ASCII codes and so leave no room for letters from other alphabets.
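As a rough way to observe this from the outside, the sketch below asks whether a wide character counts as alphabetic under a named locale. The helper name is_alpha_in_locale is hypothetical. It assumes the POSIX.1-2008 declarations of newlocale and iswalpha_l, and that the named locale is available on the system.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <wctype.h>

/* Hypothetical helper: returns 1 if `wc` is alphabetic according to
 * iswalpha_l() under the named locale, 0 if it is not, and -1 if the
 * locale could not be created. */
int is_alpha_in_locale(wchar_t wc, const char *locale_name)
{
    assert(locale_name != NULL);
    locale_t loc = newlocale(LC_CTYPE_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;
    int result = iswalpha_l((wint_t)wc, loc) != 0;
    freelocale(loc);
    return result;
}
```

For ASCII input such as L'a' or L'1' the answer is the same everywhere. For a non-ASCII letter the answer depends on the implementation and on the locale, so comparing results across several locale names on one system is a quick check of whether the locale argument has any effect.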

In addition, we have some functions related to collation where the
locale is also ignored:

* {wcs,wcsn,str,strn}casecmp_l
* {wcs,str}coll_l
* {wcs,str}xfrm_l
* wctrans_l
* wctype_l
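A similar probe can be written for collation. This is only a sketch under the assumption that strcoll_l and newlocale are available as specified by POSIX.1-2008. The helper name collate_in_locale and its return convention are made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare `a` and `b` with strcoll_l() under the
 * named locale. Returns -1, 0 or 1 for the sign of the comparison, or
 * 2 if the locale could not be created. */
int collate_in_locale(const char *a, const char *b, const char *locale_name)
{
    assert(a != NULL && b != NULL && locale_name != NULL);
    locale_t loc = newlocale(LC_COLLATE_MASK, locale_name, (locale_t)0);
    if (loc == (locale_t)0)
        return 2;
    int r = strcoll_l(a, b, loc);
    freelocale(loc);
    return (r > 0) - (r < 0);
}
```

In the "C" locale the ordering matches strcmp. If an implementation ignores the locale argument, the result for any pair of strings will be the same regardless of which locale name is passed, which is what the list above describes.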

In addition to this, we have RADIXCHAR, which we hard-code in many
places while doing conversions. Finding the exact places where it
has to be implemented might be trickier, but here is a non-exhaustive
list:

* vstrfmon_l (internal, used by the strfmon family)
* fmt_fp (internal, used by the printf family)
* decfloat, hexfloat (internal, used by the floatscan family)
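One way to check from user code whether the printf family honours RADIXCHAR is to compare its output against nl_langinfo(RADIXCHAR) for the current locale. The helper below is a hypothetical sketch of that check, not part of any library, and it only uses standard and POSIX calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if formatting 1.5 with snprintf()
 * produces the radix character that nl_langinfo(RADIXCHAR) reports
 * for the current locale, 0 otherwise. */
int printf_uses_radixchar(void)
{
    char buf[32];
    const char *radix = nl_langinfo(RADIXCHAR);
    int n = snprintf(buf, sizeof buf, "%.1f", 1.5);
    assert(n > 0 && (size_t)n < sizeof buf);
    if (!radix || !radix[0])
        return 0;
    return strstr(buf, radix) != NULL;
}
```

After setlocale(LC_ALL, "C") the radix character is "." and the helper returns 1. Running it after switching to a locale whose radix character is a comma would show whether the formatting code and langinfo agree.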

Hopefully, this should be a good start to a discussion on
implementation details, and things we shouldn't care about.

Best,
Pablo.
