Message-ID: <20250902234203.GO1827@brightrain.aerifal.cx>
Date: Tue, 2 Sep 2025 19:42:03 -0400
From: Rich Felker <dalias@...c.org>
To: Sertonix <sertonix@...teo.net>
Cc: musl@...ts.openwall.com
Subject: Re: unreliable/unused mbstowcs check in is_valid_hostname

On Tue, Sep 02, 2025 at 07:56:18PM +0000, Sertonix wrote:
> Inside of is_valid_hostname there is a mbstowcs(0, host, 0) == -1 check.
> It seems like the next line validates the characters so checking for
> invalid multibyte sequence shouldn't be needed. And as far as I can tell
> 
> Before 2abb70c302ef the check was mbstowcs(0, host, 0) > 255 so maybe
> that commit was incorrect?

As-is, the code doesn't do anything. It was written with the intent
that the check on the subsequent line, which rejects strings containing
anything outside [[:alnum:].-], be removed and proper IDN translation
support added. That's still a future project. There was a partial
patch submitted but it's missing a fair amount of the functionality
needed to meet standards/expected behavior.

> 1507ebf83733 + f22a9edaf8a6 made mbstowcs never return -1 when
> CURRENT_UTF8 is false even though is_valid_hostname should probably not
> be locale-dependent.

The intent was that it be locale-dependent, in that malformed UTF-8 is
rejected right away as long as a real (non-"C") locale is set, and
that all high bytes are rejected later in the C locale since they
don't correspond to any characters.
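
To make the two stages concrete, here is a rough sketch of the shape
described above. It is not the musl source: the function names are made
up for illustration, and length limits are left out. Stage 1 asks the
current locale's decoder whether the bytes form a valid multibyte
string; stage 2 is the [[:alnum:].-] filter, which is what makes stage 1
redundant today.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper (not from musl): stage 1. With a null destination,
 * mbstowcs only counts characters, and returns (size_t)-1 if the current
 * locale's decoder finds an invalid multibyte sequence. */
static int decodes_in_current_locale(const char *host)
{
	return mbstowcs(NULL, host, 0) != (size_t)-1;
}

/* Hypothetical helper: stage 2. Accept only '.', '-' and bytes that
 * isalnum() classifies as alphanumeric. In the C locale no byte >= 0x80
 * qualifies, so every non-ASCII name is rejected here. */
static int only_hostname_chars(const char *host)
{
	const unsigned char *s = (const unsigned char *)host;
	while (*s == '.' || *s == '-' || isalnum(*s)) s++;
	return *s == 0;
}

/* Hypothetical combination of the two stages, in the order discussed. */
static int hostname_ok_sketch(const char *host)
{
	return decodes_in_current_locale(host) && only_hostname_chars(host);
}
```

With no setlocale call (so the C locale), "example.com" passes both
stages, "bad_host" fails stage 2, and a UTF-8 name such as "bücher.example"
is rejected by stage 2 whatever stage 1 returns for it.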

I'm not entirely sure this is necessary for our eventual IDN
implementation. There's no rule that says it couldn't interpret UTF-8
names even when the byte sequences aren't being interpreted as making
up characters otherwise. But in a context where someone has explicitly
asked for the byte-based locale, it's probably preferable that they
don't get back results they can't interpret as characters, but instead
get back the raw punycode.

In any case, I don't think any of this matters until we get to really
doing IDN. The code isn't doing anything right now, but changing it
based on expectations of how IDN will later work without actually
having worked that out seems like pointless churn.

Rich
