Message-ID: <20250902234203.GO1827@brightrain.aerifal.cx>
Date: Tue, 2 Sep 2025 19:42:03 -0400
From: Rich Felker <dalias@...c.org>
To: Sertonix <sertonix@...teo.net>
Cc: musl@...ts.openwall.com
Subject: Re: unreliable/unused mbstowcs check in is_valid_hostname

On Tue, Sep 02, 2025 at 07:56:18PM +0000, Sertonix wrote:
> Inside of is_valid_hostname there is a mbstowcs(0, host, 0) == -1 check.
> It seems like the next line validates the characters so checking for
> invalid multibyte sequence shouldn't be needed. And as far as I can tell
>
> Before 2abb70c302ef the check was mbstowcs(0, host, 0) > 255 so maybe
> that commit was incorrect?

As-is, the code doesn't do anything. It was written with the intent
that the subsequent-line check that rejects strings consisting of
anything but [[:alnum:].-] be removed and proper IDN translation
support added. That's still a future project. There was a partial
patch submitted but it's missing a fair amount of the functionality
needed to meet standards/expected behavior.

> 1507ebf83733 + f22a9edaf8a6 made mbstowcs never returns -1 when
> CURRENT_UTF8 is false even though is_valid_hostname should probably not
> be local dependant.

The intent was that it be locale-dependent, in that malformed UTF-8 is
rejected right away as long as a real (non-"C") locale is set, and
that all high bytes are rejected later in the C locale since they
don't correspond to any characters.

I'm not entirely sure this is necessary for our eventual IDN
implementation. There's no rule that says it couldn't interpret UTF-8
names even when the byte sequences aren't being interpreted as making
up characters otherwise. But in a context where someone has explicitly
asked for the byte-based locale, it's probably preferable that they
don't get back results they can't interpret as characters, but instead
get back the raw punycode.

In any case, I don't think any of this matters until we get to really
doing IDN. The code isn't doing anything right now, but changing it
based on expectations of how IDN will later work without actually
having worked that out seems like pointless churn.

Rich
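
For readers following along, here is a minimal sketch of the two-step check discussed above, showing why the first step currently has no effect. It is not musl's source: the names sketch_is_valid_hostname and ascii_hostname_char are hypothetical, length limits are left out, and the character test is written as an explicit ASCII check so that the result does not depend on how isalnum behaves in a given locale. Calling mbstowcs with a null destination to validate or count characters is POSIX behavior, not plain ISO C.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: ASCII-only equivalent of [[:alnum:].-],
 * independent of the current locale. */
static int ascii_hostname_char(unsigned char c)
{
	return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
	    || (c >= '0' && c <= '9') || c == '.' || c == '-';
}

/* Sketch only, not musl's implementation. Returns 1 if host passes
 * both checks, 0 otherwise. Length limits are intentionally omitted. */
static int sketch_is_valid_hostname(const char *host)
{
	const unsigned char *s;

	/* Check 1: reject strings that cannot be decoded as multibyte
	 * characters in the current locale (POSIX: null destination). */
	if (mbstowcs(NULL, host, 0) == (size_t)-1) return 0;

	/* Check 2: accept only ASCII letters, digits, '.' and '-'.
	 * Any byte >= 0x80 stops the loop here, so every string that
	 * check 1 could reject is rejected by this check as well. */
	for (s = (const unsigned char *)host; ascii_hostname_char(*s); s++);
	return !*s;
}
```

Under these assumptions, sketch_is_valid_hostname("example.com") returns 1, while a name containing UTF-8 such as "b\xc3\xbc" "cher.example" returns 0 whether or not the mbstowcs call fails. The first check would only start to matter once the second is relaxed to let non-ASCII names through for IDN translation.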