Message-Id: <20251004020906.1464237-1-musl@raf.org>
Date: Sat, 4 Oct 2025 12:09:05 +1000
From: raf <musl@....org>
To: musl@...ts.openwall.com
Subject: [PATCH] fnmatch: Make ? match binary/non-character byte (like * does)

Hi,

It was recently brought to my attention that fnmatch() on Linux/glibc is buggy: ?? matches éé (as expected), but also é (which is wrong), because it performs two matches, first using multibyte characters in the current (UTF-8) locale, then again as single bytes. Ideally, it would perform a single match that matched a valid character if present, or an "invalid" byte otherwise.

With further testing, I found that fnmatch() on OpenBSD and NetBSD is also buggy (it seems to interpret the target data only as single-byte characters, regardless of the charmap of the current user's locale). Luckily, fnmatch() on FreeBSD, Solaris, macOS (mostly), and Cygwin is fine.

Anyway, I have a program (rawhide) that makes heavy use of globbing, so I thought it best to include an fnmatch() implementation that didn't have these bugs, so I'm including musl's fnmatch() implementation in my program. Many thanks! Love your work!

But I found that musl's implementation is also buggy. ? doesn't match a byte that doesn't represent a correctly encoded character in the charmap of the current user's locale.

The problem is that the fnmatch() implementation assumes that ? should only match a validly-encoded character, but the reality is that on most systems, file names are not required to be text at all. They don't need to be composed of characters (let alone characters encoded according to the current user's locale). File names can be binary nonsense (except on macOS and maybe Windows), or they can be characters encoded according to some other user's locale. And since fnmatch() is supposed to match filenames (hence the "fn" in the name "fnmatch"), it shouldn't be assuming validly-encoded characters. But I admit that I'm not sure what the POSIX standard says on the matter.
However, in order for fnmatch() to be more useful, it needs to be able to match filenames which aren't composed of characters in the current user's locale.

So here's a tiny patch (two lines changed) that makes the musl fnmatch() implementation work correctly with rubbish/binary/non-character filenames.

Actually, it's only the first line that matters. It makes fnmatch() work. The second line is just for politeness. It just prevents the invalid character (-1) from being passed to the casefold() function.

I happily assign any copyright to Rich Felker if that matters.

I couldn't find any existing tests to extend for this change. But the tests for my software show that this change works as I expect it to.

cheers,
raf
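
To illustrate the matching rule described in this message (a ? consumes one validly encoded character if one is present, or a single "invalid" byte otherwise), here is a minimal sketch in C. It is not the patch and not musl's code. The helper names unit_len() and qmatch() are hypothetical, UTF-8 is hard-coded as an assumption rather than taken from the current locale, and only ? and literal bytes are handled (no *, brackets, or flags).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length in bytes of the next "unit" at s.
 * A unit is one well-formed UTF-8 sequence, or a single byte when
 * the bytes at s are not well-formed UTF-8 (assumption: UTF-8 only). */
size_t unit_len(const unsigned char *s, size_t n)
{
    unsigned char b, lo = 0x80, hi = 0xBF;
    size_t need, i;

    if (n == 0) return 0;
    b = s[0];
    if (b < 0x80) return 1;                       /* ASCII */
    if (b >= 0xC2 && b <= 0xDF) {
        need = 1;
    } else if (b >= 0xE0 && b <= 0xEF) {
        need = 2;
        if (b == 0xE0) lo = 0xA0;                 /* reject overlong forms */
        if (b == 0xED) hi = 0x9F;                 /* reject surrogates */
    } else if (b >= 0xF0 && b <= 0xF4) {
        need = 3;
        if (b == 0xF0) lo = 0x90;                 /* reject overlong forms */
        if (b == 0xF4) hi = 0x8F;                 /* reject > U+10FFFF */
    } else {
        return 1;                                 /* invalid lead byte */
    }
    if (n < need + 1) return 1;                   /* truncated sequence */
    if (s[1] < lo || s[1] > hi) return 1;         /* bad first continuation */
    for (i = 2; i <= need; i++)
        if ((s[i] & 0xC0) != 0x80) return 1;      /* bad continuation byte */
    return need + 1;
}

/* Hypothetical matcher: returns 1 if str matches pat, else 0.
 * '?' consumes exactly one unit in a single pass; any other
 * pattern byte must equal the next byte of str. */
int qmatch(const char *pat, const char *str)
{
    const unsigned char *s;
    size_t n;

    assert(pat != NULL && str != NULL);
    s = (const unsigned char *)str;
    n = strlen(str);
    for (; *pat; pat++) {
        if (n == 0) return 0;                     /* pattern longer than input */
        if (*pat == '?') {
            size_t k = unit_len(s, n);            /* character, or one raw byte */
            s += k;
            n -= k;
        } else {
            if ((unsigned char)*pat != *s) return 0;
            s++;
            n--;
        }
    }
    return n == 0;                                /* all input must be consumed */
}
```

Under this rule, qmatch("??", "\xc3\xa9\xc3\xa9") matches (éé is two characters), qmatch("??", "\xc3\xa9") does not (é is one character, and there is no second byte-wise pass), and qmatch("?", "\xff") matches even though 0xFF is never valid in UTF-8.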