Message-Id: <20251004020906.1464237-1-musl@raf.org>
Date: Sat,  4 Oct 2025 12:09:05 +1000
From: raf <musl@....org>
To: musl@...ts.openwall.com
Subject: [PATCH] fnmatch: Make ? match binary/non-character byte (like * does)

Hi,

It was recently brought to my attention that fnmatch()
on Linux/glibc is buggy: the pattern ?? matches éé (as
expected), but it also matches é (which is wrong). This
happens because it performs two matches, first using
multibyte characters in the current (UTF-8) locale,
then again using single bytes, and é is two bytes long
in UTF-8. Ideally, it would perform a single match in
which each ? matches a valid character if one is
present, or a single "invalid" byte otherwise. With
further testing, I found that fnmatch() on OpenBSD and
NetBSD is also buggy (it seems to interpret the target
data only as single-byte characters, regardless of the
charmap of the current user's locale).
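A rough sketch of that single-pass approach, for illustration only (it is not glibc's or musl's code): the hypothetical helper next_unit() below returns the length of the next "unit" of a string, meaning one validly encoded character if one starts there and exactly one byte otherwise, and the hypothetical question_marks_match() uses it to test a pattern made only of ? wildcards. To keep it deterministic, it assumes the encoding is UTF-8 instead of consulting the locale with mbrtowc().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length in bytes of the next "unit" of s, where n > 0
   bytes remain. A unit is one validly encoded UTF-8 character if one starts
   here, otherwise exactly one (invalid) byte. Assumes UTF-8 instead of
   asking the current locale via mbrtowc(). */
static size_t next_unit(const unsigned char *s, size_t n)
{
    size_t len, i;
    unsigned long cp;

    assert(n > 0);
    if (s[0] < 0x80)
        return 1;                             /* ASCII */
    if (s[0] >= 0xc2 && s[0] <= 0xdf) {
        len = 2; cp = s[0] & 0x1f;
    } else if (s[0] >= 0xe0 && s[0] <= 0xef) {
        len = 3; cp = s[0] & 0x0f;
    } else if (s[0] >= 0xf0 && s[0] <= 0xf4) {
        len = 4; cp = s[0] & 0x07;
    } else {
        return 1;      /* continuation byte or impossible lead byte */
    }
    if (n < len)
        return 1;                             /* truncated sequence */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80)
            return 1;                         /* bad continuation byte */
        cp = (cp << 6) | (s[i] & 0x3f);
    }
    /* Reject remaining overlong forms, UTF-16 surrogates and values
       above U+10FFFF (2-byte overlongs were excluded by the lead range). */
    if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
        (cp >= 0xd800 && cp <= 0xdfff) || cp > 0x10ffff)
        return 1;
    return len;
}

/* Hypothetical single-pass matcher for a pattern of nq '?' wildcards:
   each '?' consumes exactly one unit, so an invalid byte is matched
   instead of causing a failure. Returns 1 on a match, 0 otherwise. */
static int question_marks_match(size_t nq, const char *str)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t n = strlen(str);

    while (nq > 0 && n > 0) {
        size_t k = next_unit(s, n);
        s += k;
        n -= k;
        nq--;
    }
    return nq == 0 && n == 0;
}
```

With this approach, ?? matches "\xc3\xa9\xc3\xa9" (éé) but not "\xc3\xa9" (é), and a lone Latin-1 é byte ("\xe9") counts as one unit, so a single ? matches it.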

Luckily, the fnmatch() implementations on FreeBSD,
Solaris, macOS (mostly), and Cygwin are fine.

Anyway, I have a program (rawhide) that makes heavy use
of globbing, so I thought it best to include an
fnmatch() implementation without these bugs, and I'm
now including musl's fnmatch() implementation in my
program. Many thanks! Love your work!

But I found that musl's implementation is also buggy:
? doesn't match a byte that isn't part of a correctly
encoded character in the charmap of the current user's
locale.

The problem is that the fnmatch() implementation
assumes that ? should only match a validly encoded
character, but the reality is that on most systems,
file names are not required to be text at all. They
don't need to be composed of characters (let alone
characters encoded according to the current user's
locale). File names can be binary nonsense (except on
macOS and maybe Windows), or they can be characters
encoded according to some other user's locale.

And since fnmatch() is supposed to match filenames
(hence the "fn" in the name "fnmatch"), it shouldn't
assume validly encoded characters. I admit that I'm not
sure what the POSIX standard says on the matter.
However, for fnmatch() to be more useful, it needs to
be able to match filenames that aren't composed of
characters in the current user's locale.

So here's a tiny patch (two lines changed) that makes
the musl fnmatch() implementation work correctly with
rubbish/binary/non-character filenames. Actually, only
the first line matters: it makes ? match such bytes.
The second line is just for politeness: it prevents the
invalid character value (-1) from being passed to the
casefold() function.
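To illustrate the two kinds of change described above (this is a simplified stand-in, not the actual musl code or the patch itself): suppose a decoder returns -1, with a step of one byte, for anything that isn't a valid character. The hypothetical matcher match_simple() below supports only ? and literal characters; ? accepts any decoded value, including -1, while a literal never equals -1, and case folding is applied only when the value is a real character. For brevity, the hypothetical next_char() pretends that only ASCII bytes are valid characters instead of calling mbtowc().

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoder: returns the next character of s, 0 at the end of
   the string, or -1 for a byte that is not a valid character. *step is set
   to the number of bytes consumed (1 for an invalid byte). Simplification:
   only ASCII bytes count as valid characters here; real code would use
   mbtowc() and respect the current locale. */
static int next_char(const unsigned char *s, size_t n, size_t *step)
{
    assert(step != NULL);
    if (n == 0) {
        *step = 0;
        return 0;
    }
    *step = 1;
    return s[0] < 0x80 ? s[0] : -1;
}

/* Hypothetical matcher for patterns made of '?' and literal characters
   only. Returns 1 on a match, 0 otherwise; fold != 0 ignores case. */
static int match_simple(const char *pat, const char *str, int fold)
{
    const unsigned char *s = (const unsigned char *)str;
    size_t n = strlen(str);

    for (; *pat; pat++) {
        size_t step;
        int k = next_char(s, n, &step);
        if (k == 0)
            return 0;                   /* string ended before pattern */

        int pc = (unsigned char)*pat;
        /* Only fold real characters; the -1 sentinel is left alone. */
        int kfold = (k > 0 && fold) ? tolower(k) : k;
        int pfold = fold ? tolower(pc) : pc;

        /* '?' accepts any unit, including an invalid byte (k == -1);
           a literal can never equal -1, so it never matches one. */
        if (pc != '?' && kfold != pfold)
            return 0;

        s += step;
        n -= step;
    }
    return n == 0;                      /* whole string must be consumed */
}
```

For example, match_simple("a?c", "a\xff" "c", 0) returns 1, while match_simple("abc", "a\xff" "c", 0) returns 0 because a literal never matches the invalid byte. (The literal is split so that the c isn't parsed as part of the \xff escape.)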

I happily assign any copyright to Rich Felker if that
matters.

I couldn't find any existing tests to extend for this
change, but the tests for my software show that it
works as I expect.

cheers,
raf

