|
|
Message-ID: <20260323215455.GU3520958@port70.net>
Date: Mon, 23 Mar 2026 22:54:55 +0100
From: Szabolcs Nagy <nsz@...t70.net>
To: Rich Felker <dalias@...c.org>
Cc: Simon Resch <simon.resch@...e-intelligence.com>,
musl@...ts.openwall.com
Subject: Re: regexec infinite loop on self-referential backreference
pattern
* Rich Felker <dalias@...c.org> [2026-03-23 11:57:01 -0400]:
> On Mon, Mar 23, 2026 at 03:15:34PM +0100, Simon Resch wrote:
> > Hi,
> >
> > I've found that musl's regexec enters an infinite loop on the
> > self-referential backreference pattern \(^.*\1\)*, even with a two-byte
> > input. Since the input is only two bytes, this is not exponential blowup
> > but appears to be a true infinite loop in the backtracking matcher.
> >
> > The same bug exists in upstream TRE, from which musl's regex implementation
> > is derived. For comparison, glibc rejects this pattern at regcomp time with
> > REG_ESUBREG ("Invalid back reference"), since \1 references a group that
> > contains itself.
> >
> > Reproducer
> >
> > #include <regex.h>
> > #include <stdio.h>
> >
> > int main(void) {
> > regex_t re;
> >
> > const char pattern[] = "\\(^.*\\1\\)*";
> > const char subject[] = "aa";
> >
> > int rc = regcomp(&re, pattern, 0);
> > if (rc != 0) {
> > char errbuf[256];
> > regerror(rc, &re, errbuf, sizeof(errbuf));
> > fprintf(stderr, "regcomp failed: %s\n", errbuf);
> > return 1;
> > }
> >
> > printf("regcomp succeeded, calling regexec ...\n");
> > fflush(stdout);
> >
> > regexec(&re, subject, 0, NULL, 0);
> >
> > regfree(&re);
> > return 0;
> > }
> >
> > To verify on Alpine (musl 1.2.5) place the above content into a file
> > reproducer.c and execute
> >
> > docker run --rm -v $PWD/reproducer.c:/reproducer.c alpine sh -c \
> > 'apk add gcc musl-dev && gcc -o /reproducer /reproducer.c &&
> > timeout 10 /reproducer; echo "Exit code: $?"'
> >
> > Output:
> >
> > regcomp succeeded, calling regexec ...
> > Exit code: 143
> >
> > The process is killed by timeout after 10 seconds. I have also let it run
> > for over an hour with no termination.
> >
> > Any program that passes user-supplied patterns to regcomp/regexec (grep,
> > sed, awk, web servers with regex-based URL routing, input validators, etc.)
> > can be made to hang a thread/process permanently with a short pattern and a
> > trivial input.
note that not all those tools use libc regex
(busybox sed,grep,awk do but e.g. awk uses ERE, not BRE)
> >
> > Suggested fix: Either reject self-referential backreferences during regcomp
> > (as glibc does), or add a step limit to the backtracking matcher that
> > returns REG_ESPACE when exceeded.
> >
> > Please CC me on replies.
>
> Thanks. It should be caught during regcomp. I'm not sure yet how to do
> that, but that's the approach a fix should take.
>
> Rich
can be fixed by adding a bit of internal state.
attached a patch.
View attachment "0001-regex-reject-invalid-digit-back-reference-in-BRE.patch" of type "text/x-diff" (2345 bytes)
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.