Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20260323215455.GU3520958@port70.net>
Date: Mon, 23 Mar 2026 22:54:55 +0100
From: Szabolcs Nagy <nsz@...t70.net>
To: Rich Felker <dalias@...c.org>
Cc: Simon Resch <simon.resch@...e-intelligence.com>,
	musl@...ts.openwall.com
Subject: Re: regexec infinite loop on self-referential backreference
 pattern

* Rich Felker <dalias@...c.org> [2026-03-23 11:57:01 -0400]:
> On Mon, Mar 23, 2026 at 03:15:34PM +0100, Simon Resch wrote:
> > Hi,
> > 
> > I've found that musl's regexec enters an infinite loop on the
> > self-referential backreference pattern \(^.*\1\)*, even with a two-byte
> > input. Since the input is only two bytes, this is not exponential blowup
> > but appears to be a true infinite loop in the backtracking matcher.
> > 
> > The same bug exists in upstream TRE, from which musl's regex implementation
> > is derived. For comparison, glibc rejects this pattern at regcomp time with
> > REG_ESUBREG ("Invalid back reference"), since \1 references a group that
> > contains itself.
> > 
> > Reproducer
> > 
> >   #include <regex.h>
> >   #include <stdio.h>
> > 
> >   int main(void) {
> >       regex_t re;
> > 
> >       const char pattern[] = "\\(^.*\\1\\)*";
> >       const char subject[] = "aa";
> > 
> >       int rc = regcomp(&re, pattern, 0);
> >       if (rc != 0) {
> >           char errbuf[256];
> >           regerror(rc, &re, errbuf, sizeof(errbuf));
> >           fprintf(stderr, "regcomp failed: %s\n", errbuf);
> >           return 1;
> >       }
> > 
> >       printf("regcomp succeeded, calling regexec ...\n");
> >       fflush(stdout);
> > 
> >       regexec(&re, subject, 0, NULL, 0);
> > 
> >       regfree(&re);
> >       return 0;
> >   }
> > 
> > To verify on Alpine (musl 1.2.5) place the above content into a file
> > reproducer.c and execute
> > 
> >   docker run --rm -v $PWD/reproducer.c:/reproducer.c alpine sh -c \
> >           'apk add gcc musl-dev && gcc -o /reproducer /reproducer.c &&
> > timeout 10 /reproducer; echo "Exit code: $?"'
> > 
> > Output:
> > 
> >   regcomp succeeded, calling regexec ...
> >   Exit code: 143
> > 
> > The process is killed by timeout after 10 seconds. I have also let it run
> > for over an hour with no termination.
> > 
> > Any program that passes user-supplied patterns to regcomp/regexec (grep,
> > sed, awk, web servers with regex-based URL routing, input validators, etc.)
> > can be made to hang a thread/process permanently with a short pattern and a
> > trivial input.

note that not all those tools use libc regex
(busybox sed,grep,awk do but e.g. awk uses ERE, not BRE)

> > 
> > Suggested fix: Either reject self-referential backreferences during regcomp
> > (as glibc does), or add a step limit to the backtracking matcher that
> > returns REG_ESPACE when exceeded.
> > 
> > Please CC me on replies.
> 
> Thanks. It should be caught during regcomp. I'm not sure yet how to do
> that, but that's the approach a fix should take.
> 
> Rich

can be fixed by adding a bit of internal state.
attached a patch.


View attachment "0001-regex-reject-invalid-digit-back-reference-in-BRE.patch" of type "text/x-diff" (2345 bytes)

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.