|
|
Message-ID: <20260323155700.GF1827@brightrain.aerifal.cx>
Date: Mon, 23 Mar 2026 11:57:01 -0400
From: Rich Felker <dalias@...c.org>
To: Simon Resch <simon.resch@...e-intelligence.com>
Cc: musl@...ts.openwall.com
Subject: Re: regexec infinite loop on self-referential backreference
pattern
On Mon, Mar 23, 2026 at 03:15:34PM +0100, Simon Resch wrote:
> Hi,
>
> I've found that musl's regexec enters an infinite loop on the
> self-referential backreference pattern \(^.*\1\)*, even with a two-byte
> input. Since the input is only two bytes, this is not exponential blowup
> but appears to be a true infinite loop in the backtracking matcher.
>
> The same bug exists in upstream TRE, from which musl's regex implementation
> is derived. For comparison, glibc rejects this pattern at regcomp time with
> REG_ESUBREG ("Invalid back reference"), since \1 references a group that
> contains itself.
>
> Reproducer
>
> #include <regex.h>
> #include <stdio.h>
>
> int main(void) {
> regex_t re;
>
> const char pattern[] = "\\(^.*\\1\\)*";
> const char subject[] = "aa";
>
> int rc = regcomp(&re, pattern, 0);
> if (rc != 0) {
> char errbuf[256];
> regerror(rc, &re, errbuf, sizeof(errbuf));
> fprintf(stderr, "regcomp failed: %s\n", errbuf);
> return 1;
> }
>
> printf("regcomp succeeded, calling regexec ...\n");
> fflush(stdout);
>
> regexec(&re, subject, 0, NULL, 0);
>
> regfree(&re);
> return 0;
> }
>
> To verify on Alpine (musl 1.2.5) place the above content into a file
> reproducer.c and execute
>
> docker run --rm -v $PWD/reproducer.c:/reproducer.c alpine sh -c \
> 'apk add gcc musl-dev && gcc -o /reproducer /reproducer.c &&
> timeout 10 /reproducer; echo "Exit code: $?"'
>
> Output:
>
> regcomp succeeded, calling regexec ...
> Exit code: 143
>
> The process is killed by timeout after 10 seconds. I have also let it run
> for over an hour with no termination.
>
> Any program that passes user-supplied patterns to regcomp/regexec (grep,
> sed, awk, web servers with regex-based URL routing, input validators, etc.)
> can be made to hang a thread/process permanently with a short pattern and a
> trivial input.
>
> Suggested fix: Either reject self-referential backreferences during regcomp
> (as glibc does), or add a step limit to the backtracking matcher that
> returns REG_ESPACE when exceeded.
>
> Please CC me on replies.
Thanks. It should be caught during regcomp. I'm not sure yet how to do
that, but that's the approach a fix should take.
Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.