Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20260323155700.GF1827@brightrain.aerifal.cx>
Date: Mon, 23 Mar 2026 11:57:01 -0400
From: Rich Felker <dalias@...c.org>
To: Simon Resch <simon.resch@...e-intelligence.com>
Cc: musl@...ts.openwall.com
Subject: Re: regexec infinite loop on self-referential backreference
 pattern

On Mon, Mar 23, 2026 at 03:15:34PM +0100, Simon Resch wrote:
> Hi,
> 
> I've found that musl's regexec enters an infinite loop on the
> self-referential backreference pattern \(^.*\1\)*, even with a two-byte
> input. Since the input is only two bytes, this is not exponential blowup
> but appears to be a true infinite loop in the backtracking matcher.
> 
> The same bug exists in upstream TRE, from which musl's regex implementation
> is derived. For comparison, glibc rejects this pattern at regcomp time with
> REG_ESUBREG ("Invalid back reference"), since \1 references a group that
> contains itself.
> 
> Reproducer
> 
>   #include <regex.h>
>   #include <stdio.h>
> 
>   int main(void) {
>       regex_t re;
> 
>       const char pattern[] = "\\(^.*\\1\\)*";
>       const char subject[] = "aa";
> 
>       int rc = regcomp(&re, pattern, 0);
>       if (rc != 0) {
>           char errbuf[256];
>           regerror(rc, &re, errbuf, sizeof(errbuf));
>           fprintf(stderr, "regcomp failed: %s\n", errbuf);
>           return 1;
>       }
> 
>       printf("regcomp succeeded, calling regexec ...\n");
>       fflush(stdout);
> 
>       regexec(&re, subject, 0, NULL, 0);
> 
>       regfree(&re);
>       return 0;
>   }
> 
> To verify on Alpine (musl 1.2.5) place the above content into a file
> reproducer.c and execute
> 
>   docker run --rm -v $PWD/reproducer.c:/reproducer.c alpine sh -c \
>           'apk add gcc musl-dev && gcc -o /reproducer /reproducer.c &&
> timeout 10 /reproducer; echo "Exit code: $?"'
> 
> Output:
> 
>   regcomp succeeded, calling regexec ...
>   Exit code: 143
> 
> The process is killed by timeout after 10 seconds. I have also let it run
> for over an hour with no termination.
> 
> Any program that passes user-supplied patterns to regcomp/regexec (grep,
> sed, awk, web servers with regex-based URL routing, input validators, etc.)
> can be made to hang a thread/process permanently with a short pattern and a
> trivial input.
> 
> Suggested fix: Either reject self-referential backreferences during regcomp
> (as glibc does), or add a step limit to the backtracking matcher that
> returns REG_ESPACE when exceeded.
> 
> Please CC me on replies.

Thanks. It should be caught during regcomp. I'm not sure yet how to do
that, but that's the approach a fix should take.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.