musl - Re: buffer overflow in regcomp and a way to find more of those

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150321013016.GS23507@brightrain.aerifal.cx>
Date: Fri, 20 Mar 2015 21:30:16 -0400
From: Rich Felker <dalias@...c.org>
To: Konstantin Serebryany <konstantin.s.serebryany@...il.com>,
	musl@...ts.openwall.com
Subject: Re: buffer overflow in regcomp and a way to find more of those

On Sat, Mar 21, 2015 at 02:23:41AM +0100, Szabolcs Nagy wrote:
> * Konstantin Serebryany <konstantin.s.serebryany@...il.com> [2015-03-20 18:10:18 -0700]:
> > After your fix the fuzzer did not find anything else so far, but it
> > suffers from slow performance on some cases.
> > Not sure if this qualifies for a bug, but the following example takes
> > ~2 seconds to run (runs instantly with glibc):
> 
> i think the problem is stacked repetitions
> tre doesnt handle them in a sane way
> and uses huge amount of ram

Sadly there doesn't seem to be any sane way to handle them...

> for * it would be easy to solve, but
> the general case is theoretically impossible to
> solve: x{255}{255} will be a 255*255 state machine
> 
> this is the only part in the musl regex
> engine that's allowed to have super linear
> space/time complexity
> 
> (you might want to add some logic to avoid
> such stacked repetitions to speed up the search)
> 
> (btw the standard does not allow these, but if
> the pattern is parenthesized around every repetition
> then that's ok: (x*)* is a valid pattern, x** is not,
> so there is not much point rejecting these patterns
> the problem does not go away since grouping is allowed)
> 
> > int main() {
> >   regex_t preg;
> >   const char *s = ".****\\Z$<\\0)_";

Isn't the \0 an invalid backreference? Could it be getting processed
in a way that's causing the slowdown, but simply rejected by glibc?

Rich

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.