Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 20 Mar 2015 21:30:16 -0400
From: Rich Felker <dalias@...c.org>
To: Konstantin Serebryany <konstantin.s.serebryany@...il.com>,
	musl@...ts.openwall.com
Subject: Re: buffer overflow in regcomp and a way to find more of those

On Sat, Mar 21, 2015 at 02:23:41AM +0100, Szabolcs Nagy wrote:
> * Konstantin Serebryany <konstantin.s.serebryany@...il.com> [2015-03-20 18:10:18 -0700]:
> > After your fix the fuzzer did not find anything else so far, but it
> > suffers from slow performance on some cases.
> > Not sure if this qualifies for a bug, but the following example takes
> > ~2 seconds to run (runs instantly with glibc):
> 
> i think the problem is stacked repetitions
> tre doesnt handle them in a sane way
> and uses huge amount of ram

Sadly there doesn't seem to be any sane way to handle them...

> for * it would be easy to solve, but
> the general case is theoretically impossible to
> solve: x{255}{255} will be a 255*255 state machine
> 
> this is the only part in the musl regex
> engine that's allowed to have super linear
> space/time complexity
> 
> (you might want to add some logic to avoid
> such stacked repetitions to speed up the search)
> 
> (btw the standard does not allow these, but if
> the pattern is parenthesized around every repetition
> then that's ok: (x*)* is a valid pattern, x** is not,
> so there is not much point rejecting these patterns
> the problem does not go away since grouping is allowed)
> 
> > int main() {
> >   regex_t preg;
> >   const char *s = ".****\\Z$<\\0)_";

Isn't the \0 an invalid backreference? Could it be getting processed
in a way that's causing the slowdown, but simply rejected by glibc?

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.