Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Tue, 16 Jul 2013 22:38:11 -0700
From: Isaac <idunham@...abit.com>
To: musl@...ts.openwall.com
Subject: Re: regex libs (was Re: embedded newbies site.)

On Tue, Jul 16, 2013 at 11:32:20AM -0400, Rich Felker wrote:
> On Tue, Jul 16, 2013 at 11:10:34AM -0400, LM wrote:
> > On Tue, Jul 16, 2013 at 10:00 AM, Rich Felker <dalias@...ifal.cx> wrote:
> > 
> > > The whole concept of regular expressions is that they're regular,
> > > meaning they're matchable in O(n) time with O(1) space. PCRE (the
> > > implementation) uses backtracking for everything, giving it
> > > exponentially-bad performance (JIT cannot fix this), and PCRE (the
> > > language) has a lot of features that are fundamentally not regular and
> > > thus can't be implemented efficiently. Also, the behavior of some of
> > > the features (e.g. greedy vs non-greedy matching) were not designed
> > > intentionally but just arose out of the backtracking implementation,
> > > and thus don't make a lot of sense unless you think from the
> > > standpoint of such an implementation.
> > >
> > 
> > Went back and rechecked the documentation (
> > http://www.pcre.org/readme.txt).  You're both right, PCRE is offering
> > the Perl regular expressions
> > implementation even when one uses the pcreposix interface.  Would have been
> > nice if they offered actual regular expressions handling if you only want
> > to use the POSIX compatible part of the interface.

The pcreposix stuff is only for those who want perl regexes with an existing
POSIX codebase.

> > So what are some good regex library solutions?  I'm also wondering if there
> 
> POSIX systems provide the regcomp and regexec api as part of libc. On
> musl, glibc, uclibc, and (afaik) all the bsd libcs, this gives you a
> good implementation. However musl's is the only one I know of with no
> conformance bugs, but the conformance bugs in the others are isolated
> to obscure subexpression match issues.
> 
> > are some good cross-platform portable library solutions (or if PCRE is the
> > best pattern matching solution from a portability standpoint even if it's
> > not strictly regex compatible).
> 
> TRE, on which musl's is based, is the closest to being good, in my
> opinion. It has optimal big-O but fairly suboptimal constant. However
> it's unmaintained and has at least two bugs that aren't fixed
> upstream, one of which may be exploitable if an untrusted party can
> supply arbitrary regular expressions, and the other which just causes
> some POSIX BRE expressions (the legacy style used by sed and non-e
> grep) to be misinterpreted.

Have these been mentioned to NetBSD? I ask because they use TRE for their libc
(the relicense happened at their request).

Thanks,
Isaac Dunham

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.