john-users - Re: Fuzzing with regular expressions

Message-ID: <CAHv4kXjeuiHNGGjLwSXooq3ZXkdrTKyi=WHSOUuJf5RjTc70yQ@mail.gmail.com>
Date: Wed, 22 May 2013 12:40:12 +0200
From: Jan Starke <jan.starke@...ofbed.org>
To: john-users@...ts.openwall.com
Subject: Re: Fuzzing with regular expressions

Hi,

rexgen currently cannot take Unicode strings as input, due to limitations of
the lexer (GNU flex): flex ignores any characters which are not known to
it. If you want to generate Unicode characters, you must specify them with
the \uxxxx syntax, e.g.

rexgen 'M(ue|oe|\u00fc|\u00f6)ller'

There seems to be a bug in class expressions, so that [\u00fc\u00f6] doesn't
work; I have created a new issue for that and will fix it.
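
Writing the \uxxxx escapes by hand is error-prone, so here is a minimal
sketch of a helper that rewrites a pattern for you. It is not part of rexgen;
the function name to_rexgen_escapes is hypothetical, and it assumes that
rexgen accepts exactly four lowercase hex digits after \u, as in the example
above. Characters outside the Basic Multilingual Plane are rejected, because
nothing here says how rexgen would expect them to be written.

```python
import unicodedata


def to_rexgen_escapes(pattern: str) -> str:
    """Replace every non-ASCII character in pattern with a \\uxxxx escape.

    Hypothetical helper, not part of rexgen. Assumes the four-hex-digit
    \\uxxxx form shown in the message above.
    """
    # Compose characters first, so that "u" + combining diaeresis becomes
    # the single code point U+00FC instead of two separate characters.
    pattern = unicodedata.normalize("NFC", pattern)
    out = []
    for ch in pattern:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                 # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u%04x" % cp)     # e.g. U+00FC -> \u00fc
        else:
            raise ValueError("code point U+%X does not fit in \\uxxxx" % cp)
    return "".join(out)


if __name__ == "__main__":
    print(to_rexgen_escapes("M(ue|oe|\u00fc|\u00f6)ller"))
```

Until the class expression bug is fixed, the alternation form (a|b) is the
one to feed through such a helper, not the bracket form [ab].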

The purpose of the options u8, u16 and u32 is to enforce the output encoding.
To verify this, you could create a hexdump of the output:

rexgen 'test' | od -x
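
When reading such a hexdump it helps to know which bytes to expect. The
following sketch prints how a string is serialized in the three encodings. It
only uses Python's own codecs and says nothing about rexgen internals: the
little-endian byte order and the absence of a byte order mark are assumptions
made for illustration, and encoded_hex is a hypothetical helper name. Note
also that od -x groups the bytes into 16-bit words in host byte order, so
od -t x1 may be easier to compare against.

```python
def encoded_hex(text: str, encoding: str) -> str:
    """Return the bytes of text in the given encoding as spaced hex pairs."""
    return " ".join("%02x" % b for b in text.encode(encoding))


if __name__ == "__main__":
    # Assumed encodings: little-endian variants without a byte order mark.
    for enc in ("utf-8", "utf-16-le", "utf-32-le"):
        print(enc, "|", encoded_hex("\u00fc", enc), "|", encoded_hex("test", enc))
```

For U+00FC this gives two bytes (c3 bc) in UTF-8, two bytes (fc 00) in
UTF-16LE and four bytes (fc 00 00 00) in UTF-32LE, so the three options are
easy to tell apart in a dump.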

The option UTF_VARIANT does not have any effect and will be removed. I
initially used it to select the internally used character encoding. Now
this can be selected with the command line switch, so the option is no
longer necessary.

Regards, Jan


2013/5/22 magnum <john.magnum@...hmail.com>

> I do not quite understand the section about Unicode. And it does not seem
> to work (my terminal is UTF-8):
>
> $ rexgen "M[üö]ller"
> Mller
> Mller
> Mller
> $ rexgen -u8 n "M[üö]ller"
> Mller
> Mller
> Mller
>
> -DUTF_VARIANT=8 does not change the above, in case it was supposed to.
>
> magnum
>
>
> On 22 May, 2013, at 7:37 , Jan Starke <jan.starke@...ofbed.org> wrote:
>
> > Magnum,
> >
> > you're right. I quickly updated the online documentation (btw, running
> > rexgen without parameters gives you a documentation, too). Maybe I should
> > support something like -h or --help.
> >
> > I also fixed the problem with quantifiers and references, but only on my
> > small notebook. I will commit the changes when I'm at home, so that from
> > tomorrow on there should be no known bugs anymore :-)
> >
> > Regards, Jan
> >
> >
> > 2013/5/21 magnum <john.magnum@...hmail.com>
> >
> >> On 21 May, 2013, at 22:59 , Jan Starke <jan.starke@...ofbed.org> wrote:
> >>> i've added the requested feature. rexgen is becoming a very nice tool with
> >>> this one, so thank you for your thoughts and ideas so far
> >>
> >> Excellent. It still builds on OSX and you seem to have fixed the other
> >> issues (like .dylib vs .so): I had a private hard-coded patch that I no
> >> longer need to apply.
> >>
> >>> It is working, so one can test it now. But please be aware this feature is
> >>> alpha level only: using back references and pipe references together with
> >>> quantifiers (something like ([0-9])abcd\1{2,3}) results in a segfault. This
> >>> is my next task for now.
> >>>
> >>> I kind of documented the new feature on http://code.google.com/p/rexgen/
> >>
> >> I think you should also add the -f option to the "Which parameters are
> >> supported?" section on that page.
> >>
> >> Thanks!
> >> magnum
> >>
> >>
> >>> 2013/4/20 magnum <john.magnum@...hmail.com>
> >>>
> >>>> The suggestion I mentioned is not on this list but in your "issues":
> >>>> http://code.google.com/p/rexgen/issues/detail?id=5
> >>>>
> >>>> magnum
> >>>>
> >>>>
> >>>> On 19 Apr, 2013, at 22:55 , Jan Starke <jan.starke@...ofbed.org> wrote:
> >>>>
> >>>>> Hi
> >>>>>
> >>>>> yeah, there should be a simple way of creating a C (without ++) interface.
> >>>>>
> >>>>> Unfortunately, I have some problems reading full email threads. I must work
> >>>>> on this. If I understand you right, you want to combine another wordlist
> >>>>> generator with rexgen, e.g. to extend simple wordlists, like this:
> >>>>>
> >>>>> cat wordlist.txt | rexgen 're1<pipeinput>re2' | ...
> >>>>>
> >>>>> I still had a similar idea, because we sometimes could need something like
> >>>>> this. I still have some work to do on the current features, but this will
> >>>>> be the next feature.
> >>>>>
> >>>>> Kind regards, jan
> >>>>>
> >>>>>
> >>>>> 2013/4/16 magnum <john.magnum@...hmail.com>
> >>>>>
> >>>>>> On 16 Apr, 2013, at 22:17 , Jan Starke <jan.starke@...ofbed.org> wrote:
> >>>>>>> I just changed some things and was able to speed up rexgen by the
> >>>>>>> factor of 5 (on my system) without using threads; additionally the
> >>>>>>> ordering of the values is partly random. Maybe you want to give it a
> >>>>>>> try...
> >>>>>>
> >>>>>> I am delighted to report that under OSX (built with gcc/g++) r44 is 11.5
> >>>>>> times faster than the last version I tried (which was r24 or so). Previous
> >>>>>> speed about 2.3MB/s (405K words/s) and now over 27 MB/s (4.6M words/s),
> >>>>>> using '[a-z]{0,5}'. This is still a bottleneck for very fast formats but,
> >>>>>> well, any way of producing candidates is and with the finer granularity of
> >>>>>> a regexp you might gain total time anyway.
> >>>>>>
> >>>>>>> BTW, we've been able to crack a bunch of passwords during a pentest
> >>>>>>> with rexgen and JtR, because we had an idea about how the passwords
> >>>>>>> could look like and we could describe this using a simple regex :-)
> >>>>>>
> >>>>>>
> >>>>>> Yes, for some patterns (with variable length parts like "abc[0-9]{1,3}def")
> >>>>>> there's just no way to do it (that easily) with any other tool I know of.
> >>>>>> Not to mention wilder regexps and back references!
> >>>>>>
> >>>>>> Like I just wrote in another post I'd love to have this as a native mode
> >>>>>> in JtR but we can't use C++. OTOH, maybe we can add a HAVE_REXGEN in
> >>>>>> Makefile, stating that we have librexgen installed, and write a mode in C
> >>>>>> that just calls the lib.
> >>>>>>
> >>>>>> BTW did you see my suggestion of supporting append/prepend to words read
> >>>>>> from stdin? That would be awesome.
> >>>>>>
> >>>>>> magnum
> >>>>>>
> >>>>
> >>>>
> >>>>
> >>
> >>
> >>
>
>
>
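
The throughput figures quoted above for '[a-z]{0,5}' can be sanity-checked
with a little arithmetic. The sketch below counts the candidates that pattern
describes and the bytes they occupy, then compares the average candidate size
with the bytes per word implied by the reported rates. It assumes one
newline-terminated candidate per line and MB meaning 10**6 bytes; neither is
stated in the thread, and keyspace is a hypothetical helper name.

```python
def keyspace(alphabet_size: int, min_len: int, max_len: int):
    """Return (candidate count, total bytes) for all strings of the given
    lengths over the alphabet, assuming one trailing newline per candidate."""
    count = 0
    total_bytes = 0
    for length in range(min_len, max_len + 1):
        n = alphabet_size ** length
        count += n
        total_bytes += n * (length + 1)   # +1 for the assumed newline
    return count, total_bytes


if __name__ == "__main__":
    count, total_bytes = keyspace(26, 0, 5)
    print(count)                  # 12356631 candidates
    print(total_bytes / count)    # about 5.96 bytes per candidate

    # Bytes per word implied by the reported rates (MB taken as 10**6 bytes).
    print(2.3e6 / 405e3)          # about 5.7
    print(27e6 / 4.6e6)           # about 5.9

    # Rough time to enumerate the whole keyspace at the faster rate.
    print(count / 4.6e6)          # a little under 3 seconds
```

The computed average of roughly six bytes per candidate is close to what both
reported rates imply, so the MB/s and words/s figures are consistent with each
other under these assumptions.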