john-dev - Re: 7z valid()

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <e889a01744edc407e464781f24e9fb1e@smtp.hushmail.com>
Date: Wed, 09 Oct 2013 13:40:45 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: 7z valid()

On 2013-10-09 11:51, Dhiru Kholia wrote:
> On 10/09/13 at 12:00am, magnum wrote:
>> 7z is nowhere near Dhiru's high score in "bwtdt" from two months ago though.
>
> It is a work of art, my friend ;(
>
>> I'll spare you the details, although it was so bad I found it funny. It did
>> waste some of my time though, again.
>
> My bad. I usually try to fix them all on a boring day.

The problem with bwtdt was that it passed ANYTHING (well, except "*") as 
valid so most any input autodetected as bwtdt and then segfaulted in its 
binary() or salt(). That's nothing you can postpone - you just can't 
commit such things.


>> Maybe I should enable the "valid killer self-test" by default, even though
>> it usually segfaults instead of fails with a clue. Heck, I could just throw
>> any segfaulting format into unused without further notice, for Dhiru to fix.
>> But even that will waste a lot of MY time and not Dhiru's. Just as compiling
>> the above list did. And just as opening 10 GitHub issues will.
>
> Please turn on this ""valid killer self-test" (if we actually have such a
> thing).

It was active in DEBUG builds. I tried it now but it doesn't seem to 
work anymore. It did not segfault nor catch any of the bad ones. Not 
sure why - it used to work well.

I changed it a bit and caught a bunch of formats. The openssl-enc now 
fails self-test but is left in the tree. The following formats segfaults 
so are moved to unused: Haval, MD2, keyring-opencl, Panama, RipeMD, 
Skein, Snefru and Tiger. Coincidentally, all have the same author ;-)

Hint: You can't rely on just checking string lengths. If a field is 
supposed to be hex, you must check it for *being* hex. And so on (this 
obviously includes base 10 numbers that you later read with eg. atoi() - 
they must be numbers, and they must be within proper range).

Hint 2: You start every new format by copying an old one - which is 
natural. Concentrate on ONE valid() and try to make it as simple and 
beautyful as possible yet absolutely unbreakable - look at it as an 
exercise, pretend you are writing an Internet daemon running as root, 
that will be attacked by millions of script kiddies with fuzzers. I'm 
sure you can make it better than the best one I've come up with so far. 
Do it *really* good for a single format and then base all future formats 
on it. Next time hopefully you can lecture me ;-)

Look at how RAKP now checks for actual hex digits in the fields, and how 
AIX checks for Base64. That is (hopefully o.O) foolproof yet it's simple 
and readable code. Also, if eg. only lower-case hex digits are allowed, 
that is easily accounted for just by changing the HEXCHARS macro 
accordingly. BTW, note how these formats also avoid using strtok() nor 
allocate any temporary buffer.

This self-test (right now enabled regardless of DEBUG) is temporary, 
it's a hack we can't leave active in a release. And it might very well 
seg-fault on other formats not caught on my gear right now. Also, it 
obviously does not catch all cases of sloppy checks.

magnum
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.