john-dev - Re: john scalability

Message-ID: <20110329194239.GB17545@openwall.com>
Date: Tue, 29 Mar 2011 23:42:39 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: john scalability

On Tue, Mar 29, 2011 at 08:45:35PM +0200, magnum wrote:
> On 2011-03-29 18:22, Solar Designer wrote:
> >A related concern, though, is that the self-tests only check that the
> >values are in range - they don't check for full usage of the range, nor
> >for uniform distribution.  So it is easy to have a bug where a large
> >hash table would be allocated, but only a subset of its hash buckets
> >would ever be in use.
> 
> True, I have not tested all formats with real hashes and debug code. I 
> would like to dump distribution statistics after loading, but I'm not 
> sure how to accomplish that quickly and easily. That would be a very nice 
> "#ifdef'ed patch" though, to use for future testing. It could also be 
> used to ensure that the current formats, even without the larger sizes, 
> do the right thing. It would also be useful when implementing these 
> functions from scratch in all the formats lacking them.
> 
> Even if we had such debug code in place, I lack sufficiently sized input 
> files for many of the formats.
> 
> I did however take a brief look at all of the formats confirming it 
> *seemed* to make sense just adding the wider bitmasks.

We're on the same page.

I'll consider adding some debugging code for the hash functions.  In my
own testing, I added a printf() to the self-tests to check that the
values looked sane, although obviously there are too few test vectors
for them to produce every possible hash value.
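
To illustrate the kind of check discussed above, here is a minimal
sketch of bucket usage statistics.  It is not code from John the Ripper:
struct bucket_stats, collect_bucket_stats() and print_bucket_stats() are
hypothetical names, and the sketch assumes the hash values for the
loaded hashes have already been gathered into an array.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical summary of how a set of hash values fills a table. */
struct bucket_stats {
	unsigned int size;      /* number of buckets in the table */
	unsigned int used;      /* buckets hit at least once */
	unsigned int max_chain; /* most entries landing in a single bucket */
};

/*
 * Count how the "count" values in "hashes" are spread over "size" buckets.
 * Returns 0 on success, or -1 if size is zero, memory allocation fails,
 * or any value falls outside the range [0, size).
 */
int collect_bucket_stats(const unsigned int *hashes, size_t count,
    unsigned int size, struct bucket_stats *out)
{
	unsigned int *counts;
	unsigned int b;
	size_t i;

	assert(out != NULL);

	if (size == 0)
		return -1;
	counts = calloc(size, sizeof(*counts));
	if (!counts)
		return -1;

	out->size = size;
	out->used = 0;
	out->max_chain = 0;

	for (i = 0; i < count; i++) {
		if (hashes[i] >= size) { /* the existing range check */
			free(counts);
			return -1;
		}
		counts[hashes[i]]++;
	}

	for (b = 0; b < size; b++) {
		if (counts[b])
			out->used++;
		if (counts[b] > out->max_chain)
			out->max_chain = counts[b];
	}

	free(counts);
	return 0;
}

/* Print the summary to stderr so it does not mix with normal output. */
void print_bucket_stats(const struct bucket_stats *s)
{
	fprintf(stderr, "buckets used: %u of %u, longest chain: %u\n",
	    s->used, s->size, s->max_chain);
}
```

With a hash function that mistakenly applies an 8-bit mask for a
4096-entry table, every value passes the range check, yet "used" never
exceeds 256.  A report like this would expose the bug, provided enough
distinct hashes are loaded.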

Alexander
