john-dev - Re: benchmark versus self-test


Message-ID: <20150408125658.GA26688@openwall.com>
Date: Wed, 8 Apr 2015 15:56:58 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: benchmark versus self-test

Claudio, magnum -

On Sun, Apr 05, 2015 at 07:48:19PM -0300, Claudio André wrote:
> JtR should allow the format to set which ciphers in test[] should be 
> used to benchmark. Reason:
> - work only with passwords of the same size/cost/whatever.
> 
> https://github.com/magnumripper/JohnTheRipper/issues/1182

IIRC, during benchmarking we currently use only the first two test
vectors' salt strings (we alternate them for the "many salts" benchmark),
but potentially with all of the test vectors' plaintexts.
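
As a rough illustration of the selection described above (a sketch
based only on this description, not on the actual benchmark code; the
struct and both function names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified test vector; not the real JtR struct. */
struct test_vector {
    const char *salt;
    const char *plaintext;
};

/*
 * Pick the salt for benchmark iteration `iter`.
 * For the "many salts" case, alternate between the salts of the first
 * two test vectors; otherwise always use the first vector's salt.
 */
static const char *bench_pick_salt(const struct test_vector *tests,
                                   size_t count, size_t iter,
                                   int many_salts)
{
    assert(count >= 1);
    if (many_salts && count >= 2)
        return tests[iter & 1].salt;
    return tests[0].salt;
}

/* Plaintexts, by contrast, may cycle through all of the test vectors. */
static const char *bench_pick_plaintext(const struct test_vector *tests,
                                        size_t count, size_t index)
{
    assert(count >= 1);
    return tests[index % count].plaintext;
}
```

This makes it visible why ordering matters: only tests[0] and tests[1]
influence the salt (and thus the cost settings), while every entry can
influence the key lengths.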

So the requirement to "work only with passwords of the same [...]
cost/whatever" is currently addressed by making those the first two in
tests[].  As to "size", if it means plaintext password length, then yes,
it appears we currently lack the ability to lock it to a fixed value.

We do have the ability to benchmark separately for passwords shorter
than some length and longer than it, but only when we're not also
benchmarking for one vs. many salts - see the logic around
params.benchmark_length in bench_set_keys().  I initially introduced
this for the AFS format, whose speed differs greatly between lengths up
to 8 and longer ones (with longer being much faster).
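
A minimal sketch of that kind of length split, assuming a plain array
of plaintexts and a threshold such as 8 (pick_by_length() is a
hypothetical helper, not the logic actually found in bench_set_keys()):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the nth plaintext (wrapping around) from the requested length
 * class: length <= threshold when want_long is 0, length > threshold
 * otherwise.  Returns NULL if no plaintext falls into that class.
 */
static const char *pick_by_length(const char *const *plaintexts,
                                  size_t count, size_t threshold,
                                  int want_long, size_t nth)
{
    size_t matches = 0, i;

    assert(plaintexts != NULL);

    /* First pass: count how many plaintexts are in the class. */
    for (i = 0; i < count; i++) {
        int is_long = strlen(plaintexts[i]) > threshold;
        if (is_long == (want_long != 0))
            matches++;
    }
    if (matches == 0)
        return NULL;

    /* Second pass: return the (nth mod matches)-th member. */
    nth %= matches;
    for (i = 0; i < count; i++) {
        int is_long = strlen(plaintexts[i]) > threshold;
        if (is_long == (want_long != 0) && nth-- == 0)
            return plaintexts[i];
    }
    return NULL; /* not reached */
}
```

Calling it once with want_long set to 0 and once with 1 gives the two
key sets for the "short" and "long" benchmark runs.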

While we could add a length lock feature, I think it doesn't fully solve
the problem with benchmarking some of the recent formats, especially the
PHC finalists where caching potentially plays a great role.  On a GPU,
we should be benchmarking with tens of thousands of _different_ test
vectors, not just repeating the same few suitable-length test vectors
over and over.
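
One conceivable way to get that many distinct keys of a locked length
is to derive each key from its index.  The following is only a sketch
under that assumption; make_candidate() is a hypothetical name, and
real benchmark keys would probably need a more realistic distribution:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Write a candidate of exactly `len` characters for the given index.
 * The index is encoded in base 26 using 'a'..'z', least significant
 * digit first, padded with 'a'.  Candidates are distinct for all
 * indices below 26^len.  `out` must have room for len + 1 bytes.
 */
static void make_candidate(char *out, size_t len, unsigned long index)
{
    size_t i;

    assert(out != NULL);
    for (i = 0; i < len; i++) {
        out[i] = (char)('a' + index % 26);
        index /= 26;
    }
    out[len] = '\0';
}
```

With len set to 8, indices 0 through 49999 yield 50000 different
8-character keys, so no two work-items would see the same input.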

> A POC can be seen here:
> https://github.com/claudioandre/JohnTheRipper/commit/cd2f01e7263f6bfbb8017767c59a6877923765a1

I have mixed feelings about this.  If we introduce an extra field to
each test vector, it had better be a flags field, where we can add more
flags later if we need to.
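
To make the flags idea concrete, here is a sketch of what such a field
could look like.  The struct, the flag names and count_with_flag() are
all hypothetical; this is neither the current test vector layout nor
what the POC commit does:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-vector flags; further bits can be added later. */
#define TEST_FLAG_BENCHMARK 0x01u /* usable for benchmarking */
#define TEST_FLAG_SELF_TEST 0x02u /* usable for the self-test */

/* Hypothetical test vector with a single extensible flags field. */
struct flagged_test {
    const char *ciphertext;
    const char *plaintext;
    unsigned int flags;
};

/* Count the test vectors that have the given flag bit(s) set. */
static size_t count_with_flag(const struct flagged_test *tests,
                              size_t count, unsigned int flag)
{
    size_t i, n = 0;

    assert(tests != NULL);
    for (i = 0; i < count; i++)
        if (tests[i].flags & flag)
            n++;
    return n;
}
```

The point of a bit mask over a dedicated "benchmark" field is that a
later need (say, marking vectors of a particular cost) costs one more
#define rather than another change to every format's tests[].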

Also, I am not sure if the proposed mask* fields address the caching
issue I mentioned above or not.  We need to address this issue even for
slow hashes where it doesn't make sense for the format to bother
providing its own mask mode support.  Will this mask be used with our
generic mask support code in such cases?

Thanks,

Alexander
