john-dev - Re: best approach to cmp_all and cmp

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <4F1EC814.4000904@linuxasylum.net>
Date: Tue, 24 Jan 2012 16:02:44 +0100
From: Samuele Giovanni Tonon <samu@...uxasylum.net>
To: john-dev@...ts.openwall.com
Subject: Re: best approach to cmp_all and cmp_one

On 01/23/12 22:45, Solar Designer wrote:
> Hi Samuele,
> 
> On Mon, Jan 23, 2012 at 10:03:55PM +0100, Samuele Giovanni Tonon wrote:
>> sorry to bring back another question about cmp_all and cmp_one.
> 
> Why, the question is entirely appropriate and your progress at improving
> your code is desirable.  There's nothing for you to be sorry for.
> 

>> however: if cmp_all get another call before a crypt_all i will just read
>> garbage; would be good on having a global semaphore to avoid a
>> clEnqueReadBuffer if there's hasn't been any previous crypt_all or there
>> are more elegant ways to solve the problem ?
> 
> Having a global flag (not a semaphore) is in fact what I'd recommend.
> You may call it, say, have_full_hashes.  Set it to 0 in crypt_all()
> (because you only read 1/5 - 32 bits out of 160 bits per hash, right?)

> In cmp_all(), you just use your partial hashes - cmp_all()'s return
> value does not need to be precise (non-zero means that there _might_ be
> matches).  You do not need to access this flag there, nor to retrieve
> the full hashes.
> 
> Ditto for cmp_one().
> 
> However, in cmp_exact() you do need to check the flag and if it's still
> 0, then retrieve the full hashes from the GPU (all of them at once) and
> set the flag.  This means that subsequent calls to cmp_exact(), if any,
> will not need to retrieve anything.
> 
>> second: i was thinking on getting the i-th hashes instead of the whole
>> buffer but i discovered cmp_one goes incremental, let me explain by an
>> example
>>
>> suppose NUM_KEYS is 1000 and cmp_all(1000) is called and
>> b == outbuffer[i] is TRUE for 995, then cmp_one will be called like that:
>> cmp_one(0)
>> cmp_one(1)
>> .....
>> cmp_one(995) and password is found.
>> wouldn't be best to have cmp_one start at 995 and incrementing until
>> NUM_KEYS is reached instead of starting from the beginning or to do that
>> there will be side effects or a major rewrite of the code ?
> 
> There should be no need to do that.  cmp_all() should give false
> positives infrequently enough that we should be able to afford doing
> things non-optimally in that case.  In case cmp_all()'s false positives
> are too frequent, you should instead increase the size of your partial
> hashes.  For example, if your max_keys_per_crypt is 0x10000, you load
> 0x10000 of same-salt hashes for cracking, and your partial hashes are
> 32-bit, then false positives become very frequent (not cmp_all()'s, but
> get_hash*() and cmp_one()'s, though - since cmp_all() is not used on
> this many loaded hashes).  Since your cmp_exact() is very expensive,
> you should deal with this by increasing the partial hash size - e.g., to
> 48 bits.
> 
> Sure, increasing the partial hash size may have a performance penalty of
> its own in your case.  This is normally not the case for CPU-only code,
> where larger partial hashes only consume more RAM, whereas their effect
> on performance through them also consuming more cache space is minimal.
> Also, cmp_exact() is normally not that expensive.  But in your case it
> is and thus a slightly larger partial hash size may be justified.
> 
> As I wrote in another posting, I am actually considering merging some or
> all of the hash comparisons into crypt_all().  This would let you use
> specialized algorithms if you like - even offload the comparisons to
> the GPU, which is part of the plan for the formats interface rework that
> I am considering.
> 
>> next question: saltless password.
>>
>> cmp_all is not called for saltless password due to 1. ,
> 
> Sometimes it is - for low hash counts.
> 
>> so instead you get cmp_one and get_hash_xxx .
>> This unfortunately makes me have 4 clEnqueReadBuffer which are quite
>> expensive and do something like that.
> 
> As I wrote above, please use a global flag and please read the entire
> buffer at once - in cmp_exact().
> 
> For get_hash*() and cmp_one(), just use partial hashes that you already
> have.  It is OK for cmp_one() to have false positives - this is why we
> also have cmp_exact().
> 
>> to solve this problem i'd have to revectorize the opencl output,
> 
> There should be no need.  Just use large enough partial hashes so that
> cmp_exact() is infrequent enough and you can afford to fetch the entire
> buffer there.
> 
> Indeed, a lot of this is still non-optimal - starting with generating
> candidate passwords on the CPU - but my proposals above is what still
> fits in the current formats interface best.  Like I said, a longer-term
> plan is to rework the interface, but the above is what you probably want
> to do for now.

thanks!
this made me
i was able to speed up saltless password and fix some problems on nsldaps.
i'm waiting for some more test and if all goes well i'll send a new
patch for the git.

Cheers
Samuele
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.