Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 24 Jan 2012 01:45:20 +0400
From: Solar Designer <>
Subject: Re: best approach to cmp_all and cmp_one

Hi Samuele,

On Mon, Jan 23, 2012 at 10:03:55PM +0100, Samuele Giovanni Tonon wrote:
> sorry to bring back another question about cmp_all and cmp_one.

Why, the question is entirely appropriate and your progress at improving
your code is desirable.  There's nothing for you to be sorry for.

> however: if cmp_all get another call before a crypt_all i will just read
> garbage; would be good on having a global semaphore to avoid a
> clEnqueReadBuffer if there's hasn't been any previous crypt_all or there
> are more elegant ways to solve the problem ?

Having a global flag (not a semaphore) is in fact what I'd recommend.
You may call it, say, have_full_hashes.  Set it to 0 in crypt_all()
(because you only read 1/5 - 32 bits out of 160 bits per hash, right?)

In cmp_all(), you just use your partial hashes - cmp_all()'s return
value does not need to be precise (non-zero means that there _might_ be
matches).  You do not need to access this flag there, nor to retrieve
the full hashes.

Ditto for cmp_one().

However, in cmp_exact() you do need to check the flag and if it's still
0, then retrieve the full hashes from the GPU (all of them at once) and
set the flag.  This means that subsequent calls to cmp_exact(), if any,
will not need to retrieve anything.

> second: i was thinking on getting the i-th hashes instead of the whole
> buffer but i discovered cmp_one goes incremental, let me explain by an
> example
> suppose NUM_KEYS is 1000 and cmp_all(1000) is called and
> b == outbuffer[i] is TRUE for 995, then cmp_one will be called like that:
> cmp_one(0)
> cmp_one(1)
> .....
> cmp_one(995) and password is found.
> wouldn't be best to have cmp_one start at 995 and incrementing until
> NUM_KEYS is reached instead of starting from the beginning or to do that
> there will be side effects or a major rewrite of the code ?

There should be no need to do that.  cmp_all() should give false
positives infrequently enough that we should be able to afford doing
things non-optimally in that case.  In case cmp_all()'s false positives
are too frequent, you should instead increase the size of your partial
hashes.  For example, if your max_keys_per_crypt is 0x10000, you load
0x10000 of same-salt hashes for cracking, and your partial hashes are
32-bit, then false positives become very frequent (not cmp_all()'s, but
get_hash*() and cmp_one()'s, though - since cmp_all() is not used on
this many loaded hashes).  Since your cmp_exact() is very expensive,
you should deal with this by increasing the partial hash size - e.g., to
48 bits.

Sure, increasing the partial hash size may have a performance penalty of
its own in your case.  This is normally not the case for CPU-only code,
where larger partial hashes only consume more RAM, whereas their effect
on performance through them also consuming more cache space is minimal.
Also, cmp_exact() is normally not that expensive.  But in your case it
is and thus a slightly larger partial hash size may be justified.

As I wrote in another posting, I am actually considering merging some or
all of the hash comparisons into crypt_all().  This would let you use
specialized algorithms if you like - even offload the comparisons to
the GPU, which is part of the plan for the formats interface rework that
I am considering.

> next question: saltless password.
> cmp_all is not called for saltless password due to 1. ,

Sometimes it is - for low hash counts.

> so instead you get cmp_one and get_hash_xxx .
> This unfortunately makes me have 4 clEnqueReadBuffer which are quite
> expensive and do something like that.

As I wrote above, please use a global flag and please read the entire
buffer at once - in cmp_exact().

For get_hash*() and cmp_one(), just use partial hashes that you already
have.  It is OK for cmp_one() to have false positives - this is why we
also have cmp_exact().

> to solve this problem i'd have to revectorize the opencl output,

There should be no need.  Just use large enough partial hashes so that
cmp_exact() is infrequent enough and you can afford to fetch the entire
buffer there.

Indeed, a lot of this is still non-optimal - starting with generating
candidate passwords on the CPU - but my proposals above is what still
fits in the current formats interface best.  Like I said, a longer-term
plan is to rework the interface, but the above is what you probably want
to do for now.



Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ