john-dev - Re: xsha512-cuda & xsha512-opencl testing

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20120716105209.GA21798@openwall.com>
Date: Mon, 16 Jul 2012 14:52:09 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: xsha512-cuda & xsha512-opencl testing

On Mon, Jul 16, 2012 at 06:23:36PM +0800, myrice wrote:
> BTW, what is exactly bitmaps you said for my format? Can you give me
> more details? I think bitmaps are compressed hash table. How does JtR
> lookups the hash table?

See cracker.c: crk_password_loop() in CVS or bleeding.  Specifically:

	if (!salt->bitmap) {
[...]
	} else
	for (index = 0; index < crk_key_index; index++) {
		int hash = salt->index(index);
		if (salt->bitmap[hash / (sizeof(*salt->bitmap) * 8)] &
		    (1U << (hash % (sizeof(*salt->bitmap) * 8)))) {
			pw = salt->hash[hash >> PASSWORD_HASH_SHR];
			do {
				if (crk_methods.cmp_one(pw->binary, index))
				if (crk_methods.cmp_exact(crk_methods.source(
				    pw->source, pw->binary), index))
				if (crk_process_guess(salt, pw, index))
					return 1;
			} while ((pw = pw->next_hash));
		}
	}

As you can see, a bitmap is used like a hash table would be, but it only
gives us a maybe/no answer.  It should be large enough that "no" is the
typical answer.

If it says "maybe", then we do a hash table lookup - but the hash table
has fewer entries (due to PASSWORD_HASH_SHR).

There's at least one entry in each hash bucket for which the bitmap has
a 1 in any of the bits corresponding to that hash bucket.  Thus, we skip a
check for non-NULL on first loop iteration over entries in the hash bucket.

See also code in crk_remove_hash() for how the bitmap and hash table are
updated on a successful guess, and loader.c for how they're populated.

An optimization is to interleave bitmap lookups for several computed
hashes (such as for adjacent indices).  I've tried it out, and it along
with OpenMP parallelization provided IIRC over a 3x speedup on a 2xE5420
machine when cracking 10 million of fast and saltless hashes (the
specific info should be in my john-dev postings from January this year).
I did not finalize/commit these changes yet.

I guess that on GPU interleaving of memory accesses would be even more
important, although maybe scheduling enough work-items would take care
of that without us having to complicate the source code.

It could also be desirable to keep small bitmaps in local memory, only
starting to use global memory when speedup from the bitmap size increase
permitted by that outweighs the slowdown from slower individual lookups.

I think you should implement these in a simpler fashion first, and only
then try to optimize - the usual approach for most non-trivial tasks.

Thanks,

Alexander

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.