Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 6 May 2015 22:35:36 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Adding OpenMP support to SunMD5

Lei, Frank -

On Mon, May 04, 2015 at 11:42:04AM +0200, Frank Dittrich wrote:
> On 05/04/2015 11:26 AM, Lei Zhang wrote:
> > I tried parallelizing a inner loop in the hotspot loop of SunMD5, like this:
> > 
> > for (round = 0; round < maxrounds; round++) {
> > ...
> > 	#pragma omp parallel for ...
> > 	for (idx = 0; idx < count; ++idx)
> > 	{ ... }
> > ...
> > }
> 
> May be I misunderstand what you are trying to do, but when implementing
> OpemMP, you shouldn't try to parallelize the calculation of a single
> hash for a given key.

Frank is correct.

However, SunMD5 is a special case, so general advice like that isn't
straightforward to apply to it.  In the SunMD5 format, we're already
computing multiple hashes per crypt_all() call, but we can't just
parallelize this with OpenMP without bringing even more parallelism to
that level.

The original SunMD5, as used defensively, used data-dependent branching.
Normally, this would thwart SIMD.  Normally, the "for each candidate"
loop would be the outer one (it is a cracking-specific one), and the
"for each round" loop the inner one (it is part of specification of a
single instance of SunMD5, and thus is also present in defensive
implementations).

Jim managed to get around SunMD5's SIMD-unfriendliness, getting multiple
instances of it to work well with SIMD.  Jim's trick involves swapping
the "for each candidate" and "for each round" loops.  Now we have:

	for each round
		for each candidate
			do some processing
		end for
	end for

These nested loops compute multiple hashes for the multiple candidates
in parallel, making use of this parallelism to have enough work buffered
to have nearly full SIMD vectors most of the time.

To add OpenMP to this, we may introduce an extra outer loop:

	for each sub-group of candidates
		for each round
			for each candidate in the sub-group
				do some processing
			end for
		end for
	end for

and in order to avoid reducing the parallelism in the previous inner
loop, we'd need to multiply max_keys_per_crypt by at least the number of
threads (times OMP_SCALE, which we'll tune; probably set it to 1 since
this is a very slow hash).

> May be you have a look at commit
> 69deec0490ee3b793545617b72f53d23680dfe4b where magnum implemented OPenMP
> for ssha512.

Yes, something like this is needed, but applied to the new outer loop.
Since we're already processing multiple keys per crypt_all(), no changes
to things such as get_key(), etc. should be needed.  Probably only
init() and crypt_all() need changes.

Thanks,

Alexander

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.