Date: Tue, 12 May 2015 02:11:03 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Adding OpenMP support to SunMD5

On 2015-05-11 23:06, magnum wrote:
> On my core i7 laptop, OMP_SCALE 4 is best, HT or not. Bumping to 8
> slightly degrades HT but does not change non-HT at all. This is with 4:
>
> $ OMP_NUM_THREADS=4 ../run/john -test -form:sunmd5 && ../run/john -test
> -form:sunmd5
> Will run 4 OpenMP threads
> Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (4xOMP) DONE
> Speed for cost 1 (iteration count) of 5000
> Raw:    2497 c/s real, 629 c/s virtual
>
> Will run 8 OpenMP threads
> Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (8xOMP) DONE
> Speed for cost 1 (iteration count) of 5000
> Raw:    2671 c/s real, 345 c/s virtual

After replacing the bad #ifdefs for MAX_KEYS_PER_CRYPT (mentioned in 
another thread) with just SIMD_COEF_32 * MD5_SSE_PARA, I saw a slowdown. 
So I added a fixed multiplier and kept bumping it, running a single 
thread, until I seemed to hit a sweet spot. It ended up as

#define MIN_KEYS_PER_CRYPT  SIMD_COEF_32
#define MAX_KEYS_PER_CRYPT  (32 * SIMD_COEF_32 * MD5_SSE_PARA)

In this case (AVX 4x3) that comes to 32 * 4 * 3 = 384, whereas the old 
#ifdefs would pick 96. I got a 6% single-thread speedup compared to the 
old non-OMP code.
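
As a sanity check on that arithmetic, here is a minimal sketch. It assumes SIMD_COEF_32 is 4 and MD5_SSE_PARA is 3, which is how I read the "AVX 4x3" tag in the benchmark line; in a real build both come from the tree's arch headers. The sunmd5_* helper names are made up for illustration and are not in the format source.

```c
#include <assert.h>

/* Assumed values for an "AVX 4x3" build: four 32-bit lanes per 128-bit
   vector, three vectors interleaved. Normally set by the build, not here. */
#define SIMD_COEF_32 4
#define MD5_SSE_PARA 3

/* The two defines from the mail, unchanged. */
#define MIN_KEYS_PER_CRYPT  SIMD_COEF_32
#define MAX_KEYS_PER_CRYPT  (32 * SIMD_COEF_32 * MD5_SSE_PARA)

/* Compile-time check that the fixed multiplier of 32 gives 384 here. */
static_assert(MAX_KEYS_PER_CRYPT == 384, "expected 32 * 4 * 3 = 384");

/* Hypothetical helpers so the values can be inspected at run time. */
int sunmd5_min_keys(void)   { return MIN_KEYS_PER_CRYPT; }
int sunmd5_max_keys(void)   { return MAX_KEYS_PER_CRYPT; }

/* Candidates handled by one interleaved SIMD MD5 call: 4 * 3 = 12. */
int sunmd5_simd_width(void) { return SIMD_COEF_32 * MD5_SSE_PARA; }
```

So one crypt_all() batch is 32 times the 12-wide SIMD width under these assumptions.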

Then I ran with 8 threads (HT) and re-checked OMP_SCALE. It's now best 
kept at 1 or 2. New speed:
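
A possible reading of why the best OMP_SCALE dropped, sketched under an assumption: that the format's init() scales max_keys_per_crypt by thread count times OMP_SCALE, as OpenMP-enabled formats commonly do. I have not confirmed that this is exactly what the SunMD5 code does, and the function name below is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: effective keys per crypt_all() call, assuming the
   usual pattern  max_keys_per_crypt *= threads * OMP_SCALE  in init(). */
int scaled_keys_per_crypt(int base_keys, int threads, int omp_scale)
{
    assert(base_keys > 0 && threads > 0 && omp_scale > 0);
    return base_keys * threads * omp_scale;
}
```

Under that assumption, the old setup (96 keys, 8 threads, OMP_SCALE 4) and the new one (384 keys, 8 threads, OMP_SCALE 1) both come to 3072 keys per call, which would be consistent with the sweet spot moving from 4 down to 1 or 2.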

$ ../run/john -test -form:sunmd5
Will run 8 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (8xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw:	2682 c/s real, 351 c/s virtual

Then I took it to Super and tried it as-is, hoping for the best figure 
yet. Unfortunately it did not fly:

$ GOMP_CPU_AFFINITY=0-31 ../run/john -test -form:sunmd5
Will run 32 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw:	8330 c/s real, 260 c/s virtual

Frank had 9909 c/s at some point, so this is not quite there. But the 
speed is now much more stable between runs (btw, this was also with 
MEM_ALIGN_CACHE). In fact, I even got the exact same speed using 
OMP_NUM_THREADS=64, which is odd.

magnum
