Date: Mon, 11 May 2015 23:06:33 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Adding OpenMP support to SunMD5

On 2015-05-11 19:58, Frank Dittrich wrote:
> On 05/11/2015 06:42 PM, Solar Designer wrote:
>> On Mon, May 11, 2015 at 05:36:31PM +0200, Frank Dittrich wrote:
>>> Will run 512 OpenMP threads
>>> Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (512xOMP) DONE
>>> Speed for cost 1 (iteration count) of 5000
>>> Raw:	9309 c/s real, 300 c/s virtual
>>
>> Great speed.
>>
>>> This (higher c/s rate for OMP_NUM_THREADS >> number of cores) matches my
>>> experience for sunmd5 on my hardware.
>>
>> This suggests that "the problem" is false sharing or something like it.
>> When you increase OMP_NUM_THREADS above the number of logical CPUs, you
>> have the threads that are actually run on the CPUs concurrently (with
>> the rest waiting to be scheduled by the kernel's scheduler) work on
>> memory regions that are farther away from each other.
>
> $ git diff
> diff --git a/src/sunmd5_fmt_plug.c b/src/sunmd5_fmt_plug.c
> index 059ef4c..8829858 100644
> --- a/src/sunmd5_fmt_plug.c
> +++ b/src/sunmd5_fmt_plug.c
> @@ -32,7 +32,7 @@ john_register_one(&fmt_sunmd5);
>
>   #ifdef _OPENMP
>   #include <omp.h>
> -#define OMP_SCALE 1
> +#define OMP_SCALE 8
>   #endif
>
>   #include "arch.h"
>
>
> This change alone results in
>
> $ ../run/john --test=10 --format=sunmd5
> Will run 32 OpenMP threads
> Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (32xOMP) DONE
> Speed for cost 1 (iteration count) of 5000
> Raw:	9990 c/s real, 312 c/s virtual

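On my Core i7 laptop, OMP_SCALE 4 is best, with or without HT. Bumping it
to 8 slightly degrades the HT result but does not change the non-HT result
at all. These runs use OMP_SCALE 4, first with 4 threads (non-HT), then
with the default 8 threads (HT):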

$ OMP_NUM_THREADS=4 ../run/john -test -form:sunmd5 && ../run/john -test -form:sunmd5
Will run 4 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (4xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw:	2497 c/s real, 629 c/s virtual

Will run 8 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (8xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw:	2671 c/s real, 345 c/s virtual
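
For reference, here is a rough sketch of why OMP_SCALE changes the picture. It assumes the common pattern where a format multiplies its base keys-per-crypt by the thread count and by OMP_SCALE, and that the OpenMP loop is split into equal contiguous chunks. Both helpers are hypothetical, not code from sunmd5_fmt_plug.c, and the base batch size is just a parameter here.

```c
#include <assert.h>

/*
 * Hypothetical helper (not from sunmd5_fmt_plug.c): number of candidate
 * keys handed to a single crypt_all() call, ASSUMING the format scales
 * its base batch size by the thread count and by OMP_SCALE.
 */
static int keys_per_crypt(int base_keys, int threads, int omp_scale)
{
	assert(base_keys > 0 && threads > 0 && omp_scale > 0);
	return base_keys * threads * omp_scale;
}

/*
 * Keys each thread works on per call, ASSUMING the parallel loop is
 * divided into equal contiguous chunks, one per thread.
 */
static int keys_per_thread(int base_keys, int threads, int omp_scale)
{
	return keys_per_crypt(base_keys, threads, omp_scale) / threads;
}
```

Under those assumptions, an illustrative base of 12 keys with 32 threads gives 384 keys per call at OMP_SCALE 1 and 3072 at OMP_SCALE 8, so each thread's chunk grows from 12 to 96 keys. Larger per-thread chunks put the regions that neighbouring threads touch farther apart, which would fit the false sharing theory quoted above.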

magnum

