john-dev - Re: Adding OpenMP support to SunMD5

Message-Id: <5E2283F5-5ABE-48F7-B427-27D370267638@gmail.com>
Date: Tue, 12 May 2015 15:47:23 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Adding OpenMP support to SunMD5


> On May 12, 2015, at 12:42 AM, Solar Designer <solar@...nwall.com> wrote:
> 
>> This (higher c/s rate for OMP_NUM_THREADS >> number of cores) matches my
>> experience for sunmd5 on my hardware.
> 
> This suggests that "the problem" is false sharing or something like it.
> When you increase OMP_NUM_THREADS above the number of logical CPUs, you
> have the threads that are actually run on the CPUs concurrently (with
> the rest waiting to be scheduled by the kernel's scheduler) work on
> memory regions that are farther away from each other.

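I looked through the code for likely sources of false sharing and made a few changes that should avoid it: the Contx struct is now declared with cache-line alignment, and the data array and the per-group buffers are allocated with MEM_ALIGN_CACHE.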

(...)
 typedef struct {
        MD5_CTX context;        /* working buffer for MD5 algorithm */
        unsigned char digest[DIGEST_LEN]; /* where the MD5 digest is stored */
-} Contx, *pConx;
+} JTR_ALIGN(MEM_ALIGN_CACHE) Contx, *pConx;
 static Contx *data;
(...)
-       input_buf     = mem_calloc_align(ngroups, sizeof(*input_buf), MEM_ALIGN_SIMD);
-       input_buf_big = mem_calloc_align(ngroups, sizeof(*input_buf_big), MEM_ALIGN_SIMD);
-       out_buf       = mem_calloc_align(ngroups, sizeof(*out_buf), MEM_ALIGN_SIMD);
+       input_buf     = mem_calloc_align(ngroups, sizeof(*input_buf), MEM_ALIGN_CACHE);
+       input_buf_big = mem_calloc_align(ngroups, sizeof(*input_buf_big), MEM_ALIGN_CACHE);
+       out_buf       = mem_calloc_align(ngroups, sizeof(*out_buf), MEM_ALIGN_CACHE);
(...)
-       data = mem_calloc(self->params.max_keys_per_crypt, sizeof(*data));
+       data = mem_calloc_align(self->params.max_keys_per_crypt, sizeof(*data), MEM_ALIGN_CACHE);
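
For illustration, here is a minimal standalone sketch of the idea behind these changes. It does not use the JtR helpers: slot_t, alloc_slots() and slots_are_isolated() are hypothetical names, the 88-byte context is only a stand-in for MD5_CTX, and a 64-byte cache line is assumed. It shows that aligning the base address alone is not enough; the element size must also be a multiple of the cache line so that adjacent elements never share one.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64   /* assumption: 64-byte cache lines (typical on x86-64) */

/* Hypothetical per-key state standing in for Contx.  Aligning the first
 * member raises the alignment of the whole struct, so the compiler pads
 * sizeof(slot_t) up to a multiple of CACHE_LINE. */
typedef struct {
    _Alignas(CACHE_LINE) unsigned char context[88]; /* stand-in for MD5_CTX */
    unsigned char digest[16];
} slot_t;

static_assert(sizeof(slot_t) % CACHE_LINE == 0,
              "each slot must occupy whole cache lines");

/* Allocate n zeroed slots whose base address is cache-line aligned.
 * aligned_alloc() needs the size to be a multiple of the alignment,
 * which the padding above guarantees. */
static slot_t *alloc_slots(size_t n)
{
    slot_t *p = aligned_alloc(CACHE_LINE, n * sizeof(slot_t));
    if (p)
        memset(p, 0, n * sizeof(slot_t));
    return p;
}

/* Return 1 if every slot starts on a cache-line boundary and its last
 * byte lies on an earlier line than the first byte of the next slot. */
static int slots_are_isolated(const slot_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uintptr_t start = (uintptr_t)&s[i];
        uintptr_t last_line = (start + sizeof(slot_t) - 1) / CACHE_LINE;

        if (start % CACHE_LINE != 0)
            return 0;
        if (i + 1 < n && last_line >= (uintptr_t)&s[i + 1] / CACHE_LINE)
            return 0;
    }
    return 1;
}
```

With one slot per thread, slots_are_isolated(alloc_slots(n), n) should hold. If a struct is aligned only at the array base and its size is not a multiple of the line size, neighbouring elements can still straddle a shared line.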


However, performance did not improve:

[lei@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john --test --format=sunmd5
Will run 32 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw:	7256 c/s real, 226 c/s virtual

Either I'm not doing it the right way, or false sharing is not the real culprit. I'm not sure.


Lei
