Date: Tue, 12 May 2015 15:47:23 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Adding OpenMP support to SunMD5


> On May 12, 2015, at 12:42 AM, Solar Designer <solar@...nwall.com> wrote:
> 
>> This (higher c/s rate for OMP_NUM_THREADS >> number of cores) matches my
>> experience for sunmd5 on my hardware.
> 
> This suggests that "the problem" is false sharing or something like it.
> When you increase OMP_NUM_THREADS above the number of logical CPUs, you
> have the threads that are actually run on the CPUs concurrently (with
> the rest waiting to be scheduled by the kernel's scheduler) work on
> memory regions that are farther away from each other.

I inspected the code for potential false-sharing spots and made a few modifications that should, hopefully, avoid it: the per-key context struct and the per-group buffers are now aligned to the cache line size.

(...)
 typedef struct {
        MD5_CTX context;        /* working buffer for MD5 algorithm */
        unsigned char digest[DIGEST_LEN]; /* where the MD5 digest is stored */
-} Contx, *pConx;
+} JTR_ALIGN(MEM_ALIGN_CACHE) Contx, *pConx;
 static Contx *data;
(...)
-       input_buf     = mem_calloc_align(ngroups, sizeof(*input_buf), MEM_ALIGN_SIMD);
-       input_buf_big = mem_calloc_align(ngroups, sizeof(*input_buf_big), MEM_ALIGN_SIMD);
-       out_buf       = mem_calloc_align(ngroups, sizeof(*out_buf), MEM_ALIGN_SIMD);
+       input_buf     = mem_calloc_align(ngroups, sizeof(*input_buf), MEM_ALIGN_CACHE);
+       input_buf_big = mem_calloc_align(ngroups, sizeof(*input_buf_big), MEM_ALIGN_CACHE);
+       out_buf       = mem_calloc_align(ngroups, sizeof(*out_buf), MEM_ALIGN_CACHE);
(...)
-       data = mem_calloc(self->params.max_keys_per_crypt, sizeof(*data));
+       data = mem_calloc_align(self->params.max_keys_per_crypt, sizeof(*data), MEM_ALIGN_CACHE);
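As a minimal sketch of what the change above is meant to achieve (not the actual JtR code), the following assumes a 64-byte cache line in place of MEM_ALIGN_CACHE, uses C11 alignas in place of JTR_ALIGN, and uses a made-up fake_md5_ctx in place of the real MD5_CTX. The helper names neighbours_share_line and alloc_padded are hypothetical. It shows that once the struct is cache-line aligned, its size is rounded up to a multiple of the line size, so two adjacent array elements can no longer touch the same line.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 64	/* assumption: stands in for MEM_ALIGN_CACHE */
#define DIGEST_LEN 16

/* Hypothetical stand-in for MD5_CTX; the real layout differs. */
typedef struct {
	uint32_t state[4];
	uint32_t count[2];
	unsigned char buffer[64];
} fake_md5_ctx;

/* Unaligned variant: element size is not a multiple of the line size. */
typedef struct {
	fake_md5_ctx context;
	unsigned char digest[DIGEST_LEN];
} ctx_plain;

/* Aligned variant: the compiler pads sizeof up to a multiple of CACHE_LINE. */
typedef struct {
	alignas(CACHE_LINE) fake_md5_ctx context;
	unsigned char digest[DIGEST_LEN];
} ctx_padded;

static_assert(sizeof(ctx_padded) % CACHE_LINE == 0,
              "padded element must cover whole cache lines");

/*
 * For an array whose base is cache-line aligned, return 1 if elements
 * i and i+1 touch a common cache line, 0 otherwise.
 */
int neighbours_share_line(size_t elem_size, size_t i)
{
	size_t last_byte_of_i = (i + 1) * elem_size - 1;
	size_t first_byte_of_next = (i + 1) * elem_size;

	return last_byte_of_i / CACHE_LINE == first_byte_of_next / CACHE_LINE;
}

/* Allocate n zeroed, cache-line-aligned elements (caller frees with free()). */
ctx_padded *alloc_padded(size_t n)
{
	ctx_padded *p = aligned_alloc(CACHE_LINE, n * sizeof(*p));

	if (p)
		memset(p, 0, n * sizeof(*p));
	return p;
}
```

Calling neighbours_share_line(sizeof(ctx_plain), 0) and neighbours_share_line(sizeof(ctx_padded), 0) contrasts the two layouts. Note that the struct attribute alone is not enough: the array base must also be aligned, which is why the allocation was switched to an aligned allocator as well.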


However, performance does not improve:

[lei@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john --test --format=sunmd5
Will run 32 OpenMP threads
Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (32xOMP) DONE
Speed for cost 1 (iteration count) of 5000
Raw:	7256 c/s real, 226 c/s virtual

Either I'm not doing it the right way, or false sharing is not the real culprit. I'm not sure.


Lei
