Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
Date: Mon, 11 May 2015 18:17:17 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Adding OpenMP support to SunMD5

On Sat, May 09, 2015 at 10:39:34PM +0800, Lei Zhang wrote:
> On May 9, 2015, at 7:31 PM, Solar Designer <solar@...nwall.com> wrote:
> > Please try the hints from super's /etc/motd, in particular "export
> > GOMP_CPU_AFFINITY=0-31"  Does it help?
> 
> Yes, it helps!
> 
> [lei@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john --test --format=sunmd5
> Will run 32 OpenMP threads
> Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (32xOMP) DONE
> Speed for cost 1 (iteration count) of 5000
> Raw:	7372 c/s real, 231 c/s virtual
> 
> I also tested it with my previous version, where threadprivate is used.
> 
> [lei@...er src]$ GOMP_CPU_AFFINITY=0-31 ../run/john --test --format=sunmd5
> Will run 32 OpenMP threads
> Benchmarking: SunMD5 [MD5 128/128 AVX 4x3]... (32xOMP) DONE
> Speed for cost 1 (iteration count) of 5000
> Raw:	8302 c/s real, 259 c/s virtual
> 
> The newer version is still slower.

Both are still slower than our target speed, which I measured at around
8800 c/s based on cumulative performance for --fork=32 with a version
from just prior to your work.

> > Other than that, it is possible that you ran into false sharing.  Having
> > a gap between the different threads' data structures can be beneficial.
> 
> I'm not sure about that. I just multiply the size of the original arrays
> by the number of threads. There should be no cross-referencing among
> threads.

False sharing is so called because it occurs when there's no actual
sharing of data between the threads, only sharing of cache lines.

http://en.wikipedia.org/wiki/False_sharing
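Here is a minimal sketch of the "gap between the different threads' data structures" idea. It is not code from the SunMD5 format. Each thread's state is placed in its own cache line by rounding the per-thread stride up to a whole line. The sketch assumes 64-byte cache lines. struct thread_state, STRIDE, state_of() and count_padded() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines, as on current x86 CPUs. */
#define CACHE_LINE 64

/* Hypothetical per-thread state, much smaller than a cache line. */
struct thread_state {
	volatile uint32_t counter; /* volatile keeps every increment a real store */
};

/* Per-thread stride rounded up to a whole number of cache lines, so no
 * two threads' states can share a line (this is the "gap"). */
#define STRIDE \
	(((sizeof(struct thread_state) + CACHE_LINE - 1) / CACHE_LINE) * CACHE_LINE)

static struct thread_state *state_of(unsigned char *base, int t)
{
	assert(t >= 0);
	return (struct thread_state *)(base + STRIDE * (size_t)t);
}

/* Each thread touches only its own counter.  Packed as a plain array of
 * 4-byte struct thread_state, 16 counters would share one 64-byte line,
 * and that line would bounce between cores: false sharing. */
static uint32_t count_padded(int nthreads, uint32_t iters)
{
	/* C11 aligned_alloc: the size here is a multiple of the alignment. */
	unsigned char *base = aligned_alloc(CACHE_LINE, STRIDE * (size_t)nthreads);
	uint32_t sum = 0;

	if (!base)
		return 0;
	for (int t = 0; t < nthreads; t++)
		state_of(base, t)->counter = 0;

#pragma omp parallel for num_threads(nthreads)
	for (int t = 0; t < nthreads; t++)
		for (uint32_t i = 0; i < iters; i++)
			state_of(base, t)->counter++;

	for (int t = 0; t < nthreads; t++)
		sum += state_of(base, t)->counter;
	free(base);
	return sum;
}
```

When built with -fopenmp, the counting loop runs one slot per thread. Without OpenMP the pragma is ignored and the loop runs serially, with the same result.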

> Dynamic arrays have slower access than static arrays. Could that be the
> reason for the performance degradation?

Yes, changes in addressing modes can cause performance degradation (or
improvement).  It's not exactly dynamic vs. static arrays, though: IIRC,
the old code was already using dynamically allocated arrays except in
debugging builds.  So it's more like the specific way you're computing
the array indices.
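To illustrate the point about computing array indices, here is a hedged C sketch. It is not the actual SunMD5 code. The scratch layout, SCRATCH_WORDS and all function names are hypothetical. Both variants use one heap array sized for all threads. Variant A recomputes t * SCRATCH_WORDS + i on every access. Variant B computes a per-thread base pointer once, so its inner loops look like single-threaded code working on its own buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical scratch size per thread, in 32-bit words. */
#define SCRATCH_WORDS 16

static int max_threads(void)
{
#ifdef _OPENMP
	return omp_get_max_threads();
#else
	return 1;
#endif
}

static int thread_num(void)
{
#ifdef _OPENMP
	return omp_get_thread_num();
#else
	return 0;
#endif
}

/* Variant A: full index arithmetic on every access to the shared array. */
static uint32_t work_indexed(uint32_t *all, int t, uint32_t seed)
{
	uint32_t acc = 0;
	for (size_t i = 0; i < SCRATCH_WORDS; i++)
		all[(size_t)t * SCRATCH_WORDS + i] = seed + (uint32_t)i;
	for (size_t i = 0; i < SCRATCH_WORDS; i++)
		acc += all[(size_t)t * SCRATCH_WORDS + i];
	return acc;
}

/* Variant B: the same toy work through a per-thread base pointer. */
static uint32_t work_based(uint32_t *mine, uint32_t seed)
{
	uint32_t acc = 0;
	for (size_t i = 0; i < SCRATCH_WORDS; i++)
		mine[i] = seed + (uint32_t)i;
	for (size_t i = 0; i < SCRATCH_WORDS; i++)
		acc += mine[i];
	return acc;
}

/* Fills out[idx] for idx in [0, count); returns 0 on success.  The slots
 * are neither padded nor aligned, so neighbouring threads' slots may
 * share a cache line. */
static int compute_all(uint32_t *out, int count, int hoisted)
{
	int nthreads = max_threads();
	uint32_t *scratch = malloc(sizeof(uint32_t) * SCRATCH_WORDS * (size_t)nthreads);

	if (!scratch)
		return -1;
#pragma omp parallel for
	for (int idx = 0; idx < count; idx++) {
		int t = thread_num();
		assert(t < nthreads);
		if (hoisted)
			out[idx] = work_based(scratch + (size_t)t * SCRATCH_WORDS, (uint32_t)idx);
		else
			out[idx] = work_indexed(scratch, t, (uint32_t)idx);
	}
	free(scratch);
	return 0;
}
```

Both variants compute identical results. Whether the compiler actually emits different addressing for them depends on the compiler, the flags, and the aliasing it can prove. The generated assembly, or a benchmark, is the real test.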

Alexander
