Date: Sun, 21 Apr 2013 03:27:53 +0400
From: Solar Designer <solar@...nwall.com>
To: Tavis Ormandy <taviso@...xchg8b.com>
Cc: john-dev@...ts.openwall.com
Subject: Re: minor raw-sha1-ng pull request

Tavis, magnum -

On Fri, Apr 19, 2013 at 06:25:50PM -0700, Tavis Ormandy wrote:
> Thanks for the explanation Magnum, I get similar results! I can restructure
> cmp_all so it's also omp safe, I sent you a pull request for that. It gets
> another 2000K c/s on my machine.

Thanks!

The attached patch replaces the heavy "#pragma omp atomic" with a much
lighter OpenMP reduction for the bitwise OR.  I've checked the OpenMP 2.5
spec (from 2005) - bitwise OR was already supported in the reduction
clause, so I think we're good in terms of portability.
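
For readers without the attachment, here is a minimal sketch of the
general technique.  It is not the code from the patch: any_match() and
its parameters are hypothetical names chosen for illustration.  Each
thread ORs into its own private copy of the flag, and the copies are
combined once at the end of the loop, so no per-iteration atomic
operation is needed.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the patch): returns nonzero if any
 * of the n candidate words equals target.
 */
int any_match(const uint32_t *candidates, size_t n, uint32_t target)
{
    int found = 0;
    long i; /* signed loop index, which older OpenMP versions require */

    /*
     * reduction(|:found) gives every thread a private copy of found,
     * initialized to 0, and ORs the copies together when the loop ends.
     */
#pragma omp parallel for reduction(|:found)
    for (i = 0; i < (long)n; i++)
        found |= (candidates[i] == target);

    return found;
}
```

When built without OpenMP support the pragma is ignored and the loop
simply runs serially with the same result.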

Also, I get better speeds at high thread counts when OMP_SCALE is much
larger - not the current 32, but 1024 or even 10240.  With 32, there's a
performance regression when going from 4 to 8 threads on FX-8120.  With
1024, there's a slight speedup.  With 10240, it's roughly 50M vs. 60M c/s
for 4 vs. 8 threads.  All of these numbers are quite low, though, given
that 1 thread does 29M, and 2 threads do 44M.  Unfortunately, this is as
expected for a fast hash like this being parallelized at this level.
We'll deal with this separately, with parallelization at a higher level.
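
To make the effect of OMP_SCALE concrete, here is a rough sketch of how
such a multiplier determines the amount of work per parallel region.
The function name, its parameters, and the base value used in the note
below are assumptions for illustration, not the format's actual code.

```c
/*
 * Hypothetical illustration: number of candidates handed to one parallel
 * loop.  base_keys is the assumed per-thread minimum (for example, the
 * SIMD width), and omp_scale plays the role of OMP_SCALE.
 */
static int keys_per_batch(int base_keys, int threads, int omp_scale)
{
    return base_keys * threads * omp_scale;
}
```

With an assumed base of 4 and 8 threads, a scale of 32 gives 1024
candidates per batch (128 per thread), whereas 10240 gives 327680.  The
larger batch spreads the fixed cost of starting and joining the threads
over far more hash computations.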

rotateright() and rotateleft() should probably be dropped.  Only
rotateleft() is used, and not in a performance-critical place.
Moreover, it is probably slower than what gcc would generate on its own
(it uses the rol %cl,reg form of the instruction, whereas gcc would use
one with immediate shift count).
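
As an illustration of what "on its own" could look like, below is a
plain C rotate with no inline assembly.  rol32() is a hypothetical name,
not a function from the format.  gcc and clang generally recognize this
idiom as a rotate, and when the count is a compile-time constant after
inlining they can use the immediate form of the instruction.

```c
#include <stdint.h>

/*
 * Rotate a 32-bit word left by n bits.  Masking both shift counts with
 * 31 keeps the expression well defined even when n is 0.
 */
static inline uint32_t rol32(uint32_t x, unsigned int n)
{
    return (x << (n & 31)) | (x >> (-n & 31));
}
```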

Thanks again,

Alexander

View attachment "john-rawSHA1_ng_fmt-omp-reduction.diff" of type "text/plain" (758 bytes)
