Date: Sun, 21 Apr 2013 03:27:53 +0400
From: Solar Designer <solar@...nwall.com>
To: Tavis Ormandy <taviso@...xchg8b.com>
Cc: john-dev@...ts.openwall.com
Subject: Re: minor raw-sha1-ng pull request

Tavis, magnum -

On Fri, Apr 19, 2013 at 06:25:50PM -0700, Tavis Ormandy wrote:
> Thanks for the explanation Magnum, I get similar results! I can restructure
> cmp_all so it's also omp safe, I sent you a pull request for that. It gets
> another 2000K c/s on my machine.

Thanks!

The attached patch replaces the heavy "#pragma omp atomic" with much
lighter OpenMP reduction for the bitwise OR.  I've checked the OpenMP 2.5
spec (from 2005) - bitwise OR was already supported in the reduction
clause, so I think we're good in terms of portability.
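
For anyone unfamiliar with the construct, here is a minimal standalone
sketch of an OR reduction.  It is not the code from the patch: any_match()
and its parameters are hypothetical names made up for illustration, and the
scalar comparison stands in for the SSE compares in cmp_all.

```c
#include <stdint.h>

/* Hypothetical helper, not from rawSHA1_ng_fmt.c: returns nonzero if any
 * element of hashes[] equals target. */
static uint32_t any_match(const uint32_t *hashes, int count, uint32_t target)
{
    uint32_t M = 0;
    int i;

#ifdef _OPENMP
# pragma omp parallel for reduction(|:M)
#endif
    for (i = 0; i < count; i++) {
        /* Each thread ORs into its own private copy of M, which starts
         * at 0; the private copies are combined with | after the loop,
         * so no atomic update is needed inside it. */
        M |= (uint32_t)(hashes[i] == target);
    }

    return M;
}
```

Built without OpenMP, the pragma is compiled out and this is a plain
serial loop with the same result.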

Also, I get better speeds at high thread counts when OMP_SCALE is much
larger - not the current 32, but 1024 or even 10240.  With 32, there's a
performance regression when going from 4 to 8 threads on FX-8120.  With
1024, there's a slight speedup.  With 10240, it's roughly 50M vs. 60M c/s
for 4 vs. 8 threads.  All of these numbers are quite low, though, given
that 1 thread does 29M, and 2 threads do 44M.  Unfortunately, this is as
expected for a fast hash like this being parallelized at this level.
We'll deal with this separately, with parallelization at a higher level.

rotateright() and rotateleft() should probably be dropped.  Only
rotateleft() is used, and not in a performance-critical place.
Moreover, it is probably slower than what gcc would generate on its own
(it uses the rol %cl,reg form of the instruction, whereas gcc would use
one with immediate shift count).
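
If a rotate helper is still wanted, a plain C version along these lines
should be enough.  This is only a sketch: rotl32() is a hypothetical name,
not something in the tree, and it assumes the shift count is in the range
1 to 31.  gcc generally recognizes this shift-and-OR idiom as a rotate and,
when the count is a compile-time constant, can use the immediate form.

```c
#include <stdint.h>

/* Hypothetical replacement for the inline-asm rotateleft().
 * n must be in the range 1..31; n == 0 or n >= 32 would make one of the
 * shifts undefined for a 32-bit operand. */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}
```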

Thanks again,

Alexander

diff --git a/src/rawSHA1_ng_fmt.c b/src/rawSHA1_ng_fmt.c
index 37ae749..aa0c127 100644
--- a/src/rawSHA1_ng_fmt.c
+++ b/src/rawSHA1_ng_fmt.c
@@ -595,7 +595,7 @@ static int sha1_fmt_cmp_all(void *binary, int count)
     M = 0;
 
 #ifdef _OPENMP
-# pragma omp parallel for
+# pragma omp parallel for reduction(|:M)
 #endif
 
     // We can test for matches 4 at a time. As the common case will be that
@@ -639,9 +639,6 @@ static int sha1_fmt_cmp_all(void *binary, int count)
         R |= _mm_testz_epi32(_mm_andnot_si128(A, _mm_cmpeq_epi32(A, A)));
         A  = _mm_cmpeq_epi32(B, _mm_load_si128(&MD[i + 60]));
         R |= _mm_testz_epi32(_mm_andnot_si128(A, _mm_cmpeq_epi32(A, A)));
-#ifdef _OPENMP
-# pragma omp atomic
-#endif
         M |= R;
     }
 
