Date: Sun, 21 Apr 2013 03:27:53 +0400
From: Solar Designer <solar@...nwall.com>
To: Tavis Ormandy <taviso@...xchg8b.com>
Cc: john-dev@...ts.openwall.com
Subject: Re: minor raw-sha1-ng pull request

Tavis, magnum -

On Fri, Apr 19, 2013 at 06:25:50PM -0700, Tavis Ormandy wrote:
> Thanks for the explanation Magnum, I get similar results! I can restructure
> cmp_all so it's also omp safe, I sent you a pull request for that. It gets
> another 2000K c/s on my machine.

Thanks!

The attached patch replaces the heavy "#pragma omp atomic" with a much
lighter OpenMP reduction for the bitwise OR. I've checked the OpenMP 2.5
spec (from 2005) - bitwise OR was already supported in the reduction
clause, so I think we're good in terms of portability.

Also, I get better speeds at high thread counts when OMP_SCALE is much
larger - not the current 32, but 1024 or even 10240. With 32, there's a
performance regression when going from 4 to 8 threads on FX-8120. With
1024, there's a slight speedup. With 10240, it's roughly 50M vs. 60M c/s
for 4 vs. 8 threads. All of these numbers are quite low, though, given
that 1 thread does 29M, and 2 threads do 44M. Unfortunately, this is as
expected for a fast hash like this being parallelized at this level.
We'll deal with this separately, with parallelization at a higher level.

rotateright() and rotateleft() should probably be dropped. Only
rotateleft() is used, and not in a performance-critical place. Moreover,
it is probably slower than what gcc would generate on its own (it uses
the "rol %cl,reg" form of the instruction, whereas gcc would use the form
with an immediate shift count).

Thanks again,

Alexander

View attachment "john-rawSHA1_ng_fmt-omp-reduction.diff" of type "text/plain" (758 bytes)
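
For readers without the attachment, the difference between the two
approaches mentioned above can be sketched roughly as follows. This is not
the attached patch: the function names, the flat array of 32-bit hash words,
and the single-word comparison are all assumptions made for illustration,
and the real cmp_all() in the format is likely structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for cmp_all(): nonzero if any of the `count`
 * hash words equals `target`. Every iteration updates the one shared
 * variable under "omp atomic", so threads contend on it. */
int cmp_all_atomic_sketch(const uint32_t *hashes, int count, uint32_t target)
{
    int found = 0;
    int i;

    assert(hashes != NULL || count == 0);

#pragma omp parallel for
    for (i = 0; i < count; i++) {
        int match = (hashes[i] == target);
#pragma omp atomic
        found |= match;
    }

    return found;
}

/* Same result using a reduction: each thread ORs into its own private
 * copy of `found` (initialized to 0, the identity for |), and the
 * private copies are combined once at the end of the loop. */
int cmp_all_reduction_sketch(const uint32_t *hashes, int count, uint32_t target)
{
    int found = 0;
    int i;

    assert(hashes != NULL || count == 0);

#pragma omp parallel for reduction(|:found)
    for (i = 0; i < count; i++)
        found |= (hashes[i] == target);

    return found;
}
```

Both functions return the same value for the same input. When built without
OpenMP support the pragmas are ignored and the loops simply run
sequentially.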