Date: Sat, 31 Dec 2011 22:50:54 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: DES with OpenMP

On Sat, Dec 31, 2011 at 02:09:13PM +0000, Alex Sicamiotis wrote:
> I've benchmarked DES (openMP) with GCC 4.6 / 4.7 and ICC 12.1...

Thank you for contributing those benchmark results to the wiki.  It's an
impressive overclock you got (a really cheap CPU at 4 GHz).

http://openwall.info/wiki/john/benchmarks

> In my case, (dual core Celeron E3200 - Wolfdale 45nm core), the second
> core scaled +80% for GCC, and over 99.5% for ICC. So there's some kind
> of problem (?) in GCC-OpenMP I suppose for DES.

+80% is reasonable (that's 90% efficiency - that is, 180% out of 200%),
whereas +99.5% is too high.  In my testing, the efficiency of the
bitslice DES parallelization with OpenMP is at around 90% for DES-based
crypt(3) for "many salts" on current multi-core CPUs.  +99.5% indicates
that there is another source of speedup besides the use of a second
core.

Please take into consideration that in non-OpenMP builds for -x86-64 (as
well as for some other x86-* targets) assembly code is being used for
bitslice DES.  When you enable OpenMP, that is disabled in favor of C
code with SSE2 intrinsics.  It is possible that ICC tunes the C + SSE2
intrinsics code for your specific CPU model, whereas the supplied SSE2
assembly code was tuned for Core 2 in general.  Also, the compiler is
given an opportunity to make some cross-S-box optimizations, which are
not made in the supplied assembly code.  These extra optimizations might
account for 3% or so, which explains the unbelievably high
parallelization efficiency (for this code).  (96% would be believable,
albeit still very high for this code.)

With GCC 4.6, there is a performance regression (compared to 4.5 and
4.4), which was especially bad without OpenMP.  This is one reason why
JtR 1.7.9 forces the use of the supplied assembly code (whenever
available) for non-OpenMP builds.
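(The efficiency figures above are just measured speedup divided by core
count.  A quick sketch of the arithmetic, in plain Python, using the
numbers from this thread - the ~3% code-speedup factor is the rough
estimate given above, not a measured value:)

```python
def efficiency(speedup, cores):
    """Parallelization efficiency: measured speedup as a fraction of
    ideal linear scaling across the given number of cores."""
    return speedup / cores

# +80% from the second core means a 1.80x total speedup on 2 cores:
print(efficiency(1.80, 2))    # 0.9 -> the ~90% typical for bitslice DES here

# +99.5% would mean 1.995x, i.e. 99.75% efficiency - implausibly high:
print(efficiency(1.995, 2))   # 0.9975

# If ICC's cross-S-box optimizations make the per-core C + SSE2 code
# ~3% faster than the single-core baseline, the true parallel
# efficiency implied by the apparent 1.995x figure drops to a more
# believable ~97%:
print(round(efficiency(1.995 / 1.03, 2), 3))  # 0.968
```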
When you build with GCC 4.6 and OpenMP, you may be hit by this
performance regression to some extent.  You may want to try GCC 4.5 to
avoid it.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017

> Performance with GCC+openMP was better in blowfish and MD5 for GCC
> compared to ICC (not necessarily because ICC had scaling problems -
> rather it was slower in MD5 + Blowfish, even in single session).

BTW, 99% parallelization efficiency is quite realistic for these slower
hash types.

Alexander