Date: Fri, 3 Feb 2012 09:54:35 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: DES with OpenMP

On Thu, Feb 02, 2012 at 03:21:32PM +0000, Alex Sicamiotis wrote:
> Again, moving down from 32 to lower values brought significant gains - especially in 2 threads. LM seems to be "settled" at a value of 8. While for plain DES the ideal value is 1, a value of 8 still has little performance impact on it, while LM benefits enormously. 8 seems to be the perfect balance (for my hardware and across both GCC and ICC), and you might consider it for the next john release after testing with other hardware as well.

Thanks for your testing. I will likely need to split this into several settings for different machines and hash types.

> In the meantime, my curiosity has been piqued as to why the OpenMP version is producing ~250k to 300k c/s more than the standard non-omp build (4750k c/s vs. 4450-4500k c/s). All other things being equal (no asm for both, icc for both, no hardware-specific optimizations for both, a des_bs_cpt value of 1 for both, and definitely just 1 thread for both), there is still a 300k gap in favor of OpenMP, even though it should normally be slower than the non-omp version.
>
> Can you think of *any* other parameters which are tweakable and (may) lead to the +300k gain for the omp version? I want to try various things but I don't know what to tweak. My rationale is that if the non-omp version is running with at least the same parameters as the omp version, then the non-omp version could be slightly faster than the omp version (I'm always talking about 1 thread), perhaps exceeding 4.8-4.9m c/s instead of being near the 4.5m mark.

No parameters to tweak, I think - it's just different code. You may try building with -D_OPENMP instead of -fopenmp - that is, don't actually enable OpenMP, but request that version of John's source code.
The build should then complain about truly OpenMP-specific constructs such as calls to omp_get_max_threads(), which you'll need to remove (just put 1 for the threads count, etc.) It should also give warnings about the #pragmas, which you may ignore. You may then analyze the generated assembly code and try to figure out why one version is faster than the other on your CPU.

With OpenMP, the code is thread-safe, so it references the DES_bs_all structure via a pointer. On one hand, this consumes a register (leaving fewer registers for other stuff), but on the other it may result in smaller code size (only offsets relative to the pointer need to be encoded, rather than larger absolute addresses) and thus more of the other code staying in the L1 instruction cache. An easy thing to check is "size DES_bs_b.o".

This is getting off-topic for john-users, though (not just tweaking, but source code changes) - want to join us on the john-dev list maybe?

Alexander