Date: Wed, 26 Oct 2011 01:49:35 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: new DES key setup

magnum, Erik -

Thank you for testing!

There's 184.108.40.206 in CVS now, with some very minor changes over .4 -
such as for portability to ancient systems (it compiles with gcc
220.127.116.11 on Slackware 3.x again) and addition of -Os into OPT_INLINE
(even though one has to remove it and -lcrypt when compiling on those
ancient systems).

-Os appears to deal with the performance regression we saw with gcc 4.6 -
not only on x86-64/SSE2 builds, but also on several others. I was not
able to trace this to a specific -f* option nor to a parameter. For
example, these commands:

gcc -Os -Q --help=optimizers
gcc -O2 -finline-functions -Q --help=optimizers

produce identical output for me, which is consistent with gcc source code
(indeed), but not consistent with documentation, nor with actual
optimizations I am seeing in generated code (so there must be something
else that is relevant but is not shown with "-Q --help=optimizers").

Anyway, here are the new numbers for Core i7-2600K 3.4 GHz (turbo up to
3.8 GHz when only one core is in use, 3.5 GHz when all four are in use
and the CPU is not overheating), Ubuntu 11.10, "gcc version 4.6.1
(Ubuntu/Linaro 4.6.1-9ubuntu3)":

OpenMP, 8 threads:

Benchmarking: Traditional DES [128/128 BS AVX-16]... DONE
Many salts:     22773K c/s real, 2857K c/s virtual
Only one salt:  18284K c/s real, 2282K c/s virtual

Benchmarking: LM DES [128/128 BS AVX-16]... DONE
Raw:    88670K c/s real, 11125K c/s virtual

4 threads:

Benchmarking: LM DES [128/128 BS AVX-16]... DONE
Raw:    110428K c/s real, 27815K c/s virtual

(limiting the number of threads to 4 only helps with LM).

Non-OpenMP:

Benchmarking: Traditional DES [128/128 BS AVX-16]... DONE
Many salts:     5805K c/s real, 5864K c/s virtual
Only one salt:  5507K c/s real, 5507K c/s virtual

Benchmarking: LM DES [128/128 BS AVX-16]... DONE
Raw:    70803K c/s real, 71519K c/s virtual

On Tue, Oct 25, 2011 at 01:32:58AM +0200, magnum wrote:
> 2011-10-25 01:23, magnum wrote:
> > I have no turbo modes confusing stuff: It scales to 89% in "many salts"
> > and 92% in "one salt".
>
> To be correct I think I should have said 88% in "many" and 86% in "one"
> salt. That is:
>
> Non-OMP build:
> Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
> Many salts:     2781K c/s real, 2787K c/s virtual
> Only one salt:  2677K c/s real, 2682K c/s virtual
>
> OMP build ran on 2 cores:
> Benchmarking: Traditional DES [128/128 BS SSE2-16]... DONE
> Many salts:     4903K c/s real, 2471K c/s virtual
> Only one salt:  4597K c/s real, 2314K c/s virtual
>
> 4903/(2787*2)
> .87961966271977036239
>
> 4597/(2682*2)
> .85700969425801640566

It depends on what you take for 100%. Maybe your system was under slight
other load - e.g., from GUI desktop apps such as clock, load monitor,
etc. You could run two instances of the non-OMP build in parallel (with
a script) and add their speeds up instead of merely multiplying one
instance's speed by two.

Not that this matters much.

Thanks again,

Alexander
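
The scaling arithmetic discussed above can be sketched as follows. This is only an illustration: the function name `scaling_efficiency` is hypothetical, and the input figures are the K c/s values from magnum's quoted benchmarks. Passing a list of per-instance speeds makes it easy to swap in the summed speeds of two non-OMP instances actually run in parallel, as suggested, instead of doubling a single instance's speed.

```python
def scaling_efficiency(omp_real_kcs, baseline_kcs):
    """Return the OMP build's real c/s as a fraction of the summed baseline.

    omp_real_kcs -- "real" speed of the OpenMP build (K c/s)
    baseline_kcs -- list of per-instance non-OMP speeds (K c/s) taken as 100%
    """
    return omp_real_kcs / sum(baseline_kcs)


# Baseline as in the quoted calculation: one non-OMP run's virtual c/s, doubled.
many_virtual = scaling_efficiency(4903, [2787, 2787])
one_virtual = scaling_efficiency(4597, [2682, 2682])

# Alternative baseline: the same run's real c/s, doubled.
many_real = scaling_efficiency(4903, [2781, 2781])
one_real = scaling_efficiency(4597, [2677, 2677])

print(f"many salts: {many_virtual:.1%} (virtual baseline), {many_real:.1%} (real baseline)")
print(f"one salt:   {one_virtual:.1%} (virtual baseline), {one_real:.1%} (real baseline)")
```

The two baselines differ by only a fraction of a percent here, which fits the remark that the result depends on what is taken for 100%.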