Date: Thu, 15 Mar 2012 11:46:09 +0200
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: AMD Bulldozer and XOP (was: RAR format finally proper)

Hello Alexander,

> That's as expected. In fact, you're lucky that the FX is not slower in
> your case.
>
> The 6-core (and 8-core for FX-81x0) is actually 3-module (or
> 4-module, respectively), where each module has two sets of register
> files (like with Intel's Hyperthreading) and two sets of integer ALUs
> and AGUs (that's a new thing compared to Intel's Hyperthreading), but
> only a shared set of other execution units (including vector). So when
> you primarily use the vector ops, the CPU is effectively 3-core (or
> 4-core for FX-81x0) with SMT.
>
> http://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)#Bulldozer_core_.28module.29
>

I should have read that before I spent EUR 250 on a new CPU and motherboard; I feel pissed now, eheh. In fact it was a bit faster for MD5 and MD4, a bit slower for SHA1 and almost the same speed for DES-based hashes.

> Yeah, I was planning to try that in JtR as well, but didn't get around
> to it yet. It's good news that this worked well for you.
>

Actually, the improvement percentage I quoted was not correct. I made some more improvements (e.g. using an SSSE3 shuffle to speed up the byte order reversals in SHA1, and optimizing the early checks a bit). What I got after those:

MD5 single hash:  128M c/s with SSE2 -> 181M c/s with XOP
NTLM single hash: 114M c/s with SSE2 -> 162M c/s with XOP
SHA1 single hash:  39M c/s with SSE2 ->  63M c/s with XOP (on Phenom II X4 it used to be around 42M c/s)

All those numbers are single-hash. This skips the slow bitmap lookups, and I also do an early check several steps in advance (but no MD5/MD4 step reversals, as they are incompatible with my design). I noticed that multihash MD5 improved a lot with the new CPU: I get about 120M c/s as compared to ~65M c/s on Phenom II X4.

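To illustrate the byte order reversal mentioned above for SHA1: a minimal sketch, assuming the data sits in 16-byte blocks of four 32-bit words. The helper name bswap32x4 is hypothetical and this is not the code actually used here. With SSSE3 a single PSHUFB does the job; the plain SSE2 path needs several shifts and ORs for the same result.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSSE3__)
#include <tmmintrin.h>   /* SSSE3: _mm_shuffle_epi8 (PSHUFB) */
#elif defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 only: shifts and ORs */
#endif

/* Hypothetical helper: reverse the bytes of each of the four 32-bit
 * words stored in in[0..15], writing the result to out[0..15]. */
static inline void bswap32x4(const uint8_t in[16], uint8_t out[16])
{
#if defined(__SSSE3__)
    /* One PSHUFB: byte i of the result is byte idx[i] of the input. */
    const __m128i idx = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
                                     4, 5, 6, 7, 0, 1, 2, 3);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(v, idx));
#elif defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    /* Swap the two bytes of every 16-bit lane... */
    v = _mm_or_si128(_mm_slli_epi16(v, 8), _mm_srli_epi16(v, 8));
    /* ...then swap the two 16-bit halves of every 32-bit lane. */
    v = _mm_or_si128(_mm_slli_epi32(v, 16), _mm_srli_epi32(v, 16));
    _mm_storeu_si128((__m128i *)out, v);
#else
    /* Portable fallback for builds without SSE2. */
    uint8_t tmp[16];
    for (int i = 0; i < 16; i++)
        tmp[i] = in[i ^ 3];
    memcpy(out, tmp, 16);
#endif
}
```

The SSSE3 path is only compiled in when the compiler is told to target it (e.g. -mssse3 with GCC or clang); otherwise the sketch falls back to the slower variants, which produce the same bytes.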
> They're Roman's, not mine; my role was to choose the versions producing
> more optimal code (considering register pressure and parallelism).
>
> That's puzzling indeed. XOP does provide some decent speedup over AVX
> for bitslice DES in JtR when benchmarked on the same Bulldozer CPU. You
> can see the numbers here: http://openwall.info/wiki/john/benchmarks
>
> 18527K / 14247K 128/128 BS XOP-16
> vs.
> 16442K / 12792K 128/128 BS AVX-16
>
> 4700K / 4418K 128/128 BS XOP-16
> vs.
> 3951K / 3786K 128/128 BS AVX-16
>
> These are for "FX-8120 o/c 3.6 GHz + turbo". These were
> user-contributed numbers. Somehow I am getting numbers similar to the
> above on my FX-8120 without any overclocking (well, maybe only slightly
> lower numbers). I'll do more benchmarking with different clock rates
> and update the wiki later. BTW, Core i7-2600 at stock clock rate
> (3.4 GHz + turbo) is faster at these despite of only having AVX:
>
> 22773K / 18284K 128/128 BS AVX-16 (8 threads)
> 5802K / 5491K 128/128 BS AVX-16 (non-OpenMP)
>

Well, I tried using the 128-bit xmm instructions, not the ymm version (which would require a lot of rework). The speed with both Kwan's S-boxes with SSE2 and Roman's with XOP is about 5M c/s (6 threads). I believe the problem is somewhere else (perhaps my key/block setup is to blame).

> FX-8120 has slight advantage over Core i7-2600 at ALU-heavy code such as
> bcrypt, though: approx. 5500 c/s vs. 4800 c/s for 8 threads. These
> correspond to approx. 5.8x vs. 5.1x increase over single thread speed.
>

I don't support bcrypt yet :(

> What specific speeds did you get, and for what hash types?
>
> For LM hashes, the key setup may easily eat up more time than the DES
> encryption step does.
>

For LM hashes I get some improvement over Kwan's S-boxes (42M c/s vs. 35M c/s). I think my bitslice key/block setup is to blame for this. I use some loops with _mm_movemask_epi8, which at first glance look neat and fast; however, PMOVMSKB itself is not that fast.
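
For illustration, this is roughly the kind of _mm_movemask_epi8 loop meant above: a sketch under assumptions, not the actual code. It transposes 16 input bytes into 8 bit planes, issuing one PMOVMSKB per bit position. The names bitslice16 and bitslice16_ref, and the 16-candidate layout, are hypothetical.

```c
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2: _mm_movemask_epi8 (PMOVMSKB) */
#endif

/* Portable reference: planes[b] gets bit i set iff bit b of in[i] is set. */
static inline void bitslice16_ref(const uint8_t in[16], uint16_t planes[8])
{
    for (int b = 0; b < 8; b++) {
        uint16_t p = 0;
        for (int i = 0; i < 16; i++)
            p |= (uint16_t)(((in[i] >> b) & 1u) << i);
        planes[b] = p;
    }
}

/* Same result using PMOVMSKB where SSE2 is available. */
static inline void bitslice16(const uint8_t in[16], uint16_t planes[8])
{
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    for (int b = 7; b >= 0; b--) {
        /* Collect the top bit of each of the 16 bytes into a 16-bit mask. */
        planes[b] = (uint16_t)_mm_movemask_epi8(v);
        /* Adding each byte to itself shifts it left by one, so the next
         * lower bit becomes the top bit for the following iteration. */
        v = _mm_add_epi8(v, v);
    }
#else
    bitslice16_ref(in, planes);
#endif
}
```

Every output word here costs one PMOVMSKB plus a byte-wise add, and each mask yields only 16 useful bits, so a setup built this way can plausibly take a noticeable share of the time when the DES step itself is cheap.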
I also suspect I get a lot of register spills because of my not quite optimal code.

Regards.