Date: Fri, 17 Oct 2008 08:07:17 +0200
From: Simon Marechal <simon@...quise.net>
To: john-users@...ts.openwall.com
Subject: Re: fast freebsd MD5 implementation

Solar Designer wrote:
> I used gcc 4.3.1 on linux-x86-64 to compile/test this. In
> sse-intrinsics.c, I had to change <pmmintrin.h> to <emmintrin.h> (the
> former requires SSE3, which is not needed here). I also had to add
> "#define MMX_COEF 4" right to that file because that definition does not
> come from arch.h on x86-64. Then it compiled (with lots of warnings),
> and even the test succeeds.

Wow, I didn't expect it to even compile on x86-64!

>> Bench with standard code on my laptop: 3258k/s
>> With this code: 10433k/s
>
> What CPU is that?
>
> I am getting:
>
> Athlon 64 3000+ 2.0 GHz
> old code: 8970 c/s
> new code: 9200 c/s
>
> Q6600 2.4 GHz (one core):
> old code: 10200 c/s
> new code: 24000 c/s

This is on a 1.2 GHz Core 2 CPU. It should be noted that GCC is a LOT
worse with the intrinsics than ICC (and at everything else too).

> Then I went to test this "for real", on a dummy password file with 120
> hashes and a wordlist with the corresponding passwords. JtR with this
> patch applied cracked exactly 40 out of the 120 passwords, so clearly it
> does not work right. Perhaps only one of the three sets of SSE vectors
> is being dealt with correctly. Perhaps we need to make the self-test
> more thorough, and have more test vectors in MD5_fmt.c (12 or more?)

Argh! Well, this is not surprising: I wrote this patch during a half day
when we had lost internet connectivity :)

A more thorough self-test would be awesome, but I'd rather have support
for "background" crypt_all() calls (obviously easy to do for crack() mode,
but -test would need some tweaking).

>> PS: next stop, GPU implementation. I'm getting tired of not cracking
>> enough of these hashes.
>
> This must be fun, but perhaps it'd be better to submit a cleaner patch
> (and one that actually works) for inclusion into the jumbo patch first.
> ;-)
>
> Besides the issues mentioned above, I'd like the number of sets of SSE
> vectors you deal with "in parallel" made into a separate #define. Right
> now, you have this number, which is 3, hard-coded into too many places.
> It is possible that different numbers will be optimal for some CPUs,
> some compilers, or for 32- vs. 64-bit mode vs. AltiVec (yes, by using
> C intrinsics it should be possible to have the same source file work for
> both SSE2 and AltiVec).

Well, obviously this is a PoC that I mainly made to reproduce the results
from BarsWF. On the other hand, I will use the GPU implementation (once it
is written) professionally, so I'd rather have it working.

I'll take a look at the bugs today, but not-horrible code will probably
not happen too soon ...
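
As an illustration of the two suggestions quoted above (a separate #define for the number of SSE vector sets, and a self-test that exercises every lane), here is a minimal C sketch. It is not code from the patch: SSE_PARA, NB_LANES, toy_hash(), crypt_batch(), count_correct_lanes() and self_test() are hypothetical names, and the toy hash is a stand-in, not MD5. Only MMX_COEF and the value 3 come from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* MMX_COEF is named in the thread; SSE_PARA and NB_LANES are hypothetical. */
#define MMX_COEF 4                       /* 32-bit lanes per 128-bit vector */
#define SSE_PARA 3                       /* vector sets handled per call */
#define NB_LANES (MMX_COEF * SSE_PARA)   /* candidates per call: 12 here */

/* Toy stand-in for the real hash. This is NOT MD5. */
static uint32_t toy_hash(uint32_t x)
{
    x ^= x >> 16;
    x *= 0x45d9f3bu;
    x ^= x >> 16;
    return x;
}

/* Scalar model of a batched kernel. A correct kernel processes SSE_PARA
 * sets; passing a smaller sets_done simulates a bug where the remaining
 * sets are left untouched. */
static void crypt_batch(const uint32_t *in, uint32_t *out, int sets_done)
{
    for (int set = 0; set < sets_done; set++) {
        for (int lane = 0; lane < MMX_COEF; lane++) {
            int i = set * MMX_COEF + lane;
            out[i] = toy_hash(in[i]);
        }
    }
}

/* Fills every lane with a distinct input and counts the lanes whose
 * output matches a reference computation. */
static int count_correct_lanes(int sets_done)
{
    uint32_t in[NB_LANES];
    uint32_t out[NB_LANES] = {0};
    int ok = 0;

    for (int i = 0; i < NB_LANES; i++)
        in[i] = 1000u + (uint32_t)i;
    crypt_batch(in, out, sets_done);
    for (int i = 0; i < NB_LANES; i++)
        ok += (out[i] == toy_hash(in[i]));
    return ok;
}

/* A self-test in this style aborts unless all NB_LANES results are right. */
static void self_test(void)
{
    assert(count_correct_lanes(SSE_PARA) == NB_LANES);
}
```

Calling count_correct_lanes(1) models a kernel that handles only the first of the three sets: it reports 4 of 12 lanes correct, the same one-third ratio as the 40 out of 120 passwords in the report above. A self-test that fills only one vector would not notice the difference, which is the case for having at least NB_LANES test vectors. Changing SSE_PARA resizes the batch and the test together.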