Date: Fri, 17 Oct 2008 03:02:14 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: fast freebsd MD5 implementation

On Wed, Oct 15, 2008 at 03:10:49PM +0200, Simon Marechal wrote:
> After exchanging mails with the author of http://3.14.by/en/md5, I toyed
> with the idea of using similar code for freebsd MD5. So here is a patch
> that uses a similar implementation, but for the whole freebsd MD5
> crypt().

Nice stuff, but very dirty as you know. ;-)

> It is set to use ICC, so you might want to download it (it
> speeds everything up anyway) to test, but it should be possible to have
> it run on gcc (don't forget -march=nocona). The patch is against
> john-22.214.171.124-all, not vanilla! (it should work with vanilla anyway)

The patch applies cleanly against john-126.96.36.199-all-5; a Makefile
hunk won't apply to anything older, but the changes to Makefile are
mostly specific to Intel's compiler anyway. What I did for my testing
was to not use the Makefile patch, but to add sse-intrinsics.o (not the
.c as the patch does!) to JOHN_OBJS_MINIMAL manually.

I used gcc 4.3.1 on linux-x86-64 to compile/test this. In
sse-intrinsics.c, I had to change <pmmintrin.h> to <emmintrin.h> (the
former requires SSE3, which is not needed here). I also had to add
"#define MMX_COEF 4" right to that file because that definition does not
come from arch.h on x86-64. Then it compiled (with lots of warnings),
and even the test succeeds.

> Bench with standard code on my laptop: 3258k/s
> With this code: 10433k/s

What CPU is that? I am getting:

Athlon 64 3000+ 2.0 GHz:
old code: 8970 c/s
new code: 9200 c/s

Q6600 2.4 GHz (one core):
old code: 10200 c/s
new code: 24000 c/s

I used the same "john" binaries (built with gcc 4.3.1) on both machines.

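To put the three sets of figures on the same scale, here is a minimal C sketch that turns each old/new pair into a speedup ratio. The `struct bench` type, the `results` table and the `speedup()` helper are names made up for this illustration; the numbers are the ones quoted above, with the laptop figures taken as written.

```c
#include <assert.h>

/* Hypothetical record for one benchmark pair (rates in c/s as quoted). */
struct bench {
    const char *machine;
    double old_cps;   /* stock code */
    double new_cps;   /* patched code */
};

/* Figures copied from the message; nothing here is newly measured. */
static const struct bench results[] = {
    { "laptop (CPU not stated)",   3258.0, 10433.0 },
    { "Athlon 64 3000+ 2.0 GHz",   8970.0,  9200.0 },
    { "Q6600 2.4 GHz (one core)", 10200.0, 24000.0 },
};

/* Speedup of the patched code over the stock code as a plain ratio. */
static double speedup(const struct bench *b)
{
    assert(b->old_cps > 0.0);   /* guard against division by zero */
    return b->new_cps / b->old_cps;
}
```

Calling `speedup()` on the three rows gives roughly 3.2x for the laptop, 1.03x for the Athlon 64 and 2.35x for the Q6600, so the gain depends heavily on the CPU and on what the "old code" baseline was.
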
I think that the speedup in your case is bigger primarily because you
were benchmarking "32-bit" code, whereas on x86-64 the existing
implementation ("old code") computes two hashes at a time, which makes
it around 50% faster.

Then I went to test this "for real", on a dummy password file with 120
hashes and a wordlist with the corresponding passwords. JtR with this
patch applied cracked exactly 40 out of the 120 passwords, so clearly it
does not work right. Perhaps only one of the three sets of SSE vectors
is being dealt with correctly.

Perhaps we need to make the self-test more thorough, and have more test
vectors in MD5_fmt.c (12 or more?)

> PS: next stop, GPU implementation. I'm getting tired of not cracking
> enough of these hashes.

This must be fun, but perhaps it'd be better to submit a cleaner patch
(and one that actually works) for inclusion into the jumbo patch
first. ;-)

Besides the issues mentioned above, I'd like the number of sets of SSE
vectors you deal with "in parallel" made into a separate #define. Right
now, you have this number, which is 3, hard-coded into too many places.
It is possible that different numbers will be optimal for some CPUs,
some compilers, or for 32- vs. 64-bit mode vs. AltiVec (yes, by using C
intrinsics it should be possible to have the same source file work for
both SSE2 and AltiVec).

Thanks,

Alexander
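The suggestions above (a single #define for the number of SSE vector sets, SSE2-only headers, and a self-test that covers every lane) can be sketched roughly as follows. This is an illustration under stated assumptions, not code from the patch: `SSE_PARA`, `NBKEYS`, `md5_f_accumulate()` and `count_matching_lanes()` are hypothetical names, and only the additive part of an MD5 round-1 step is shown (no rotate, no final add of b). With 3 sets of 4 lanes, NBKEYS comes out at 12, which is one way to read the "12 or more" test vectors; likewise 40 out of 120 is exactly one third, consistent with one working set out of three.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 only; <pmmintrin.h> would require SSE3 */
#include <stdint.h>

#ifndef MMX_COEF
#define MMX_COEF 4      /* 32-bit lanes per 128-bit vector */
#endif

#ifndef SSE_PARA
#define SSE_PARA 3      /* hypothetical name: vector sets handled "in parallel" */
#endif

/* Candidates per call; a self-test needs at least this many vectors. */
#define NBKEYS (MMX_COEF * SSE_PARA)

/* a += F(b,c,d) + data + k for every set, with F = (b & c) | (~b & d). */
static void md5_f_accumulate(__m128i a[SSE_PARA], const __m128i b[SSE_PARA],
                             const __m128i c[SSE_PARA], const __m128i d[SSE_PARA],
                             const __m128i data[SSE_PARA], uint32_t k)
{
    const __m128i kv = _mm_set1_epi32((int)k);
    int i;

    for (i = 0; i < SSE_PARA; i++) {
        /* _mm_andnot_si128(x, y) computes (~x) & y */
        __m128i f = _mm_or_si128(_mm_and_si128(b[i], c[i]),
                                 _mm_andnot_si128(b[i], d[i]));
        a[i] = _mm_add_epi32(a[i], _mm_add_epi32(f, _mm_add_epi32(data[i], kv)));
    }
}

/* Scalar reference for a single lane. */
static uint32_t md5_f_scalar(uint32_t a, uint32_t b, uint32_t c,
                             uint32_t d, uint32_t data, uint32_t k)
{
    return a + ((b & c) | (~b & d)) + data + k;
}

/* Returns how many of the NBKEYS lanes agree with the scalar reference. */
static int count_matching_lanes(void)
{
    uint32_t a[SSE_PARA][MMX_COEF], b[SSE_PARA][MMX_COEF], c[SSE_PARA][MMX_COEF];
    uint32_t d[SSE_PARA][MMX_COEF], w[SSE_PARA][MMX_COEF], want[SSE_PARA][MMX_COEF];
    __m128i va[SSE_PARA], vb[SSE_PARA], vc[SSE_PARA], vd[SSE_PARA], vw[SSE_PARA];
    const uint32_t k = 0xd76aa478u;     /* first MD5 round constant */
    int i, j, ok = 0;

    assert(sizeof(__m128i) == MMX_COEF * sizeof(uint32_t));

    for (i = 0; i < SSE_PARA; i++) {
        for (j = 0; j < MMX_COEF; j++) {
            /* Distinct inputs per lane so a broken set cannot pass by luck. */
            uint32_t n = (uint32_t)(i * MMX_COEF + j + 1);
            a[i][j] = 0x67452301u * n;
            b[i][j] = 0xefcdab89u + n;
            c[i][j] = 0x98badcfeu ^ (n << 8);
            d[i][j] = 0x10325476u - n;
            w[i][j] = 0x01010101u * n;
            want[i][j] = md5_f_scalar(a[i][j], b[i][j], c[i][j], d[i][j], w[i][j], k);
        }
        va[i] = _mm_loadu_si128((const __m128i *)a[i]);
        vb[i] = _mm_loadu_si128((const __m128i *)b[i]);
        vc[i] = _mm_loadu_si128((const __m128i *)c[i]);
        vd[i] = _mm_loadu_si128((const __m128i *)d[i]);
        vw[i] = _mm_loadu_si128((const __m128i *)w[i]);
    }

    md5_f_accumulate(va, vb, vc, vd, vw, k);

    for (i = 0; i < SSE_PARA; i++) {
        uint32_t got[MMX_COEF];
        _mm_storeu_si128((__m128i *)got, va[i]);
        for (j = 0; j < MMX_COEF; j++)
            ok += (got[j] == want[i][j]);
    }
    return ok;
}
```

A check that feeds distinct inputs to all NBKEYS lanes would catch a bug that leaves only one set correct, because `count_matching_lanes()` would then return fewer than NBKEYS. Building with -DSSE_PARA=2 or -DSSE_PARA=4 changes the loop bounds everywhere at once, which is the point of not hard-coding 3.
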