Date: Fri, 17 Oct 2008 08:07:17 +0200
From: Simon Marechal <>
Subject: Re: fast freebsd MD5 implementation

Solar Designer wrote:
> I used gcc 4.3.1 on linux-x86-64 to compile/test this.  In
> sse-intrinsics.c, I had to change <pmmintrin.h> to <emmintrin.h> (the
> former requires SSE3, which is not needed here).  I also had to add
> "#define MMX_COEF 4" right to that file because that definition does not
> come from arch.h on x86-64.  Then it compiled (with lots of warnings),
> and even the test succeeds.

Wow, I didn't expect it to even compile on x86-64!
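Here is a minimal sketch of the header/define fix described above. It assumes only SSE2 is available and that MMX_COEF may or may not already come from arch.h. The helper md5_f_lanes() is a hypothetical name, not a function from the patch. It applies MD5's round-1 function F to four 32-bit lanes with SSE2 intrinsics and falls back to plain C on other targets:

```c
#include <assert.h>
#include <stdint.h>

/* Only SSE2 is needed for MD5's 32-bit integer operations;
   <pmmintrin.h> would demand SSE3 for nothing. */
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* arch.h does not provide this on x86-64, so default it here:
   one 128-bit SSE2 register holds four 32-bit MD5 words. */
#ifndef MMX_COEF
#define MMX_COEF 4
#endif

/* Hypothetical helper: MD5 round-1 function F(x, y, z) = (x & y) | (~x & z),
   applied to MMX_COEF independent 32-bit lanes at once. */
static void md5_f_lanes(uint32_t out[MMX_COEF], const uint32_t x[MMX_COEF],
                        const uint32_t y[MMX_COEF], const uint32_t z[MMX_COEF])
{
    assert(out && x && y && z);
#if defined(__SSE2__) && MMX_COEF == 4
    __m128i vx = _mm_loadu_si128((const __m128i *)x);
    __m128i vy = _mm_loadu_si128((const __m128i *)y);
    __m128i vz = _mm_loadu_si128((const __m128i *)z);
    /* _mm_andnot_si128(a, b) computes (~a) & b */
    __m128i f = _mm_or_si128(_mm_and_si128(vx, vy), _mm_andnot_si128(vx, vz));
    _mm_storeu_si128((__m128i *)out, f);
#else
    /* Portable fallback: one lane at a time */
    for (int i = 0; i < MMX_COEF; i++)
        out[i] = (x[i] & y[i]) | (~x[i] & z[i]);
#endif
}
```

On x86-64, GCC defines __SSE2__ by default, so the intrinsic path is taken without any extra flags.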

>> Bench with standard code on my laptop: 3258k/s
>> With this code: 10433k/s
> What CPU is that?
> I am getting:
> Athlon 64 3000+ 2.0 GHz
> old code: 8970 c/s
> new code: 9200 c/s
> Q6600 2.4 GHz (one core):
> old code: 10200 c/s
> new code: 24000 c/s

This is on a 1.2 GHz Core 2 CPU. It should be noted that GCC is a LOT
worse than ICC with the intrinsics (and at everything else too).
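Expressed as new-code/old-code ratios, the benchmark figures quoted in this thread work out to roughly the following (1.2 GHz Core 2 laptop, Athlon 64 3000+, Q6600 one core):

```latex
\text{speedup} = \frac{\text{new c/s}}{\text{old c/s}}:\qquad
\frac{10433}{3258} \approx 3.20,\qquad
\frac{9200}{8970} \approx 1.03,\qquad
\frac{24000}{10200} \approx 2.35
```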

> Then I went to test this "for real", on a dummy password file with 120
> hashes and a wordlist with the corresponding passwords.  JtR with this
> patch applied cracked exactly 40 out of the 120 passwords, so clearly it
> does not work right.  Perhaps only one of the three sets of SSE vectors
> is being dealt with correctly.  Perhaps we need to make the self-test
> more thorough, and have more test vectors in MD5_fmt.c (12 or more?)

Argh! Well, this is not surprising; I wrote this patch during half a day
when we lost Internet connectivity :) A more thorough self-test would be
awesome, but I'd rather have support for "background" crypt_all()
calls (obviously easy to do for crack() mode, but -test would need some
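Why "12 or more": assuming each crypt_all() batch holds 3 interleaved sets of MMX_COEF = 4 candidates, a self-test with 4 or fewer vectors only ever fills set 0, so a bug confined to sets 1 and 2 passes unnoticed. Cracking exactly 40 of 120 passwords (one third) fits that picture. The sketch below assumes candidate i sits in set i / MMX_COEF, lane i % MMX_COEF. MD5_PARA, ref_hash(), batch_hash_buggy() and self_test() are hypothetical names, and ref_hash() is a toy stand-in, not MD5:

```c
#include <assert.h>
#include <stdint.h>

#define MMX_COEF 4                    /* 32-bit lanes per SSE2 vector */
#define MD5_PARA 3                    /* hypothetical: interleaved vector sets */
#define BATCH (MMX_COEF * MD5_PARA)   /* candidates per batch */

/* Toy stand-in for MD5 (NOT MD5): a bijective 32-bit mix,
   so ref_hash(k) != 0 for every k != 0. */
static uint32_t ref_hash(uint32_t key)
{
    key ^= key >> 16;
    key *= 0x45d9f3bu;
    key ^= key >> 16;
    return key;
}

/* Simulated buggy batch routine: only set 0 is computed correctly,
   sets 1 and 2 produce garbage (zero here). */
static void batch_hash_buggy(uint32_t out[BATCH], const uint32_t in[BATCH])
{
    for (int i = 0; i < BATCH; i++) {
        int set = i / MMX_COEF;
        out[i] = (set == 0) ? ref_hash(in[i]) : 0;
    }
}

/* Self-test: run the first n vectors through one batch and compare
   each result against the reference. Returns 1 on success. */
static int self_test(const uint32_t *vectors, int n)
{
    uint32_t in[BATCH] = {0}, out[BATCH];
    assert(n > 0 && n <= BATCH);
    for (int i = 0; i < n; i++)
        in[i] = vectors[i];
    batch_hash_buggy(out, in);
    for (int i = 0; i < n; i++)
        if (out[i] != ref_hash(vectors[i]))
            return 0;
    return 1;
}
```

With the first 4 vectors the buggy routine passes; with all 12 it fails, because only then does every lane of every set get checked.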

>> PS: next stop, GPU implementation. I'm getting tired of not cracking 
>> enough of these hashes.
> This must be fun, but perhaps it'd be better to submit a cleaner patch
> (and one that actually works) for inclusion into the jumbo patch first. ;-)
> Besides the issues mentioned above, I'd like the number of sets of SSE
> vectors you deal with "in parallel" made into a separate #define.  Right
> now, you have this number, which is 3, hard-coded into too many places.
> It is possible that different numbers will be optimal for some CPUs,
> some compilers, or for 32- vs. 64-bit mode vs. AltiVec (yes, by using
> C intrinsics it should be possible to have the same source file work for
> both SSE2 and AltiVec).

Well, obviously this is a PoC that I mainly made to reproduce the
results from BarsWF. On the other hand, I will use the GPU
implementation (once it is written) professionally, so I'd rather have
it working. I'll take a look at the bugs today, but non-horrible code
will probably not happen anytime soon...
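A rough sketch of what making the number of parallel vector sets a single #define could look like, using one MD5 round-1 step. MD5_PARA and md5_step_f() are hypothetical names, and each inner array row stands in for one __m128i so the sketch stays portable plain C. Changing 3 to 2 or 4 then touches one line instead of many:

```c
#include <assert.h>
#include <stdint.h>

#ifndef MMX_COEF
#define MMX_COEF 4          /* 32-bit lanes per SSE2 register */
#endif
/* Hypothetical name: number of independent sets of SSE vectors
   processed in parallel (was a hard-coded 3). */
#ifndef MD5_PARA
#define MD5_PARA 3
#endif

#define ROTL32(v, s) (((v) << (s)) | ((v) >> (32 - (s))))
#define MD5_F(x, y, z) (((x) & (y)) | (~(x) & (z)))

/* One MD5 round-1 step, a = b + ROTL(a + F(b,c,d) + m + t, s), for every
   lane of every set. Each row a[p] stands in for one __m128i register. */
static void md5_step_f(uint32_t a[MD5_PARA][MMX_COEF],
                       uint32_t b[MD5_PARA][MMX_COEF],
                       uint32_t c[MD5_PARA][MMX_COEF],
                       uint32_t d[MD5_PARA][MMX_COEF],
                       uint32_t m[MD5_PARA][MMX_COEF],
                       uint32_t t, unsigned s)
{
    assert(s > 0 && s < 32);    /* ROTL32 is undefined for 0 or 32 */
    for (int p = 0; p < MD5_PARA; p++)
        for (int i = 0; i < MMX_COEF; i++) {
            uint32_t v = a[p][i] + MD5_F(b[p][i], c[p][i], d[p][i]) + m[p][i] + t;
            a[p][i] = b[p][i] + ROTL32(v, s);
        }
}
```

Keeping several independent sets in flight gives the CPU unrelated instructions to schedule while earlier ones are still in the pipeline, which is presumably why the best value of MD5_PARA can differ between CPUs, compilers, and 32- vs. 64-bit mode.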
