john-users - Re: fast freebsd MD5 implementation



Message-ID: <48F82B95.2010404@banquise.net>
Date: Fri, 17 Oct 2008 08:07:17 +0200
From: Simon Marechal <simon@...quise.net>
To: john-users@...ts.openwall.com
Subject: Re: fast freebsd MD5 implementation

Solar Designer wrote:
> I used gcc 4.3.1 on linux-x86-64 to compile/test this.  In
> sse-intrinsics.c, I had to change <pmmintrin.h> to <emmintrin.h> (the
> former requires SSE3, which is not needed here).  I also had to add
> "#define MMX_COEF 4" right to that file because that definition does not
> come from arch.h on x86-64.  Then it compiled (with lots of warnings),
> and even the test succeeds.

Wow, I didn't expect it to even compile on x86-64!


>> Bench with standard code on my laptop: 3258k/s
>> With this code: 10433k/s
> 
> What CPU is that?
> 
> I am getting:
> 
> Athlon 64 3000+ 2.0 GHz
> old code: 8970 c/s
> new code: 9200 c/s
> 
> Q6600 2.4 GHz (one core):
> old code: 10200 c/s
> new code: 24000 c/s

This is on a 1.2 GHz Core 2 CPU. It should be noted that GCC does a LOT
worse than ICC with the intrinsics (and with everything else too).

> Then I went to test this "for real", on a dummy password file with 120
> hashes and a wordlist with the corresponding passwords.  JtR with this
> patch applied cracked exactly 40 out of the 120 passwords, so clearly it
> does not work right.  Perhaps only one of the three sets of SSE vectors
> is being dealt with correctly.  Perhaps we need to make the self-test
> more thorough, and have more test vectors in MD5_fmt.c (12 or more?)

Argh! Well, this is not surprising: I wrote this patch during the half-day
when we lost internet connectivity :) A more thorough self-test would be
awesome, but I'd rather have support for "background" crypt_all()
calls (obviously easy to do for crack() mode, but -test would need some
tweaking).

>> PS: next stop, GPU implementation. I'm getting tired of not cracking 
>> enough of these hashes.
> 
> This must be fun, but perhaps it'd be better to submit a cleaner patch
> (and one that actually works) for inclusion into the jumbo patch first. ;-)
> 
> Besides the issues mentioned above, I'd like the number of sets of SSE
> vectors you deal with "in parallel" made into a separate #define.  Right
> now, you have this number, which is 3, hard-coded into too many places.
> It is possible that different numbers will be optimal for some CPUs,
> some compilers, or for 32- vs. 64-bit mode vs. AltiVec (yes, by using
> C intrinsics it should be possible to have the same source file work for
> both SSE2 and AltiVec).

Well, obviously this is a PoC that I mainly made to reproduce the
results from BarsWF. On the other hand, I will use the GPU
implementation (once it is written) professionally, so I'd rather have
it working. I'll take a look at the bugs today, but non-horrible code
will probably not happen very soon...
