Date: Fri, 17 Oct 2008 08:07:17 +0200
From: Simon Marechal <simon@...quise.net>
To: john-users@...ts.openwall.com
Subject: Re: fast freebsd MD5 implementation
Solar Designer wrote:
> I used gcc 4.3.1 on linux-x86-64 to compile/test this. In
> sse-intrinsics.c, I had to change <pmmintrin.h> to <emmintrin.h> (the
> former requires SSE3, which is not needed here). I also had to add
> "#define MMX_COEF 4" right to that file because that definition does not
> come from arch.h on x86-64. Then it compiled (with lots of warnings),
> and even the test succeeds.
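The two build fixes described above can be sketched roughly as follows. This is only an illustration, not the contents of sse-intrinsics.c: the helper `add_lanes` is hypothetical, and the scalar fallback is an assumption added so the snippet also builds where SSE2 is unavailable.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 header; <pmmintrin.h> would also require SSE3 */
#endif

/* Fallback for builds where arch.h does not supply the definition. */
#ifndef MMX_COEF
#define MMX_COEF 4      /* four 32-bit lanes per 128-bit SSE register */
#endif

/* Hypothetical helper: lane-wise 32-bit addition of MMX_COEF words. */
void add_lanes(const uint32_t *a, const uint32_t *b, uint32_t *out)
{
    assert(a && b && out);
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
#else
    /* Scalar fallback (assumption, not part of the patch). */
    for (int i = 0; i < MMX_COEF; i++)
        out[i] = a[i] + b[i];
#endif
}
```

The `#ifndef` guard keeps the local definition from clashing with arch.h on targets where that header does define MMX_COEF.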
Wow, I didn't expect it to even compile on x86-64!
>> Bench with standard code on my laptop: 3258k/s
>> With this code: 10433k/s
>
> What CPU is that?
>
> I am getting:
>
> Athlon 64 3000+ 2.0 GHz
> old code: 8970 c/s
> new code: 9200 c/s
>
> Q6600 2.4 GHz (one core):
> old code: 10200 c/s
> new code: 24000 c/s
This is on a 1.2 GHz Core 2 CPU. It should be noted that GCC is a LOT
worse with the intrinsics than ICC (and at everything else too).
> Then I went to test this "for real", on a dummy password file with 120
> hashes and a wordlist with the corresponding passwords. JtR with this
> patch applied cracked exactly 40 out of the 120 passwords, so clearly it
> does not work right. Perhaps only one of the three sets of SSE vectors
> is being dealt with correctly. Perhaps we need to make the self-test
> more thorough, and have more test vectors in MD5_fmt.c (12 or more?)
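A self-test that covers every slot of a batch could look something like the sketch below. Everything here is a stand-in: `ref_hash` is a toy function and NOT MD5, `PARA` is a hypothetical name for the number of vector sets, and `batch_hash` deliberately mimics the suspected bug (only some sets processed) rather than the real patch.

```c
#include <assert.h>
#include <stdint.h>

#define MMX_COEF 4                 /* lanes per SSE vector */
#define PARA 3                     /* hypothetical: number of vector sets */
#define SLOTS (MMX_COEF * PARA)    /* candidates per batch: 12 */

/* Toy stand-in for the scalar reference hash; NOT MD5. */
uint32_t ref_hash(uint32_t x) { return x * 2654435761u + 1u; }

/* Batched version that only fills the first `sets_done` sets,
   imitating a bug where the remaining sets are never computed. */
void batch_hash(const uint32_t *in, uint32_t *out, unsigned sets_done)
{
    for (unsigned i = 0; i < SLOTS; i++)
        out[i] = (i / MMX_COEF < sets_done) ? ref_hash(in[i]) : 0;
}

/* Self-test: count the slots whose batched result matches the reference. */
unsigned count_correct_slots(unsigned sets_done)
{
    uint32_t in[SLOTS], out[SLOTS];
    unsigned ok = 0;

    assert(sets_done <= PARA);
    for (unsigned i = 0; i < SLOTS; i++)
        in[i] = i + 1;
    batch_hash(in, out, sets_done);
    for (unsigned i = 0; i < SLOTS; i++)
        ok += (out[i] == ref_hash(in[i]));
    return ok;
}
```

With one working set out of three, 4 of the 12 slots match, the same one-third ratio as 40 cracked out of 120. A test with fewer vectors than slots would not see the failing slots at all, which is the argument for 12 or more.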
Argh! Well, this is not surprising: I wrote this patch during a half day
when we had lost internet connectivity :) A more thorough self-test would
be awesome, but I'd rather have support for "background" crypt_all()
calls (obviously easy to do for crack() mode, but -test would need some
tweaking).
>> PS: next stop, GPU implementation. I'm getting tired of not cracking
>> enough of these hashes.
>
> This must be fun, but perhaps it'd be better to submit a cleaner patch
> (and one that actually works) for inclusion into the jumbo patch first. ;-)
>
> Besides the issues mentioned above, I'd like the number of sets of SSE
> vectors you deal with "in parallel" made into a separate #define. Right
> now, you have this number, which is 3, hard-coded into too many places.
> It is possible that different numbers will be optimal for some CPUs,
> some compilers, or for 32- vs. 64-bit mode vs. AltiVec (yes, by using
> C intrinsics it should be possible to have the same source file work for
> both SSE2 and AltiVec).
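The suggestion above, a single #define for the number of vector sets, might be sketched like this. The name `MD5_SSE_PARA`, the index mapping, and the helper functions are all hypothetical and only illustrate the idea; the patch itself may lay out its buffers differently.

```c
#include <assert.h>
#include <stdint.h>

#ifndef MMX_COEF
#define MMX_COEF 4              /* lanes per vector */
#endif

#ifndef MD5_SSE_PARA            /* hypothetical name; tune per CPU/compiler */
#define MD5_SSE_PARA 3          /* vector sets processed "in parallel" */
#endif

/* Every other size is derived from the two knobs above. */
#define NB_CANDIDATES (MMX_COEF * MD5_SSE_PARA)

/* Assumed mapping from a candidate index to its vector set and lane. */
unsigned set_of(unsigned idx)  { assert(idx < NB_CANDIDATES); return idx / MMX_COEF; }
unsigned lane_of(unsigned idx) { assert(idx < NB_CANDIDATES); return idx % MMX_COEF; }

/* Loops run over MD5_SSE_PARA instead of a hard-coded 3. */
void add_const_all_sets(uint32_t state[MD5_SSE_PARA][MMX_COEF], uint32_t k)
{
    for (unsigned s = 0; s < MD5_SSE_PARA; s++)
        for (unsigned l = 0; l < MMX_COEF; l++)
            state[s][l] += k;
}
```

Building with something like `-DMD5_SSE_PARA=2` would then be enough to benchmark a different degree of interleaving without touching the code.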
Well, obviously this is a PoC that I made mainly to reproduce the
results from BarsWF. On the other hand, I will use the GPU
implementation (once it is written) professionally, so I'd rather have
it working. I'll take a look at the bugs today, but not-horrible code
will probably not happen too soon ...