Date: Wed, 19 May 2010 02:38:45 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: C compiler generated SSE2 code

On Tue, May 18, 2010 at 01:49:46PM +0200, bartavelle@...quise.net wrote:
> On 18/05/2010 01:32, Solar Designer wrote:
> >Can you upload it to the wiki, please?
> >
> >http://openwall.info/wiki/john/patches
>
> Done, but I made a quick git patch.

Thank you!  This works for now.  When you update this code/patch, please
specify the copyright status and license for sse-intrinsics.c (new
source file added with the patch) and for your changes to MD5_fmt.c,
preferably like I have suggested here:

http://openwall.info/wiki/john/licensing

> http://bigbox.banquise.net/jtr/gcc-4.3.2
> http://bigbox.banquise.net/jtr/clang-103935
> http://bigbox.banquise.net/jtr/icc-10.1
>
> This does speak for itself :) The icc does disentangle the whole stuff,
> but is still faster with 3 loops (only 2 in the sample).

I think you need to disentangle the source code rather than leave that
for the compiler.  Specifically, I'd remove the "unneeded" MD5_PARA_DO
loops.  Instead, I'd define macros around primitives such as xor, which
would perform the required number of instances of the operation.  They
would use constants for the array indices - or, if that does not work
well enough, even use individual local variables instead of array
elements.  This is more similar to what I have in MD5_std.c, where I use
separate local variables for the two instances of MD5:

	MD5_word a0, b0 = Cb, c0 = Cc, d0;
	MD5_word a1, b1, c1, d1;
	MD5_word u, v;

I understand that you like to be able to easily adjust the number of
instances that you mix, but you'll have to achieve that by defining your
xor, etc. macros differently for common instance counts (say, 2 vs. 3).

> Do you mind giving bench of your SSE code with ICC ?

Sorry, I have no time to dive into this now.  I have too many other
tasks in my queue.

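To illustrate the macro approach suggested above (per-primitive macros
that expand to a fixed number of instances, operating on individual
local variables rather than array elements inside a loop), here is a
minimal sketch.  It is not taken from the patch: the names PARA, DECL,
LOAD, STORE, XOR, ADD, and mix() are hypothetical, and plain 32-bit
integers stand in for SSE2 vectors so that it builds anywhere.

```c
#include <stdint.h>

/* Stand-ins so the sketch builds anywhere; with SSE2 intrinsics these
 * would be __m128i, _mm_xor_si128() and _mm_add_epi32(). */
typedef uint32_t vec_t;
#define VXOR(a, b) ((a) ^ (b))
#define VADD(a, b) ((a) + (b))

#ifndef PARA
#define PARA 2 /* hypothetical: number of interleaved instances (2 or 3) */
#endif

/* Each macro expands to exactly PARA copies of the operation, using
 * separate local variables (x0, x1, ...) and constant array indices. */
#if PARA == 2
#define DECL(x)      vec_t x##0, x##1
#define LOAD(x, p)   do { x##0 = (p)[0]; x##1 = (p)[1]; } while (0)
#define STORE(p, x)  do { (p)[0] = x##0; (p)[1] = x##1; } while (0)
#define XOR(d, a, b) do { d##0 = VXOR(a##0, b##0); \
                          d##1 = VXOR(a##1, b##1); } while (0)
#define ADD(d, a, b) do { d##0 = VADD(a##0, b##0); \
                          d##1 = VADD(a##1, b##1); } while (0)
#elif PARA == 3
#define DECL(x)      vec_t x##0, x##1, x##2
#define LOAD(x, p)   do { x##0 = (p)[0]; x##1 = (p)[1]; \
                          x##2 = (p)[2]; } while (0)
#define STORE(p, x)  do { (p)[0] = x##0; (p)[1] = x##1; \
                          (p)[2] = x##2; } while (0)
#define XOR(d, a, b) do { d##0 = VXOR(a##0, b##0); \
                          d##1 = VXOR(a##1, b##1); \
                          d##2 = VXOR(a##2, b##2); } while (0)
#define ADD(d, a, b) do { d##0 = VADD(a##0, b##0); \
                          d##1 = VADD(a##1, b##1); \
                          d##2 = VADD(a##2, b##2); } while (0)
#else
#error "unsupported PARA in this sketch"
#endif

/* Computes out[i] = (a[i] ^ b[i]) + a[i] for each of the PARA
 * instances, written once with no loop over the instances. */
void mix(const vec_t *a, const vec_t *b, vec_t *out)
{
    DECL(x);
    DECL(y);
    DECL(t);

    LOAD(x, a);
    LOAD(y, b);
    XOR(t, x, y);
    ADD(t, t, x);
    STORE(out, t);
}
```

Changing the instance count then means building with -DPARA=3 rather
than editing the function body, and the compiler sees straight-line
code with the instances already interleaved.
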
> Or just share it so that I could try it :)

I've attached a dirty patch.  IIRC, this code is in a state suitable for
the Sun Studio compiler.  You'll likely need to change the initializer
for "ones" (a trivial change) to get this to compile with gcc again.

DES_BS_VECTOR34 enables 192-bit vectors with 256-bit alignment (there
are two kinds of them - SSE2+MMX or SSE2+native).  With other settings,
you can do pure 128-bit SSE2 vectors and two kinds of 256-bit vectors
(dual SSE2 or SSE2+MMX+native).

In my experiments, I was using new DES S-box expressions (these are in
the works), but I reverted to "plain" nonstd.c for generating this
patch.  You may choose to have the code use sboxes.c instead (change
DES_BS from 1 to 2 in x86-64.h), which might better match the target
instruction set (mostly if you use x86-64 native instructions).

A known problem is that the code violates C strict aliasing rules with
its use of typecasts.  Yet this did not cause anything worse than
compiler warnings in my testing.  A fix for this may be to use unions.

I think I should roll similar changes into the official JtR even if
they'd be no-ops in the default build - to make it easier to conduct
experiments like this.

Alexander

View attachment "john-1.7.5-des-intrinsics-1.diff" of type "text/plain" (29409 bytes)
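
As an illustration of the union-based fix for the strict aliasing issue
mentioned above, here is a minimal sketch.  It is not from the attached
patch: the type vec128 and the functions vec_xor() and vec_get32() are
hypothetical, and plain integer arrays stand in for the vector types
(with SSE2, a __m128i member could be added to the same union).  The
idea is that the same storage is accessed through union members instead
of through casts between pointer types.

```c
#include <stdint.h>

/* One 128-bit value with two views of the same storage.  Accessing it
 * through the union members avoids casting a uint64_t pointer to a
 * uint32_t pointer (or the reverse), which is what runs afoul of the
 * strict aliasing rules. */
typedef union {
    uint64_t q[2]; /* view as two 64-bit words */
    uint32_t d[4]; /* view as four 32-bit words */
} vec128;          /* hypothetical name */

/* XOR two values using the 64-bit view. */
void vec_xor(vec128 *dst, const vec128 *a, const vec128 *b)
{
    dst->q[0] = a->q[0] ^ b->q[0];
    dst->q[1] = a->q[1] ^ b->q[1];
}

/* Read one 32-bit word through the other union member.  Which 32-bit
 * half of a 64-bit word index i selects depends on the byte order of
 * the target. */
uint32_t vec_get32(const vec128 *v, int i)
{
    return v->d[i];
}
```

Reading a union member other than the one last written is permitted in
C99 and later, and gcc documents it as working even with
-fstrict-aliasing, as long as the access goes through the union type.
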