Date: Sun, 19 Oct 2008 00:41:45 +0100
From: "Larry Bonner" <larry.bonner1@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: fast freebsd MD5 implementation [with the attached file ...]

On Fri, Oct 17, 2008 at 12:52 PM, Simon Marechal <simon@...quise.net> wrote:
> And for more spam, this adds an ICC target
>
> http://btb.banquise.net/bin/john-1.7.3.1-all-5-fastMD5.3.diff.gz
>

Hi Simon, great work - it really opened my eyes to how good the Intel
compiler is for this stuff!

I was looking at the I function, where you use the following intrinsic
with a mask to initialize one SSE register.

#define I(x,y,z) \
	PARA_DO(i) tmp[i] = _mm_andnot_si128((z[i]), mask); \
	PARA_DO(i) tmp[i] = _mm_or_si128((tmp[i]),(x[i])); \
	PARA_DO(i) tmp[i] = _mm_xor_si128((tmp[i]),(y[i]));

_mm_andnot_si128(a, b) = computes (NOT a) AND b = PANDN
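
To make the role of the mask concrete, here is a minimal sketch of the same three steps for a single SSE2 register, checked against the scalar MD5 I function y ^ (x | ~z). It assumes the mask holds all ones, which the macro above does not show, and the names md5_i_scalar, md5_i_sse2 and md5_i_selftest are made up for illustration; they are not from the patch.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar reference: MD5 round-4 function I(x, y, z) = y ^ (x | ~z). */
static uint32_t md5_i_scalar(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | ~z);
}

/* Same steps as the I macro, for one register instead of PARA_DO(i).
 * ASSUMPTION: mask is all ones, so (~z) & mask is simply ~z. */
static __m128i md5_i_sse2(__m128i x, __m128i y, __m128i z, __m128i mask)
{
    __m128i tmp = _mm_andnot_si128(z, mask); /* PANDN: (~z) & mask */
    tmp = _mm_or_si128(tmp, x);              /* POR:   x | ~z      */
    return _mm_xor_si128(tmp, y);            /* PXOR:  y ^ (...)   */
}

/* Compare all four 32-bit lanes against the scalar reference. */
static void md5_i_selftest(void)
{
    const uint32_t x = 0x67452301u, y = 0xefcdab89u, z = 0x98badcfeu;
    uint32_t out[4];
    int i;

    __m128i r = md5_i_sse2(_mm_set1_epi32((int)x), _mm_set1_epi32((int)y),
                           _mm_set1_epi32((int)z), _mm_set1_epi32(-1));
    _mm_storeu_si128((__m128i *)out, r);
    for (i = 0; i < 4; i++)
        assert(out[i] == md5_i_scalar(x, y, z));
}
```

Calling md5_i_selftest() should pass silently on any x86 build with SSE2 enabled. The point is only that PANDN against an all-ones register is a NOT, so the mask never has to carry data.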

Presumably your mask is stored as a local variable and only loaded
once, with a single MOVDQA?

Given that it might access the stack each time PANDN is used,
would another intrinsic be better?
Such as:

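_mm_cmpeq_epi32 = compare packed 32-bit integers for equality = PCMPEQD
(_mm_cmpeq_pi32 is the 64-bit MMX form; the SSE2 one is _mm_cmpeq_epi32)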

This sets all bits of an SSE register to 1 when the same register is
used as both source and destination:

    pcmpeqd xmm1,xmm1

Just thinking that less use of stack/memory might help, but I'm not sure.
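
A rough sketch of that idea with intrinsics, assuming the 128-bit SSE2 form _mm_cmpeq_epi32: comparing a register with itself yields all ones in every lane, so no constant has to be loaded from memory. The helper names all_ones_sse2, not_si128 and all_ones_selftest are hypothetical. Whether this actually beats a stack load depends on what the compiler does with it; many compilers already materialize an all-ones constant this way, so it is worth checking the generated assembly.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Build an all-ones register without a memory operand:
 * every 32-bit lane of t equals itself, so PCMPEQD sets every bit. */
static __m128i all_ones_sse2(void)
{
    __m128i t = _mm_setzero_si128();
    return _mm_cmpeq_epi32(t, t);
}

/* Bitwise NOT of v, using the generated mask with PANDN. */
static __m128i not_si128(__m128i v)
{
    return _mm_andnot_si128(v, all_ones_sse2());
}

/* Check both helpers lane by lane. */
static void all_ones_selftest(void)
{
    uint32_t out[4];
    int i;

    _mm_storeu_si128((__m128i *)out, all_ones_sse2());
    for (i = 0; i < 4; i++)
        assert(out[i] == 0xFFFFFFFFu);

    _mm_storeu_si128((__m128i *)out, not_si128(_mm_set1_epi32(0x0F0F0F0F)));
    for (i = 0; i < 4; i++)
        assert(out[i] == 0xF0F0F0F0u);
}
```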

all the best.
