Date: Sun, 19 Oct 2008 00:41:45 +0100
From: "Larry Bonner" <>
Subject: Re: fast freebsd MD5 implementation [with the attached file ...]

On Fri, Oct 17, 2008 at 12:52 PM, Simon Marechal <> wrote:
> And for more spam, this adds an ICC target

Hi Simon, great work - it really opened my eyes to how good the Intel
compiler is for this stuff!!

I was looking at how, in the I function, you use the following intrinsic
with a mask to initialize one SSE register.

#define I(x,y,z) \
	PARA_DO(i) tmp[i] = _mm_andnot_si128((z[i]), mask); \
	PARA_DO(i) tmp[i] = _mm_or_si128((tmp[i]),(x[i])); \
	PARA_DO(i) tmp[i] = _mm_xor_si128((tmp[i]),(y[i]));

_mm_andnot_si128(a, b) = computes (NOT a) AND b = PANDN
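
To make the operand order concrete, here is a minimal sketch of the same three steps for a single vector, next to the scalar MD5 function I(x,y,z) = y ^ (x | ~z). It assumes mask is all ones, which is my reading of the macro rather than something I checked in your source; md5_i_scalar, md5_i_sse2 and md5_i_lane0 are hypothetical names used only for illustration.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Scalar reference for the MD5 round-4 function: I(x,y,z) = y ^ (x | ~z). */
static uint32_t md5_i_scalar(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | ~z);
}

/* Same three steps as the macro, for one vector of four 32-bit lanes.
 * _mm_andnot_si128(a, b) computes (~a) & b, so with an all-ones b
 * (assumed here) the first step is simply ~z. */
static __m128i md5_i_sse2(__m128i x, __m128i y, __m128i z, __m128i mask)
{
    __m128i tmp = _mm_andnot_si128(z, mask);  /* (~z) & mask */
    tmp = _mm_or_si128(tmp, x);               /* x | ~z */
    return _mm_xor_si128(tmp, y);             /* y ^ (x | ~z) */
}

/* Hypothetical helper: broadcast scalar inputs to all lanes, run the
 * vector version with an all-ones mask, and return lane 0. */
static uint32_t md5_i_lane0(uint32_t x, uint32_t y, uint32_t z)
{
    __m128i r = md5_i_sse2(_mm_set1_epi32((int)x),
                           _mm_set1_epi32((int)y),
                           _mm_set1_epi32((int)z),
                           _mm_set1_epi32(-1));
    return (uint32_t)_mm_cvtsi128_si32(r);
}
```

With an all-ones mask, md5_i_lane0 and md5_i_scalar should agree for any inputs, so the PANDN is acting as a plain NOT here.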

Presumably your mask is stored as a local variable and loaded only
once, with a single MOVDQA?

Given that it might access the stack each time PANDN is used,
would another intrinsic be better?
Such as:

_mm_cmpeq_epi32 = compare packed 32-bit integers for equality = PCMPEQD

This sets all bits of an SSE register to 1 if the same register is used for
both operands:

    pcmpeqd xmm1,xmm1

Just thinking that less use of the stack/memory might help... not sure.
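
For what it's worth, a rough sketch of the idea in intrinsics, not a measured improvement. The SSE2 intrinsic for PCMPEQD on xmm registers is _mm_cmpeq_epi32; comparing a register with itself makes every lane "equal", so every bit comes out set. Whether the compiler really emits pcmpeqd xmmN,xmmN and keeps the result in a register is up to the compiler, so checking the generated assembly would be needed. all_ones, is_all_ones, not_si128 and not_lane0 are hypothetical helper names.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Build an all-ones vector without loading a constant from memory:
 * each 32-bit lane compares equal to itself, giving 0xffffffff. */
static __m128i all_ones(void)
{
    __m128i v = _mm_setzero_si128();
    return _mm_cmpeq_epi32(v, v);
}

/* Check helper: store the vector and test every lane. */
static int is_all_ones(__m128i v)
{
    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, v);
    return lanes[0] == 0xffffffffu && lanes[1] == 0xffffffffu &&
           lanes[2] == 0xffffffffu && lanes[3] == 0xffffffffu;
}

/* NOT z via PANDN against the generated mask: (~z) & all_ones. */
static __m128i not_si128(__m128i z)
{
    return _mm_andnot_si128(z, all_ones());
}

/* Hypothetical helper: lane 0 of NOT applied to a broadcast scalar. */
static uint32_t not_lane0(uint32_t z)
{
    return (uint32_t)_mm_cvtsi128_si32(not_si128(_mm_set1_epi32((int)z)));
}
```

If the mask is hoisted out of the loop and stays in a register, this may make no difference at all; it would only matter when the mask gets spilled and reloaded around each PANDN.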

all the best.
