Date: Sun, 19 Oct 2008 00:41:45 +0100
From: "Larry Bonner" <larry.bonner1@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: fast freebsd MD5 implementation [with the attached file ...]

On Fri, Oct 17, 2008 at 12:52 PM, Simon Marechal <simon@...quise.net> wrote:
> And for more spam, this adds an ICC target
>
> http://btb.banquise.net/bin/john-18.104.22.168-all-5-fastMD5.3.diff.gz
>

Hi Simon,

Great work - it really opened my eyes to how good the Intel compiler is for
this stuff!!

I was looking at how, in the I function, you use the following intrinsic
with a mask held in one SSE register:

#define I(x,y,z) \
    PARA_DO(i) tmp[i] = _mm_andnot_si128((z[i]), mask); \
    PARA_DO(i) tmp[i] = _mm_or_si128((tmp[i]),(x[i])); \
    PARA_DO(i) tmp[i] = _mm_xor_si128((tmp[i]),(y[i]));

_mm_andnot_si128 = Computes AND and NOT = PANDN

Presumably the source of your mask is stored as a local variable and only
loaded once with a single MOVDQA? Given that it might access the stack each
time PANDN is used, would another intrinsic be better? Such as:

_mm_cmpeq_epi32 = Equal = PCMPEQD

This sets all bits of an SSE register to 1 when the same register is used
for both source and destination:

pcmpeqd xmm1,xmm1

Just thinking that less use of the stack/memory might help... not sure.

All the best.
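For readers who want to try the idea, here is a minimal sketch in C of the I step with the all-ones mask built by a self-compare instead of being loaded from memory. It is not taken from Simon's patch: the names all_ones, md5_I, md5_I_scalar and md5_I_matches_scalar are made up for illustration, PARA_DO is left out so only one vector of four lanes is handled, and whether the compiler really emits pcmpeqd for the mask (or whether that is any faster) is an assumption that would have to be checked in the generated assembly.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: build an all-ones register without reading a
 * constant from memory. Comparing a register with itself for equality
 * sets every bit; compilers commonly lower this to pcmpeqd xmm,xmm,
 * but that is not guaranteed. */
static inline __m128i all_ones(void)
{
    __m128i zero = _mm_setzero_si128();
    return _mm_cmpeq_epi32(zero, zero);
}

/* MD5 round-4 function I(x, y, z) = y ^ (x | ~z), four 32-bit lanes at once.
 * Same three steps as the macro in the post, minus PARA_DO. */
static inline __m128i md5_I(__m128i x, __m128i y, __m128i z)
{
    __m128i t = _mm_andnot_si128(z, all_ones());  /* (~z) & ones == ~z */
    t = _mm_or_si128(t, x);                       /* x | ~z */
    return _mm_xor_si128(t, y);                   /* y ^ (x | ~z) */
}

/* Scalar reference for a single lane. */
static inline uint32_t md5_I_scalar(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (x | ~z);
}

/* Returns 1 if the vector version agrees with the scalar one on all
 * four lanes, 0 otherwise. */
static int md5_I_matches_scalar(const uint32_t x[4],
                                const uint32_t y[4],
                                const uint32_t z[4])
{
    uint32_t out[4];
    __m128i r = md5_I(_mm_loadu_si128((const __m128i *)x),
                      _mm_loadu_si128((const __m128i *)y),
                      _mm_loadu_si128((const __m128i *)z));
    _mm_storeu_si128((__m128i *)out, r);

    for (int i = 0; i < 4; i++) {
        if (out[i] != md5_I_scalar(x[i], y[i], z[i]))
            return 0;
    }
    return 1;
}
```

Calling md5_I_matches_scalar on a few test vectors is a quick sanity check that the self-compare mask behaves like the stored one. To see whether it helps, compare the assembly from both variants (for example with the compiler's -S option) and benchmark; an optimizing compiler may already keep the mask in a register for the whole loop.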