Date: Fri, 13 Mar 2015 09:01:25 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice MD*/SHA*, AVX2

On 2015-03-11 23:07, Solar Designer wrote:
> On Wed, Mar 11, 2015 at 10:45:19PM +0100, magnum wrote:
>> On 2015-03-11 22:21, Solar Designer wrote:
>>> In my testing, this might not be beneficial on 2-operand archs such as
>>> plain x86, but it should be on 3-operand archs such as AVX. So we
>>> should update the code in sse-intrinsics.c, and benchmark. And we should
>>> update the plain C code anyway, such as for non-x86 archs (which are
>>> mostly 3-operand RISC).
>>>
>>> magnum, Jim?
>>
>> Yeah... unless we have some GSoC candidate wanting to show his/her
>> teeth? That would be a good start!
>
> OK, I don't mind keeping this on hold until GSoC student application
> period ends. Would you track it, so it doesn't get forgotten in case no
> GSoC candidate takes care of it?

Out of curiosity I did some experiments with sse-intrinsics.c and I only
see regression when trying to implement this. Does that make sense? I
also tried with no interleaving, still a regression. Could this somehow
break some other optimization made by the compiler?

In the MD4 case I didn't even have to add a new temp variable, it
already has tmp2 free to use at that place. It doesn't get much slower,
but always definitely slower.

magnum