Date: Fri, 13 Mar 2015 09:01:25 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice MD*/SHA*, AVX2

On 2015-03-11 23:07, Solar Designer wrote:
> On Wed, Mar 11, 2015 at 10:45:19PM +0100, magnum wrote:
>> On 2015-03-11 22:21, Solar Designer wrote:
>>> In my testing, this might not be beneficial on 2-operand archs such as
>>> plain x86, but it should be on 3-operand archs such as AVX.  So we
>>> should update the code in sse-intrinsics.c, and benchmark.  And we should
>>> update the plain C code anyway, such as for non-x86 archs (which are
>>> mostly 3-operand RISC).
>>>
>>> magnum, Jim?
>>
>> Yeah... unless we have some GSoC candidate wanting to show his/her
>> teeth? That would be a good start!
> 
> OK, I don't mind keeping this on hold until GSoC student application
> period ends.  Would you track it, so it doesn't get forgotten in case no
> GSoC candidate takes care of it?

Out of curiosity I did some experiments with sse-intrinsics.c, and I see
only regressions when trying to implement this. Does that make sense? I
also tried without interleaving, and it was still a regression. Could
this somehow break some other optimization made by the compiler? In the
MD4 case I didn't even have to add a new temp variable: tmp2 is already
free to use at that point.

It doesn't get much slower, but it is consistently slower.
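
For reference, a minimal scalar sketch of the kind of change being
tested. This message does not restate the optimization, so the sketch
assumes it is the sharing of one XOR between two consecutive H() steps,
with H(x,y,z) = x ^ y ^ z as in MD4 round 3. The function names and the
two-step framing are made up for illustration. This is not the actual
sse-intrinsics.c code, which works on SIMD vectors and interleaves
several of them.

```c
#include <stdint.h>

/* Rotate left; s must be in 1..31. */
static uint32_t rotl32(uint32_t v, unsigned s)
{
    return (v << s) | (v >> (32 - s));
}

/* Baseline: two consecutive MD4 round-3 style steps, with
 * H(x,y,z) = x ^ y ^ z computed from scratch each time (4 XORs). */
static void two_steps_plain(uint32_t st[4], uint32_t m0, uint32_t m1)
{
    uint32_t a = st[0], b = st[1], c = st[2], d = st[3];

    a = rotl32(a + (b ^ c ^ d) + m0 + 0x6ed9eba1u, 3);
    d = rotl32(d + (a ^ b ^ c) + m1 + 0x6ed9eba1u, 9);

    st[0] = a;
    st[3] = d;
}

/* Assumed variant: b ^ c appears in both steps, so compute it once
 * into a temporary and reuse it (3 XORs instead of 4). */
static void two_steps_shared(uint32_t st[4], uint32_t m0, uint32_t m1)
{
    uint32_t a = st[0], b = st[1], c = st[2], d = st[3];
    uint32_t tmp = b ^ c;   /* shared subexpression */

    a = rotl32(a + (tmp ^ d) + m0 + 0x6ed9eba1u, 3);
    d = rotl32(d + (a ^ tmp) + m1 + 0x6ed9eba1u, 9);

    st[0] = a;
    st[3] = d;
}
```

Both functions compute the same result. The shared form saves one XOR
per pair of steps, but tmp has to stay live across the first step, so
on a 2-operand arch it may cost an extra register move.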

magnum
