john-dev - Re: interleaved bitslice?

Message-ID: <866a91bf524d9ecf225d8598953db2e7@smtp.hushmail.com>
Date: Sat, 14 Mar 2015 22:49:30 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: interleaved bitslice?

On 2015-03-14 20:01, Solar Designer wrote:
> On Thu, Mar 12, 2015 at 08:37:01AM +0100, magnum wrote:
>> On 2015-03-11 21:55, Solar Designer wrote:
>>> solar@...l:~/md5slice$ gcc md5slice.c -o md5slice -Wall -s -O3 -fomit-frame-pointer -funroll-loops -DVECTOR -march=native
>>>
>>> This gave "warning: always_inline function might not be inlinable" about
>>> FF(), I(), H(), F(), add32r(), add32c(), add32() - but then it built
>>> fine.  The speed is:
>>
>> Solar,
>>
>> While experimenting with this I noticed using a vector size of 32 but
>> still compiling for AVX gave a slight boost (~5%). I assume this ends up
>> similar to the interleaving we use in Jumbo, and is faster for the same
>> reasons.
> 
> I've just tested this with gcc 4.9.2 on Linux, and the generated code is
> "floating-point" 256-bit AVX.  So this is not interleaving.
> 

And yet it's faster? I would never have guessed that, but I suppose it's
not that special. I'm probably the person who uses floating point the
least in the entire world.

magnum
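Bitsliced code only needs AND, OR, XOR and NOT, and those are bitwise whether the instruction belongs to the integer or the floating-point domain. That is why plain AVX (without AVX2), which only offers 256-bit logic ops as VANDPS/VORPS/VXORPS, can still run a 32-byte vector type. The following is a minimal sketch, assuming GCC or Clang vector extensions. The type `vtype`, the struct `bs32`, and the helpers `lane_set`, `bs32_zero`, `bs32_put`, `bs32_get` and `bs32_add` are hypothetical illustrations, not the actual definitions in md5slice.c. It shows a bitsliced 32-bit addition (the kind of thing an `add32()` helper does) over 256 independent lanes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-byte (256-bit) vector type; not necessarily what
   md5slice.c uses with -DVECTOR. Requires GCC/Clang vector extensions.
   Built with -mavx (no AVX2), the bitwise ops below are typically
   emitted as 256-bit "floating-point" VANDPS/VORPS/VXORPS. */
typedef uint32_t vtype __attribute__((vector_size(32)));

#define LANES 256 /* one independent instance per bit of a vtype */

/* Bitsliced 32-bit word: w[i] holds bit i of the word for all lanes. */
typedef struct { vtype w[32]; } bs32;

/* Set or clear bit `lane` of *v. */
static void lane_set(vtype *v, unsigned lane, unsigned bit)
{
    assert(lane < LANES);
    uint32_t mask = (uint32_t)1 << (lane % 32);
    if (bit)
        (*v)[lane / 32] |= mask;
    else
        (*v)[lane / 32] &= ~mask;
}

static void bs32_zero(bs32 *x)
{
    memset(x, 0, sizeof *x);
}

/* Store an ordinary 32-bit value into one lane (a slow transpose,
   only for setting up and checking data). */
static void bs32_put(bs32 *x, unsigned lane, uint32_t value)
{
    for (int i = 0; i < 32; i++)
        lane_set(&x->w[i], lane, (value >> i) & 1u);
}

/* Read the 32-bit value held in one lane. */
static uint32_t bs32_get(const bs32 *x, unsigned lane)
{
    assert(lane < LANES);
    uint32_t value = 0;
    for (int i = 0; i < 32; i++)
        value |= ((x->w[i][lane / 32] >> (lane % 32)) & 1u) << i;
    return value;
}

/* Ripple-carry addition mod 2^32 using only XOR/AND/OR, so all 256
   lanes are added at once. Safe if r aliases a or b, because bit i
   is read before it is written. */
static void bs32_add(bs32 *r, const bs32 *a, const bs32 *b)
{
    vtype carry = {0};
    for (int i = 0; i < 32; i++) {
        vtype t = a->w[i] ^ b->w[i];
        vtype s = t ^ carry;
        carry = (a->w[i] & b->w[i]) | (carry & t);
        r->w[i] = s;
    }
}
```

For example, after `bs32_put(&a, 255, 0x12345678u)` and `bs32_put(&b, 255, 0x11111111u)`, `bs32_add(&r, &a, &b)` leaves `0x23456789` in lane 255 of `r`. Whether this layout is faster than interleaving two 128-bit streams depends on the CPU and compiler; the sketch only shows why the floating-point logic instructions are sufficient.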

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.