john-dev - Re: interleaved bitslice? (was: bitslice MD*/SHA*, AVX2)

Message-ID: <20150314190150.GA14364@openwall.com>
Date: Sat, 14 Mar 2015 22:01:50 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: interleaved bitslice? (was: bitslice MD*/SHA*, AVX2)

On Thu, Mar 12, 2015 at 08:37:01AM +0100, magnum wrote:
> On 2015-03-11 21:55, Solar Designer wrote:
> > solar@...l:~/md5slice$ gcc md5slice.c -o md5slice -Wall -s -O3 -fomit-frame-pointer -funroll-loops -DVECTOR -march=native
> > 
> > This gave "warning: always_inline function might not be inlinable" about
> > FF(), I(), H(), F(), add32r(), add32c(), add32() - but then it built
> > fine.  The speed is:
> 
> Solar,
> 
> While experimenting with this I noticed using a vector size of 32 but
> still compiling for AVX gave a slight boost (~5%). I assume this ends up
> similar to the interleaving we use in Jumbo, and is faster for the same
> reasons.

I've just tested this with gcc 4.9.2 on Linux, and the generated code
uses 256-bit AVX "floating-point" instructions operating on whole
32-byte vectors, rather than pairs of 128-bit operations.  So this is
not interleaving.
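
For illustration, here is a minimal sketch of the kind of code in
question.  It assumes the vectors are declared with GCC's vector_size
attribute, and it is not taken from md5slice.c: the names vec,
add32_sliced() and add32_selfcheck() are made up for this example.
A bitslice adder uses only XOR, AND and OR, and AVX without AVX2
provides 256-bit forms of those only as "floating-point" instructions
(vxorps, vandps, vorps), so that is what one would expect to see for
32-byte vectors when building with -mavx.

```c
#include <assert.h>
#include <stdint.h>

/* 32-byte generic vector: 8 lanes x 32 bits = 256 bitslice instances.
 * (Hypothetical type name, chosen for this sketch.) */
typedef uint32_t vec __attribute__((vector_size(32)));

/* Add two bitsliced 32-bit values.  x[i] holds bit i of every instance.
 * This is a ripple-carry adder built from bitwise operations only. */
void add32_sliced(vec out[32], const vec x[32], const vec y[32])
{
    vec carry = {0};
    for (int i = 0; i < 32; i++) {
        vec a = x[i], b = y[i];
        vec t = a ^ b;
        out[i] = t ^ carry;              /* sum bit */
        carry = (a & b) | (t & carry);   /* carry into bit i + 1 */
    }
}

/* Load the same a and b into all 256 instances, add them, and read the
 * result back from one instance.  Returns 1 if it equals a + b. */
int add32_selfcheck(uint32_t a, uint32_t b)
{
    vec x[32], y[32], out[32];
    for (int i = 0; i < 32; i++) {
        uint32_t xa = ((a >> i) & 1) ? ~(uint32_t)0 : 0;
        uint32_t yb = ((b >> i) & 1) ? ~(uint32_t)0 : 0;
        for (int j = 0; j < 8; j++) {
            x[i][j] = xa;
            y[i][j] = yb;
        }
    }
    add32_sliced(out, x, y);
    uint32_t sum = 0;
    for (int i = 0; i < 32; i++)
        sum |= ((out[i][7] >> 31) & 1) << i;  /* lane 7, bit 31 */
    return sum == (uint32_t)(a + b);
}
```

To see which instructions the compiler picked, one can compile this
with something like "gcc -O3 -mavx -S" and look at the loop in
add32_sliced() in the resulting assembly.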

Alexander
