Date: Sat, 14 Mar 2015 22:49:30 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: interleaved bitslice?

On 2015-03-14 20:01, Solar Designer wrote:
> On Thu, Mar 12, 2015 at 08:37:01AM +0100, magnum wrote:
>> On 2015-03-11 21:55, Solar Designer wrote:
>>> solar@...l:~/md5slice$ gcc md5slice.c -o md5slice -Wall -s -O3 -fomit-frame-pointer -funroll-loops -DVECTOR -march=native
>>>
>>> This gave "warning: always_inline function might not be inlinable" about
>>> FF(), I(), H(), F(), add32r(), add32c(), add32() - but then it built
>>> fine. The speed is:
>>
>> Solar,
>>
>> While experimenting with this I noticed using a vector size of 32 but
>> still compiling for AVX gave a slight boost (~5%). I assume this ends up
>> similar to the interleaving we use in Jumbo, and is faster for the same
>> reasons.
>
> I've just tested this with gcc 4.9.2 on Linux, and the generated code is
> "floating-point" 256-bit AVX. So this is not interleaving.
>

And yet it's faster? I would not have guessed that ever but I suppose
it's not that special. I'm probably the guy least using floating point
in the entire world.

magnum