john-dev - Re: interleaved bitslice? (was: bitslice MD*/SHA*, AVX2)

Message-ID: <a069d652bb829602864c871221a76f4b@smtp.hushmail.com>
Date: Thu, 12 Mar 2015 08:37:01 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: interleaved bitslice? (was: bitslice MD*/SHA*, AVX2)

On 2015-03-11 21:55, Solar Designer wrote:
> solar@...l:~/md5slice$ gcc md5slice.c -o md5slice -Wall -s -O3 -fomit-frame-pointer -funroll-loops -DVECTOR -march=native
> 
> This gave "warning: always_inline function might not be inlinable" about
> FF(), I(), H(), F(), add32r(), add32c(), add32() - but then it built
> fine.  The speed is:

Solar,

While experimenting with this, I noticed that using a vector size of 32
while still compiling for AVX gave a slight boost (~5%). I assume this
ends up similar to the interleaving we use in Jumbo, and is faster for
the same reasons.
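
To illustrate what "a vector size of 32" can look like: with GCC's generic vector extension, a type may be declared wider than the target's native integer vectors, and the compiler then synthesizes each operation from narrower ones. The following is only a minimal sketch, assuming the -DVECTOR path is built on the vector_size attribute (not confirmed here). The names vec32, select32, select_scalar and check_select32 are hypothetical and are not taken from md5slice.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type name: a 32-byte GCC/Clang generic vector with eight
 * 32-bit lanes. If the target lacks 256-bit integer operations, the
 * compiler synthesizes each operation from narrower ones. */
typedef uint32_t vec32 __attribute__((vector_size(32)));

/* Bitwise select, (x & y) | (~x & z), written in the xor form.
 * Operands are passed by pointer to avoid vector-ABI warnings when the
 * wide type is not native to the target. */
static void select32(vec32 *r, const vec32 *x, const vec32 *y, const vec32 *z)
{
    *r = *z ^ (*x & (*y ^ *z));
}

/* Scalar reference for the same function. */
static uint32_t select_scalar(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);
}

/* Returns 1 if every lane of the vector result matches the scalar one. */
int check_select32(void)
{
    vec32 x, y, z, r;
    int i;

    assert(sizeof(vec32) == 32);

    /* Arbitrary test patterns, one per lane. */
    for (i = 0; i < 8; i++) {
        x[i] = 0x01234567u * (uint32_t)(i + 1);
        y[i] = 0x89abcdefu ^ (uint32_t)i;
        z[i] = ~x[i] + (uint32_t)i;
    }

    select32(&r, &x, &y, &z);

    for (i = 0; i < 8; i++)
        if (r[i] != select_scalar(x[i], y[i], z[i]))
            return 0;
    return 1;
}
```

Building this with -mavx and again with -mavx2 and comparing the generated assembly should show whether each 32-byte operation is split into two 16-byte ones, which is presumably where the interleaving-like effect would come from.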

When trying that with a vector size of 64, I trigger an ICE.

md5slice.c: In function 'II.constprop':
md5slice.c:331:27: internal compiler error: in emit_move_insn, at expr.c:3609
 static MAYBE_INLINE3 void II(a, b, c, d, x, s, ac)
                           ^

md5slice.c:331:27: internal compiler error: Abort trap: 6
gcc: internal compiler error: Abort trap: 6 (program cc1)

That's with gcc-4.9.2 on OSX and it happens with -mavx2 too. I get a
similar but not identical ICE on well. Maybe this should be reported.

magnum
