Date: Thu, 12 Mar 2015 08:37:01 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: interleaved bitslice? (was: bitslice MD*/SHA*, AVX2)

On 2015-03-11 21:55, Solar Designer wrote:
> solar@...l:~/md5slice$ gcc md5slice.c -o md5slice -Wall -s -O3 -fomit-frame-pointer -funroll-loops -DVECTOR -march=native
> 
> This gave "warning: always_inline function might not be inlinable" about
> FF(), I(), H(), F(), add32r(), add32c(), add32() - but then it built
> fine.  The speed is:

Solar,

While experimenting with this, I noticed that using a vector size of 32
while still compiling for AVX gave a slight boost (~5%). I assume this ends
up similar to the interleaving we use in Jumbo, and is faster for the same
reasons.
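
A minimal sketch of what "vector size of 32" could look like, assuming
md5slice.c declares its vector type with GCC's vector_size attribute (an
assumption; that source is not quoted here). The names vtype, md5_f and
md5_f_matches_scalar are hypothetical. GCC takes the size in bytes, and when
the type is wider than what the target handles natively it synthesizes each
operation from narrower instructions, which would give an effect similar to
manual interleaving:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type name. vector_size is given in bytes, so this type
 * holds eight 32-bit lanes (256 bits) regardless of the -m options. */
typedef uint32_t vtype __attribute__((vector_size(32)));

static_assert(sizeof(vtype) == 32, "vtype should be 32 bytes wide");

/* MD5 round-1 boolean function F, applied to all lanes at once. */
static inline vtype md5_f(vtype x, vtype y, vtype z)
{
    return (x & y) | (~x & z);
}

/* Returns 1 if every lane agrees with the scalar formula, else 0. */
int md5_f_matches_scalar(void)
{
    uint32_t xs[8], ys[8], zs[8], rs[8];
    vtype x, y, z, r;
    int i;

    /* Arbitrary test patterns, different in each lane. */
    for (i = 0; i < 8; i++) {
        xs[i] = 0x01234567u * (uint32_t)(i + 1);
        ys[i] = 0x89abcdefu ^ (uint32_t)i;
        zs[i] = 0xfedcba98u + (uint32_t)i;
    }

    /* memcpy avoids alignment and aliasing concerns when loading. */
    memcpy(&x, xs, sizeof(x));
    memcpy(&y, ys, sizeof(y));
    memcpy(&z, zs, sizeof(z));

    r = md5_f(x, y, z);
    memcpy(rs, &r, sizeof(rs));

    for (i = 0; i < 8; i++) {
        if (rs[i] != ((xs[i] & ys[i]) | (~xs[i] & zs[i])))
            return 0;
    }
    return 1;
}
```

Changing the 32 in the typedef is the only edit needed to try other widths;
the generated instructions then depend on the -m options in use.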

When trying that with a vector size of 64, I trigger an ICE.

md5slice.c: In function 'II.constprop':
md5slice.c:331:27: internal compiler error: in emit_move_insn, at expr.c:3609
 static MAYBE_INLINE3 void II(a, b, c, d, x, s, ac)
                           ^

md5slice.c:331:27: internal compiler error: Abort trap: 6
gcc: internal compiler error: Abort trap: 6 (program cc1)


That's with gcc-4.9.2 on OSX and it happens with -mavx2 too. I get a
similar but not identical ICE on well. Maybe this should be reported.

magnum
