Date: Sat, 14 Mar 2015 07:03:42 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: interleaved bitslice? (was: bitslice MD*/SHA*, AVX2)

On Thu, Mar 12, 2015 at 08:37:01AM +0100, magnum wrote:
> On 2015-03-11 21:55, Solar Designer wrote:
> > solar@...l:~/md5slice$ gcc md5slice.c -o md5slice -Wall -s -O3 -fomit-frame-pointer -funroll-loops -DVECTOR -march=native
> > 
> > This gave "warning: always_inline function might not be inlinable" about
> > FF(), I(), H(), F(), add32r(), add32c(), add32() - but then it built
> > fine.  The speed is:
> 
> Solar,
> 
> While experimenting with this I noticed using a vector size of 32 but
> still compiling for AVX gave a slight boost (~5%). I assume this ends up
> similar to the interleaving we use in Jumbo, and is faster for the same
> reasons.

It might be, yes.  However, when I tried that with "gcc version 4.6.3
(Ubuntu/Linaro 4.6.3-1ubuntu5)", it produced scalar code (~10x slower).
I guess this optimization occurs only with newer gcc, perhaps only with
the same versions of gcc that are AVX2-capable.
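
For reference, here is a minimal sketch of what "a vector size of 32" means
in terms of gcc's generic vector extension.  It is not taken from
md5slice.c: the names VEC_BYTES, vec, md5_f_vec() and md5_f_check() are
made up for illustration, and only MD5's F() is shown.  How gcc lowers the
32-byte type (one 256-bit operation, several narrower ones, or scalar code)
depends on the gcc version and the -m options, so treat this as something
to experiment with rather than as a statement of what the compiler does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical names for illustration; not taken from md5slice.c. */
#define VEC_BYTES 32	/* try 16, 32 or 64 and compare the generated code */
#define VEC_LANES (VEC_BYTES / sizeof(uint32_t))

/* gcc generic vector type: VEC_BYTES bytes worth of 32-bit elements */
typedef uint32_t vec __attribute__((vector_size(VEC_BYTES)));

/* MD5 F(x, y, z) = (x & y) | (~x & z), in the usual shorter form.
 * Operands are passed by pointer to avoid ABI warnings when the target
 * has no native registers of this width. */
static inline void md5_f_vec(vec *r, const vec *x, const vec *y, const vec *z)
{
	*r = *z ^ (*x & (*y ^ *z));
}

/* Scalar reference for the same function */
static inline uint32_t md5_f_scalar(uint32_t x, uint32_t y, uint32_t z)
{
	return (x & y) | (~x & z);
}

/* Returns 1 if the vector result matches the scalar one in every element */
static int md5_f_check(uint32_t seed)
{
	uint32_t xs[VEC_LANES], ys[VEC_LANES], zs[VEC_LANES], out[VEC_LANES];
	vec x, y, z, r;
	size_t i;

	/* Fill the inputs with arbitrary seed-dependent bit patterns */
	for (i = 0; i < VEC_LANES; i++) {
		xs[i] = seed * 2654435761u + (uint32_t)i;
		ys[i] = ~xs[i] ^ (seed << (i & 7));
		zs[i] = xs[i] * 0x9e3779b9u + seed;
	}

	memcpy(&x, xs, sizeof(x));
	memcpy(&y, ys, sizeof(y));
	memcpy(&z, zs, sizeof(z));

	md5_f_vec(&r, &x, &y, &z);
	memcpy(out, &r, sizeof(out));

	for (i = 0; i < VEC_LANES; i++)
		if (out[i] != md5_f_scalar(xs[i], ys[i], zs[i]))
			return 0;
	return 1;
}
```

Calling md5_f_check() with a few seeds from a small main() and building
with different -m options (and looking at the output of gcc -S) should
show whether a given gcc vectorizes the wider type or falls back to
scalar code.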

> When trying that with a vector size of 64, I trigger an ICE.
> 
> md5slice.c: In function 'II.constprop':
> md5slice.c:331:27: internal compiler error: in emit_move_insn, at
> expr.c:3609
>  static MAYBE_INLINE3 void II(a, b, c, d, x, s, ac)
>                            ^
> 
> md5slice.c:331:27: internal compiler error: Abort trap: 6
> gcc: internal compiler error: Abort trap: 6 (program cc1)
> 
> 
> That's with gcc-4.9.2 on OSX and it happens with -mavx2 too. I get a
> similar but not identical ICE on well. Maybe this should be reported.

Yes, it'd be good to report this.  Will you, or should I?

Alexander
