Date: Sun, 15 Mar 2015 04:09:32 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: AVX2

magnum, all -

On our i7-4770K, before:

solar@...l:~/j/john-1.8.0.4.orig/src$ ../run/john -te -form=descrypt
Will run 8 OpenMP threads
Benchmarking: descrypt, traditional crypt(3) [DES 128/128 AVX-16]... DONE
Many salts:     25087K c/s real, 3139K c/s virtual
Only one salt:  20375K c/s real, 2550K c/s virtual

After:

solar@...l:~/j/john-1.8.0.4-avx2/src$ ../run/john -te -form=descrypt
Will run 8 OpenMP threads
Benchmarking: descrypt, traditional crypt(3) [DES 256/256 AVX2-16]... DONE
Many salts:     43489K c/s real, 5443K c/s virtual
Only one salt:  30002K c/s real, 3755K c/s virtual

For comparison, 256-bit AVX (rather than AVX2) resulted in a slowdown:

solar@...l:~/j/john-1.8.0.4-hack/src$ ../run/john -te -form=descrypt
Will run 8 OpenMP threads
Benchmarking: descrypt, traditional crypt(3) [DES 256/256 AVX-16]... DONE
Many salts:     21518K c/s real, 2699K c/s virtual
Only one salt:  17602K c/s real, 2203K c/s virtual

This previously discouraged me from trying AVX2 for bitslice DES - after
all, the instructions are similar, and at 128 bits there's no longer any
performance penalty for using the "floating-point" loads/stores and
bitwise operations (in fact, their opcodes are one byte shorter, so gcc's
-Os uses them in lieu of the SSE2+ integer ones).  Well, I should have
tried AVX2 sooner.
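
To illustrate what is being widened here, below is a minimal sketch of one
bitslice gate on a 256-bit vector.  It is not the actual John the Ripper
code: the bs_vec type and bs_mux() are hypothetical names, and it uses the
gcc/clang vector extension rather than intrinsics so that the same source
builds for either instruction set.  Plain AVX only has 256-bit bitwise
operations in their "floating-point" forms, whereas AVX2 adds the integer
ones; which of these the compiler actually emits depends on the target
flags (e.g. -mavx vs. -mavx2) and is best checked in the generated
assembly.

```c
#include <stdint.h>

/*
 * Hypothetical 256-bit bitslice vector: every one of the 256 bit
 * positions belongs to a separate, independent computation.
 */
typedef uint64_t bs_vec __attribute__((vector_size(32)));

/*
 * Bitwise multiplexer, a typical building block of bitslice S-boxes:
 * for each bit position, take the bit from b where s is 1 and the
 * bit from a where s is 0.  Operands are passed by pointer so that
 * the vector type never crosses a function-call ABI boundary.
 */
static void bs_mux(bs_vec *r, const bs_vec *a, const bs_vec *b,
    const bs_vec *s)
{
	*r = (*a & ~*s) | (*b & *s);
}
```

With 256-bit vectors, one such call evaluates the gate for 256 candidates
at once, versus 128 with SSE2 or 128-bit AVX.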

Now I need to implement this cleanly and commit it into the core tree.

Oh, and we'll need AVX2 detection in CPU_detect().
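
As a rough idea of what that check involves, here is a minimal C sketch.
It is not the actual CPU_detect() and not necessarily how it will end up
in the tree: avx2_usable() and detect_avx2() are hypothetical names, and
the code assumes gcc or clang with their <cpuid.h> header.  The logic is
the usual one: the CPU must report OSXSAVE and AVX in CPUID leaf 1, the OS
must have enabled SSE and YMM state in XCR0, and CPUID leaf 7 (subleaf 0)
must report AVX2 in EBX bit 5.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>	/* gcc/clang: __get_cpuid_max, __cpuid, __cpuid_count */
#endif

#define CPUID1_ECX_OSXSAVE	(1U << 27)	/* OS uses XSAVE/XGETBV */
#define CPUID1_ECX_AVX		(1U << 28)	/* CPU supports AVX */
#define CPUID7_EBX_AVX2		(1U << 5)	/* CPU supports AVX2 */
#define XCR0_SSE_AND_YMM	0x6U		/* XMM and YMM state enabled */

/* Pure decision logic (hypothetical helper), kept separate for testing */
static int avx2_usable(uint32_t leaf1_ecx, uint32_t leaf7_ebx, uint32_t xcr0)
{
	if (!(leaf1_ecx & CPUID1_ECX_OSXSAVE))
		return 0;
	if (!(leaf1_ecx & CPUID1_ECX_AVX))
		return 0;
	if ((xcr0 & XCR0_SSE_AND_YMM) != XCR0_SSE_AND_YMM)
		return 0;
	return (leaf7_ebx & CPUID7_EBX_AVX2) != 0;
}

/* Returns 1 if AVX2 code can be run on this CPU and OS, 0 otherwise */
static int detect_avx2(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;
	uint32_t leaf1_ecx, xcr0 = 0;

	/* Leaf 7 has to exist before we may query it */
	if (__get_cpuid_max(0, 0) < 7)
		return 0;

	__cpuid(1, eax, ebx, ecx, edx);
	leaf1_ecx = ecx;

	/* XGETBV is only valid when OSXSAVE is set */
	if (leaf1_ecx & CPUID1_ECX_OSXSAVE) {
		unsigned int lo, hi;
		/* xgetbv with ECX = 0 reads XCR0; raw bytes for old assemblers */
		__asm__ volatile(".byte 0x0f, 0x01, 0xd0"
		    : "=a" (lo), "=d" (hi) : "c" (0));
		(void)hi;
		xcr0 = lo;
	}

	__cpuid_count(7, 0, eax, ebx, ecx, edx);
	return avx2_usable(leaf1_ecx, ebx, xcr0);
#else
	return 0;	/* not an x86 build */
#endif
}
```

A caller would simply test detect_avx2() at startup and refuse to run (or
fall back to another build) when it returns 0.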

Alexander
