john-dev - Re: bitslice SHA-256


Message-ID: <20150604003308.GA27249@openwall.com>
Date: Thu, 4 Jun 2015 03:33:09 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice SHA-256

On Wed, Jun 03, 2015 at 06:57:33PM -0400, Alain Espinosa wrote:
> I recall one thing. Bitslice SHA256 use only bitwise instructions, so it provides a big performance improvement on CPUs with only floating point bitwise operations, like AVX for example.

That was my thought from a few years ago, but I am not aware of CPUs
where those "floating point" bitwise operations are faster per bit than
their narrower "integer" equivalents.  On CPUs with AVX (as tested
with our bitslice DES code on Sandy Bridge, Haswell, and Bulldozer),
256-bit AVX is half as fast per instruction and roughly the same speed
per bit as 128-bit AVX.  On Pentium 3, SSE is roughly 3 times slower per
instruction and 1.5 times slower per bit than MMX.

I did not specifically test this on Ivy Bridge and Piledriver, but I
expect them to behave the same as Sandy Bridge and Bulldozer in this
respect.  Someone may test to make sure.

As to Haswell, we indeed avoid the issue by using AVX2 there.  With
AVX2, it is roughly the same speed per instruction and twice as fast
per bit as 128-bit AVX, and roughly twice as fast both per instruction
and per bit as 256-bit AVX.
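
The per-bit figures above follow from the per-instruction figures once
the vector width is factored in.  A minimal sketch of that arithmetic,
treating the "roughly" ratios from this message as exact for
illustration (the function and variable names are hypothetical, not
taken from any John the Ripper code):

```python
from fractions import Fraction


def per_bit_speed(width_bits, per_insn_speed, base_width_bits):
    """Per-bit throughput relative to a baseline instruction set.

    per_insn_speed is the instruction rate relative to the baseline
    (1 means the same rate, 1/2 means half as fast per instruction).
    """
    return Fraction(width_bits, base_width_bits) * per_insn_speed


# Baseline: 128-bit AVX bitwise operations.
avx256 = per_bit_speed(256, Fraction(1, 2), 128)  # half the instruction rate
avx2 = per_bit_speed(256, Fraction(1), 128)       # same instruction rate

# Baseline: 64-bit MMX on Pentium 3; SSE is 128-bit at 1/3 the rate.
sse_p3 = per_bit_speed(128, Fraction(1, 3), 64)

print("256-bit AVX vs 128-bit AVX, per bit:", avx256)
print("AVX2 vs 128-bit AVX, per bit:", avx2)
print("AVX2 vs 256-bit AVX, per bit:", avx2 / avx256)
print("SSE vs MMX on Pentium 3, per bit:", sse_p3)
```

Doubling the width only helps when the instruction rate does not drop
by the same factor, which is why 256-bit AVX gains nothing per bit
here while AVX2 does.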

Alexander
