john-dev - bitslice SHA-256

Follow @Openwall on Twitter for new release announcements and other news

[<prev] [next>] [thread-next>] [day] [month] [year] [list]

Date: Fri, 29 May 2015 06:35:29 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: bitslice SHA-256

Hi,

Alain posted this to john-users.  I am moving it to john-dev as it's
more appropriate in here.  I've also uncompressed the file and converted
it to Unix linefeeds.

Alain got some pretty good speeds here, however the bitslice
implementation is slower than his straightforward one (which is probably
faster than what we have in JtR?)  In a comment in this source file,
Alain wrote:

"So we can expect that bitslice SHA256 will be (79-62)/62 = 27% slower
than normal SHA256"

This is based on instruction count.  And in a private e-mail to me Alain
reported actual speeds, where the difference is much bigger.  I guess it
may be bigger because we're exceeding L1 code cache size.  I recently
suggested how to deal with that: keep the instruction stream size per
cycle at no more than 16 bytes, so that it gets fetched from L2 cache
fast enough to execute at full speed.  This may be 3 5-byte AVX2
instructions, each one with a different 1-byte offset against one of 8
general-purpose registers, thereby giving us a window of 64 "virtual
registers" that we can shift by occasional ADDs/SUBs to the GPRs.  But
this won't remove the 27% slowdown estimated from instruction counts.
Unless we find a way to reduce the instruction count, bitslicing SHA-256
on this architecture is not worthwhile.

Thank you for the contribution, Alain!

Alexander

View attachment "bs_sha256.c" of type "text/x-c" (32743 bytes)

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.