Date: Fri, 29 May 2015 06:35:29 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: bitslice SHA-256

Hi,

Alain posted this to john-users.  I am moving it to john-dev as it's
more appropriate in here.  I've also uncompressed the file and converted
it to Unix linefeeds.

Alain got some pretty good speeds here, however the bitslice
implementation is slower than his straightforward one (which is probably
faster than what we have in JtR?)

In a comment in this source file, Alain wrote:

"So we can expect that bitslice SHA256 will be (79-62)/62 = 27% slower
than normal SHA256"

This is based on instruction count.  And in a private e-mail to me Alain
reported actual speeds, where the difference is much bigger.  I guess it
may be bigger because we're exceeding L1 code cache size.  I recently
suggested how to deal with that: keep the instruction stream size per
cycle at no more than 16 bytes, so that it gets fetched from L2 cache
fast enough to execute at full speed.  This may be 3 5-byte AVX2
instructions, each one with a different 1-byte offset against one of 8
general-purpose registers, thereby giving us a window of 64 "virtual
registers" that we can shift by occasional ADDs/SUBs to the GPRs.

But this won't remove the 27% slowdown estimated from instruction
counts.  Unless we find a way to reduce the instruction count,
bitslicing SHA-256 on this architecture is not worthwhile.

Thank you for the contribution, Alain!

Alexander

View attachment "bs_sha256.c" of type "text/x-c" (32743 bytes)
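
For readers who want to check the figures quoted in the message, here is a minimal C sketch of the arithmetic. It is not taken from the attached bs_sha256.c. The helper names are hypothetical, and the reading of the "64 virtual registers" figure is an assumption: a signed 1-byte displacement spans 256 bytes, which holds 8 aligned 32-byte AVX2 vectors per base register, and 8 base registers then give 64 slots. The 5-byte instruction size is taken from the message as given.

```c
#include <assert.h>

/* Slowdown predicted from instruction counts, in percent,
 * rounded to the nearest integer. */
static int slowdown_percent(int bitslice_ops, int normal_ops)
{
    assert(normal_ops > 0 && bitslice_ops >= normal_ops);
    return (100 * (bitslice_ops - normal_ops) + normal_ops / 2) / normal_ops;
}

/* Whole instructions of a given encoded size that fit into a
 * per-cycle fetch budget (16 bytes in the message). */
static int insns_per_cycle(int fetch_bytes, int insn_bytes)
{
    assert(fetch_bytes >= 0 && insn_bytes > 0);
    return fetch_bytes / insn_bytes;
}

/* Aligned vector-sized slots reachable from one base register with a
 * signed 8-bit displacement (-128..127).  Assumption: this is how the
 * per-register share of the 64-entry window arises. */
static int slots_per_base(int vec_bytes)
{
    assert(vec_bytes > 0 && 128 % vec_bytes == 0);
    return 256 / vec_bytes;
}

/* Total window of "virtual registers" across all base registers. */
static int window_size(int base_regs, int vec_bytes)
{
    return base_regs * slots_per_base(vec_bytes);
}
```

With the numbers from the message, slowdown_percent(79, 62) evaluates to 27, insns_per_cycle(16, 5) to 3, and window_size(8, 32) to 64, which agrees with the 27%, "3 5-byte AVX2 instructions", and 64 "virtual registers" figures above.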