Date: Wed, 19 Aug 2015 18:57:58 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Formats using non-SIMD SHA2 implementations

On 2015-08-19 17:52, Lei Zhang wrote:
> On Aug 18, 2015, at 4:39 PM, magnum <john.magnum@...hmail.com> wrote:
>>
>> BTW, both 7z and rar3 have a property that guarantees the entire message length for all rounds is evenly divisible by 64. The current RAR3 kernel takes that opportunity to shuffle data much faster (trading global memory, but it's still a lot faster) like this:
>>
>> 1. Prepare a buffer not of 2*64 but of the smallest size that ends exactly at (len % 64 == 0).
>> 2. When copying a limb, do it 16 x 32 bits at a time from that aligned buffer (btw, this will benefit SIMD scattering even more than OpenCL!).
>> 3. After each step 2, just update the "rounds" bytes within the buffer prepared in step 1.
>>
>> The above was the final idea that made our RAR3-opencl as fast as cRARk, which was my benchmark goal at the time.
>
> I don't really understand what you said here...

I did it again now for 7z-opencl, see 5d8a78c. The boost compared to Dhiru's naive/RFC implementation was 65%, and we're now 10-15% faster than oclHashcat.

> Anyway, I applied the optimizations we discussed so far to 7z, including sorting passwords by length (Jim's idea) and using a double buffer (magnum's idea). I compared the performance of the optimized SIMD code, my previous naive SIMD code and the scalar code. See the figures below. (Specs: AVX2, 8-HT)
>
> OpenMP disabled:
>
> [scalar]
> Raw: 12.7 c/s real, 12.7 c/s virtual
> [naive SIMD]
> Raw: 48.2 c/s real, 48.2 c/s virtual
> [optimized SIMD]
> Raw: 48.6 c/s real, 48.6 c/s virtual

Good stuff. I bet the real-life boost from the last optimization is a lot bigger. The test vectors (all of length 8, right?) just don't show it.

magnum
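A minimal C sketch of the three quoted steps, for readers who (like Lei) found the description hard to follow. It assumes each round hashes a "unit" made of the password bytes (plus salt, if any) followed by a little-endian round counter (8 bytes for 7z, 3 for RAR3), and that the total round count is a multiple of 64. If a unit is L bytes, the smallest number of back-to-back copies whose length is a multiple of 64 is k = 64 / gcd(L, 64) (step 1). Each pass then feeds that buffer as whole 64-byte blocks of 16 32-bit words (step 2), and between passes only the k counters are rewritten (step 3). All names here (`aligned_buf`, `run_rounds`, `record_block`, ...) are hypothetical and not taken from commit 5d8a78c or the RAR3 kernel; `record_block` merely stands in for a real SHA-1/SHA-256 compression function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-round unit: prefix (UTF-16LE password, plus salt if any)
   followed by a ctr_len-byte little-endian round counter. */
typedef struct {
    uint8_t *buf;       /* k back-to-back copies of the unit */
    size_t prefix_len, ctr_len, unit_len;
    size_t copies;      /* k = 64 / gcd(unit_len, 64) */
    size_t buf_len;     /* k * unit_len, always a multiple of 64 */
} aligned_buf;

static size_t gcd(size_t a, size_t b) {
    while (b) { size_t t = a % b; a = b; b = t; }
    return a;
}

/* Store v as n little-endian bytes at p. */
static void put_le(uint8_t *p, uint64_t v, size_t n) {
    for (size_t i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Step 3: set the k embedded counters to first, first+1, ..., first+k-1.
   These are the only bytes that change between passes. */
static void set_counters(aligned_buf *ab, uint64_t first) {
    for (size_t c = 0; c < ab->copies; c++)
        put_le(ab->buf + c * ab->unit_len + ab->prefix_len, first + c, ab->ctr_len);
}

/* Step 1: smallest run of whole units whose total length % 64 == 0.
   Counters are filled in by run_rounds() before each pass. */
static int aligned_init(aligned_buf *ab, const uint8_t *prefix, size_t prefix_len,
                        size_t ctr_len) {
    ab->prefix_len = prefix_len;
    ab->ctr_len = ctr_len;
    ab->unit_len = prefix_len + ctr_len;
    ab->copies = 64 / gcd(ab->unit_len, 64);
    ab->buf_len = ab->copies * ab->unit_len;
    ab->buf = malloc(ab->buf_len);
    if (!ab->buf)
        return -1;
    for (size_t c = 0; c < ab->copies; c++)
        memcpy(ab->buf + c * ab->unit_len, prefix, prefix_len);
    return 0;
}

typedef void (*compress_fn)(void *ctx, const uint32_t w[16]);

/* Step 2: hand the buffer to the compression function one full 64-byte block
   at a time, loaded as 16 big-endian 32-bit words (SHA-1/SHA-256 word order).
   No per-byte shuffling or partial blocks are ever needed. */
static void run_rounds(aligned_buf *ab, uint64_t rounds, compress_fn compress, void *ctx) {
    assert(rounds % ab->copies == 0);  /* holds when rounds is a multiple of 64 */
    for (uint64_t r = 0; r < rounds; r += ab->copies) {
        set_counters(ab, r);
        for (size_t off = 0; off < ab->buf_len; off += 64) {
            uint32_t w[16];
            for (int i = 0; i < 16; i++) {
                const uint8_t *p = ab->buf + off + 4 * i;
                w[i] = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                       (uint32_t)p[2] << 8 | (uint32_t)p[3];
            }
            compress(ctx, w);
        }
    }
}

/* Stand-in for a real SHA compression function: it just appends each block's
   bytes to a sink so the produced message stream can be inspected. */
typedef struct { uint8_t *data; size_t len; } byte_sink;

static void record_block(void *ctx, const uint32_t w[16]) {
    byte_sink *s = ctx;
    for (int i = 0; i < 16; i++)
        for (int b = 0; b < 4; b++)
            s->data[s->len++] = (uint8_t)(w[i] >> (24 - 8 * b));
}
```

For example, with the UTF-16LE password "abc" (6 bytes) and an 8-byte 7z-style counter, each unit is 14 bytes, so k = 32 and the buffer is 448 bytes, exactly seven 64-byte blocks; after each pass only the 32 counters are rewritten. A RAR3-style unit would instead put the 8-byte salt after the password in the prefix and use `ctr_len = 3`. The final SHA padding block and any format-specific steps are left out of this sketch.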