Date: Wed, 19 Aug 2015 23:52:44 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Formats using non-SIMD SHA2 implementations

On Aug 18, 2015, at 4:39 PM, magnum <john.magnum@...hmail.com> wrote:
>
> BTW both 7z and rar3 have a property that guarantees the total message
> length across all rounds is evenly divisible by 64. The current RAR3
> kernel takes that opportunity to shuffle data much faster (trading
> global memory, but it's still a lot faster) like this:
>
> 1. Prepare a buffer not of 2*64 bytes but of the smallest size that
>    ends exactly at (len % 64 == 0).
> 2. When copying a limb, do it 16 x 32 bits at a time (BTW this will
>    benefit SIMD scattering even more than OpenCL!) from that aligned
>    buffer.
> 3. After each step 2, just update the "rounds" bytes within the buffer
>    prepared in step 1.
>
> The above was the final idea that made our RAR3-opencl as fast as
> cRARk, which was my benchmark goal at the time.

I don't really understand what you said here...

Anyway, I applied the optimizations we discussed so far to 7z, including
sorting passwords by length (Jim's idea) and using a double buffer
(magnum's idea). I compared the performance of the optimized SIMD code,
my previous naive SIMD code, and the scalar code; see the figures below.
(Specs: AVX2, 8 threads with HT)

OpenMP disabled:

[scalar]          Raw: 12.7 c/s real, 12.7 c/s virtual
[naive SIMD]      Raw: 48.2 c/s real, 48.2 c/s virtual
[optimized SIMD]  Raw: 48.6 c/s real, 48.6 c/s virtual

OpenMP enabled:

[scalar]          Raw: 49.1 c/s real, 6.4 c/s virtual
[naive SIMD]      Raw: 72.3 c/s real, 9.3 c/s virtual
[optimized SIMD]  Raw: 201 c/s real, 26.7 c/s virtual

As you can see, the optimized SIMD code shows no single-threaded benefit,
but its multi-threaded performance is much better. The code is much less
readable than before, but I think it's worth it :)

Lei