Date: Sun, 8 Jul 2012 12:34:51 +0530
From: Sayantan Datta <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Rotate and bitselect investigation

On Sun, Jul 8, 2012 at 11:58 AM, Solar Designer <solar@...nwall.com> wrote:

> Sayantan,
>
> On Sun, Jul 08, 2012 at 10:51:54AM +0530, Sayantan Datta wrote:
> > I have investigated the rotate and bitselect issue on the 7970.
>
> Thank you!
>
> > Both types of rotate (manual and the built-in OpenCL function) use the
> > bitalign instruction. I investigated rotate(x, (uint)30) and
> > ((x << 30) | (x >> 2)). The values loaded into the bitalign instructions
> > are exactly the same; only the registers differ. So you won't see any
> > performance increase in this case.
>
> Sounds good.
>
> > However, with bitselect the situation is different. The built-in function
> > uses an alien bfi instruction
>
> This is precisely what we expected. :-)
>
> > which I couldn't find anywhere in the docs. The manual version uses
> > ixor and iand.
>
> So, any explanation why there's no measurable speedup (at least in my
> tests) from using bitselect() in SHA-1's F in MSCash2? Is there some
> kind of stall, so that the reduction in instruction count doesn't help?
> Or is there somehow no such reduction (e.g., an extra move is added)?
>
> Alexander

Hi Alexander,

There was a small typo in the optimized kernel. Apparently you didn't switch
from the manual version to bitselect() in the SHA1_digest() function, which is
the one called inside the 10K-iteration loop. Here are the new results with
1/4 of the KPC you used in the previous benchmark.

std2048@...l:~/bin/run$ ./john -te -fo=mscash2-opencl
OpenCL platform 0: NVIDIA CUDA, 1 device(s).
Using device 0: GeForce GTX 570
Compilation log:
ptxas info : Compiling entry function 'PBKDF2' for 'sm_20'
ptxas info : Function properties for PBKDF2
64 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 59 registers, 160+0 bytes smem, 52 bytes cmem, 4 bytes cmem
Optimal Work Group Size:128
Kernel Execution Speed (Higher is better):0.484821
Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw: 30481 c/s real, 30481 c/s virtual

std2048@...l:~/bin/run$ ./john -te -fo=mscash2-opencl -pla=1
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Optimal Work Group Size:256
Kernel Execution Speed (Higher is better):1.549856
Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw: 99801 c/s real, 99296 c/s virtual

Regards,
Sayantan
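The two equivalences discussed in this thread, rotate() versus the shift/or form and bitselect() versus an xor/and form of SHA-1's F, can be sanity-checked on the host. What follows is a minimal sketch in Python, not OpenCL, assuming the OpenCL spec semantics: rotate() on a uint rotates left, and bitselect(a, b, c) takes each result bit from b where c is 1 and from a where c is 0. The helper names (rotl32, manual_rot30, bitselect32, sha1_f_manual, sha1_f_bitselect, count_mismatches) are hypothetical, and the xor/and form d ^ (b & (c ^ d)) is an assumption about what the "manual version" looks like, not a copy of the john kernel.

```python
import random

MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned (OpenCL uint) wraparound


def rotl32(x: int, n: int) -> int:
    """Left rotate of a 32-bit word, modeling OpenCL rotate() on uint."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK32


def manual_rot30(x: int) -> int:
    """The hand-written form from the thread: (x << 30) | (x >> 2)."""
    return ((x << 30) | (x >> 2)) & MASK32


def bitselect32(a: int, b: int, c: int) -> int:
    """Model of OpenCL bitselect(a, b, c): b's bit where c is 1, else a's bit."""
    return ((a & ~c) | (b & c)) & MASK32


def sha1_f_reference(b: int, c: int, d: int) -> int:
    """Textbook SHA-1 F (rounds 0-19): (b AND c) OR ((NOT b) AND d)."""
    return ((b & c) | (~b & d)) & MASK32


def sha1_f_manual(b: int, c: int, d: int) -> int:
    """Assumed xor/and-only variant of F (hypothetical 'manual version')."""
    return (d ^ (b & (c ^ d))) & MASK32


def sha1_f_bitselect(b: int, c: int, d: int) -> int:
    """F via bitselect: choose c where b is 1, d where b is 0."""
    return bitselect32(d, c, b)


def count_mismatches(trials: int = 10000, seed: int = 1) -> int:
    """Compare all forms on random 32-bit inputs; return the number of failures."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(trials):
        x, b, c, d = (rng.getrandbits(32) for _ in range(4))
        if rotl32(x, 30) != manual_rot30(x):
            bad += 1
        ref = sha1_f_reference(b, c, d)
        if sha1_f_manual(b, c, d) != ref or sha1_f_bitselect(b, c, d) != ref:
            bad += 1
    return bad


if __name__ == "__main__":
    print("mismatches:", count_mismatches())
```

Running it prints `mismatches: 0`. This only shows that each pair computes the same function; it says nothing about code generation, so any speed difference has to come from the instructions the compiler actually emits (bitalign, bfi, or xor/and), which is what the ISA dumps in this thread are about.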