Date: Sat, 26 Jan 2013 23:46:23 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Proposed optimizations to pwsafe

Thanks Milen.

Brian, to use this selectively in our OpenCL formats, you should do this in the kernel:

#include "opencl_device_info.h"

#if gpu_amd(DEVICE_INFO)
#define USE_BITSELECT
#endif
...
#ifdef USE_BITSELECT
#define Ch(x,y,z) (bitselect(z,y,x))
...
#else
#define Ch(x, y, z) (z ^ (x & (y ^ z)))
...
#endif

Just so you don't spend time inventing wheels. We probably need to check all our SHA-2 OpenCL kernels for this. I thought we already did, but I have just copy'n'pasted code (as always :) so my formats probably lack it if the "templates" do.

magnum

On 26 Jan, 2013, at 23:35 , Milen Rangelov <gat3way@...il.com> wrote:

> Just a side note, I just had a look at your OpenCL pwsafe code and there are obvious optimizations that can be done. Some are minor, but the most important is the following. You have this:
>
> #define Ch(x, y, z) (z ^ (x & (y ^ z)))
> #define Maj(x, y, z) ((y & z) | (x & (y | z)))
>
> If you replace those by:
>
> #define Ch(x,y,z) (bitselect(z,y,x))
> #define Maj(x,y,z) (bitselect(y, x,(z^y)))
>
> You are effectively using just 1 ALU operation per Ch as compared to 3, and 2 ALU ops per Maj as compared to 4.
>
> You've got 64 steps per SHA256 block operation, so you save 256 ALU ops per SHA256 block. bitselect is mapped to the hardware instruction BFI_INT. This is applicable to AMD hardware only, not NVIDIA.
>
> Hope that helps :)
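
For anyone wanting to convince themselves that the bitselect forms really are drop-in replacements, here is a minimal host-side sketch in plain C. It assumes only the documented semantics of OpenCL's bitselect(a, b, c) (each result bit comes from b where the matching bit of c is set, otherwise from a). The names bitselect32, ch_generic, maj_generic, ch_bitselect, maj_bitselect and count_mismatches are made up for this illustration and are not part of the JtR tree or of OpenCL.

```c
#include <stdint.h>

/* Portable stand-in for OpenCL's bitselect(a, b, c): each result bit is
 * taken from b where the matching bit of c is 1, and from a where it is 0.
 * (Hypothetical helper name; on the device you would call bitselect itself.) */
static uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Generic forms, as quoted from the pwsafe kernel. */
static uint32_t ch_generic(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

static uint32_t maj_generic(uint32_t x, uint32_t y, uint32_t z)
{
    return (y & z) | (x & (y | z));
}

/* bitselect forms, as proposed in the quoted message. */
static uint32_t ch_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(z, y, x);
}

static uint32_t maj_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(y, x, z ^ y);
}

/* Count disagreements between the two forms. Every operation involved is
 * purely bitwise, so feeding all-zeros and all-ones words through the eight
 * possible (x, y, z) combinations covers every per-bit case. */
static int count_mismatches(void)
{
    const uint32_t v[2] = { 0x00000000u, 0xFFFFFFFFu };
    int bad = 0;

    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 2; k++) {
                if (ch_generic(v[i], v[j], v[k]) != ch_bitselect(v[i], v[j], v[k]))
                    bad++;
                if (maj_generic(v[i], v[j], v[k]) != maj_bitselect(v[i], v[j], v[k]))
                    bad++;
            }
    return bad;
}
```

Calling count_mismatches() from a small main() should return 0 if the two sets of macros agree. This only checks that the results match; whether the compiler actually emits a single instruction for bitselect on a given device is a separate question that this sketch does not address.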