Date: Tue, 13 Aug 2013 11:48:46 +0530
From: Sayantan Datta <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: key-length for mask mode.

Hi,

On Tue, Aug 13, 2013 at 10:59 AM, Michael Samuel <mik@...net.net> wrote:
> I was doing some of my own playing with this kernel, and was thinking it
> would be better to pass a complete block (already padded), with these
> optimisations:
>
> - Only pass 15 integers, X is always zero for single blocks, so pass
> it as a constant
> - Align X column-wise, eg X[(gsize*n)+gid]. This will make the kernel
> work on AVX nicely, and also helped heaps on my old nVidia GTX-9800. (I
> don't have a modern GPU that does OpenCL at the moment)
>
> Hard-coding other parts of X works too (which is what I assume you're
> doing) - for otp-md5 I created a kernel that passed X0 and X1 via global
> mem, then hardcoded the rest of the array, but that's because it was always
> 8 bytes of input.
>
> I haven't delved into the JtR source yet to see how easy this is to do, I
> was mostly just interested because I wanted to hear how well the Phi worked
> :)

You seem to be using global memory for X. Do you copy the values into
registers first, or read them from global memory directly? If you read
global memory directly, I'm not sure we should do that on GPUs, because
random-access fetches from global memory have very high latency there.

I have hard-coded X[4-15] in my test kernel. There wasn't much of a speed
bump in my case, but it did reduce register pressure. Most likely I'm not
going down this path, though, because we need to support generic raw-md5
hashes.

Regards,
Sayantan
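A minimal C sketch of the two layout ideas discussed above, assuming a standard single-block MD5 message (a key of at most 55 bytes, so the key, the 0x80 padding byte and the 8-byte bit length fit into one 64-byte block). X[14] holds the message length in bits and X[15] its high word, which is always zero for a single block; that is why only 15 words need to be passed. The helper names `md5_pad_single_block`, `store_column_wise` and `load_word` are hypothetical and are not taken from the JtR source. In a real OpenCL kernel, `gid` and `gsize` would presumably come from `get_global_id(0)` and `get_global_size(0)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack a key of at most 55 bytes into one padded
 * MD5 block of sixteen little-endian 32-bit words X[0..15]. */
static void md5_pad_single_block(const uint8_t *key, size_t len, uint32_t X[16])
{
    uint8_t block[64];

    assert(len <= 55);             /* key + 0x80 + 8-byte length must fit */
    memset(block, 0, sizeof(block));
    memcpy(block, key, len);
    block[len] = 0x80;             /* MD5 padding marker */

    for (int i = 0; i < 16; i++)   /* bytes -> little-endian words */
        X[i] = (uint32_t)block[4 * i]
             | (uint32_t)block[4 * i + 1] << 8
             | (uint32_t)block[4 * i + 2] << 16
             | (uint32_t)block[4 * i + 3] << 24;

    X[14] = (uint32_t)(len * 8);   /* message length in bits (low word) */
    X[15] = 0;                     /* high word: always 0 for one block */
}

/* Hypothetical helper: store words 0..14 of work-item gid column-wise,
 * so word n of item gid lives at (gsize * n) + gid. Word 15 is not
 * stored at all because the kernel can treat it as the constant 0. */
static void store_column_wise(uint32_t *buf, size_t gsize, size_t gid,
                              const uint32_t X[16])
{
    assert(gid < gsize);
    for (size_t n = 0; n < 15; n++)
        buf[gsize * n + gid] = X[n];
}

/* Hypothetical helper: the matching read for word n of item gid,
 * substituting the constant 0 for the word that was never stored. */
static uint32_t load_word(const uint32_t *buf, size_t gsize, size_t gid,
                          size_t n)
{
    return n == 15 ? 0 : buf[gsize * n + gid];
}
```

For an 8-byte input such as `"abcdefgh"` (the otp-md5 case), only X[0] and X[1] depend on the key; X[2] is 0x80, X[14] is 64 and every other word is zero, which is what makes hard-coding the rest of the array possible. The column-wise layout means that neighbouring work-items reading the same word n touch neighbouring addresses, which is generally what GPUs need for coalesced global-memory reads. In a kernel, each work-item would typically copy its 15 words into private variables once before the MD5 rounds, rather than re-reading global memory on every use.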