Date: Tue, 13 Aug 2013 11:48:46 +0530
From: Sayantan Datta <>
Subject: Re: key-length for mask mode.


On Tue, Aug 13, 2013 at 10:59 AM, Michael Samuel <> wrote:

> I was doing some of my own playing with this kernel, and was thinking it
> would be better to pass a complete block (already padded), with these
> optimisations:
> - Only pass 15 integers: X[15] is always zero for single blocks, so pass
> it as a constant.
> - Align X column-wise, e.g. X[(gsize*n)+gid]. This will make the kernel
> work nicely on AVX, and it also helped heaps on my old nVidia GTX-9800. (I
> don't have a modern GPU that does OpenCL at the moment.)
> Hard-coding other parts of X[] works too (which is what I assume you're
> doing) - for otp-md5 I created a kernel that passed X0 and X1 via global
> mem, then hardcoded the rest of the array, but that's because it was always
> 8 bytes of input.
> I haven't delved into the JtR source yet to see how easy this is to do, I
> was mostly just interested because I wanted to hear how well the Phi worked
> :)

You seem to be passing the 15 words of X[] through global memory. Do you
copy them into private variables (registers) first, or read them directly?
If you read global memory directly, I'm not sure we should do that on GPUs,
because global memory fetches have a very high random-access latency.

I have hard-coded X[4-15] in my test kernel. There wasn't much of a speed
bump in my case, but it did reduce register pressure. Most likely I won't go
down this path, though, because we need to support generic raw-md5 hashes.


