Date: Tue, 13 Aug 2013 11:48:46 +0530
From: Sayantan Datta <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: key-length for mask mode.

Hi,

On Tue, Aug 13, 2013 at 10:59 AM, Michael Samuel <mik@...net.net> wrote:

> I was doing some of my own playing with this kernel, and was thinking it
> would be better to pass a complete block (already padded), with these
> optimisations:
>
> - Only pass 15 integers, X[15] is always zero for single blocks, so pass
> it as a constant
> - Align X column-wise, eg X[(gsize*n)+gid].  This will make the kernel
> work on AVX nicely, and also helped heaps on my old nVidia GTX-9800. (I
> don't have a modern GPU that does OpenCL at the moment)
>
> Hard-coding other parts of X[] works too (which is what I assume you're
> doing) - for otp-md5 I created a kernel that passed X0 and X1 via global
> mem, then hardcoded the rest of the array, but that's because it was always
> 8 bytes of input.
>
> I haven't delved into the JtR source yet to see how easy this is to do, I
> was mostly just interested because I wanted to hear how well the Phi worked
> :)


You seem to be passing the 15 words of X[] through global memory. Do you
copy them into registers first, or use them directly from global memory? If
you read global memory directly, I'm not sure we should do that on GPUs,
because global memory fetches have very high random-access latency.
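
To make the question concrete, here is a minimal host-side C sketch of the
column-wise layout X[(gsize*n)+gid] from the quoted mail, with each
work-item copying its 15 words into a private array once before using them.
The names col_index, load_block and WORDS_PER_KEY are made up for
illustration and are not taken from the JtR source or from Michael's kernel.
In a real OpenCL kernel the private array would normally end up in
registers, and neighbouring work-items would read neighbouring addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* X[0..14] are passed in; X[15] is always 0 for a single block. */
#define WORDS_PER_KEY 15

/* Column-wise index: word n of work-item gid, out of gsize work-items.
 * Hypothetical helper, mirrors X[(gsize*n)+gid] from the quoted mail. */
static size_t col_index(size_t gsize, size_t n, size_t gid)
{
    assert(n < WORDS_PER_KEY && gid < gsize);
    return gsize * n + gid;
}

/* Host-side stand-in for what one work-item might do at kernel entry:
 * read its 15 words from the shared buffer once, into a private array,
 * and supply X[15] as a constant instead of fetching it. */
static void load_block(const uint32_t *keys, size_t gsize, size_t gid,
                       uint32_t X[16])
{
    for (size_t n = 0; n < WORDS_PER_KEY; n++)
        X[n] = keys[col_index(gsize, n, gid)];
    X[15] = 0;
}
```

With gsize = 4, word 1 of work-item 0 sits at index 4, right after word 0
of work-items 0..3, so a group of work-items reading the same word n touches
one contiguous run of memory.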

I have hard-coded X[4-15] in my test kernel. There wasn't much of a speed-up
in my case, but it did reduce register pressure. Most likely I won't go
down this path, because we need to support generic raw-md5 hashes.
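
As an illustration of why X[4-15] can be fixed at all, here is a small C
sketch that builds the padded MD5 block for a short key. It assumes keys of
at most 15 bytes, which is my assumption for this example and not a limit
stated above; md5_pad_short_key is a hypothetical helper, not a JtR
function. Under that assumption the key and the 0x80 padding byte fit in
X[0..3], X[4..13] and X[15] are zero, and X[14] holds the length in bits,
so only X[14] varies with the key length.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Build the 16-word MD5 message block for a key of at most 15 bytes.
 * MD5 padding: append 0x80, zero-fill, then store the message length in
 * bits as a 64-bit little-endian value in X[14] (low) and X[15] (high). */
static void md5_pad_short_key(const unsigned char *key, size_t len,
                              uint32_t X[16])
{
    unsigned char buf[64];

    assert(len <= 15);              /* assumption of this sketch */
    memset(buf, 0, sizeof buf);
    memcpy(buf, key, len);
    buf[len] = 0x80;                /* lands in X[0..3] when len <= 15 */

    /* Pack bytes into little-endian 32-bit words. */
    for (int i = 0; i < 16; i++)
        X[i] = (uint32_t)buf[4 * i]
             | (uint32_t)buf[4 * i + 1] << 8
             | (uint32_t)buf[4 * i + 2] << 16
             | (uint32_t)buf[4 * i + 3] << 24;

    X[14] = (uint32_t)len * 8;      /* bit length; X[15] stays 0 */
}
```

A generic raw-md5 format has to accept longer keys too, where the key bytes
spill into X[4] and beyond, so these words can no longer be treated as
constants.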

Regards,
Sayantan
