Date: Thu, 1 Nov 2012 09:19:20 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: mscash2-opencl problems with GT650M

On 1 Nov, 2012, at 8:53 , Sayantan Datta <std2048@...il.com> wrote:

> Hi magnum,
> 
> On Thu, Nov 1, 2012 at 1:10 PM, magnum <john.magnum@...hmail.com> wrote:
> A proper fix would be to pick GWS depending on hardware (or perhaps depending on duration when probing for LWS). It would also be wise to split the kernel to avoid extensive durations. Maybe the best solution is doing both.
> 
> Thank you for this fix. 
> I would make a patch for GWS. How do you suggest I split the kernel? Based on work items or based on iteration count in the kernel?   

I have split all my kernels on iterations. Basically, instead of a 100,000x loop inside the kernel, I make e.g. a 1,000x loop kernel and call it 100 times from host code. I end up with at least three kernels, and the host code looks like:

clEnqueue(kernel_init)
for (i = 0; i < 100; i++)
	clEnqueue(kernel_loop) // 1,000 iterations each
clEnqueue(kernel_finish)

The intermediate results are stored in global memory. No data is transferred to or from the GPU inside the loop; the pseudocode above is almost verbatim. I saw *no* performance drop from my splits, rather the opposite. When I split, I very roughly target ~10 ms per kernel call on Bull's cards, which also makes desktop response pretty good. Make the per-call iteration count a #define though (see HASH_LOOPS in the RAR and Office formats).
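
To make the split concrete, here is a minimal CPU-only sketch in plain C. It is not OpenCL host code: the three "kernels" are ordinary functions, a plain array stands in for GPU global memory, and the inner loops over gid stand in for work items. The names work_item_t, mix, run_split and run_monolithic, the XOR constant in the finish step, and the HASH_LOOPS value of 1000 are assumptions made for illustration; mix is a placeholder, not a real hash. The only point is that chaining init, N short loop calls and finish through persistent state gives the same result as one long loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_LOOPS 1000u /* iterations per loop-kernel call (assumed value) */

/* Stand-in for per-work-item state kept in global memory between calls. */
typedef struct {
	uint32_t state;
} work_item_t;

/* One iteration of a placeholder mixing step (xorshift32, not a real hash). */
static uint32_t mix(uint32_t x)
{
	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	return x;
}

/* "kernel_init": write the starting state for every work item. */
static void kernel_init(work_item_t *buf, size_t n, uint32_t seed)
{
	for (size_t gid = 0; gid < n; gid++)
		buf[gid].state = seed + (uint32_t)gid + 1u;
}

/* "kernel_loop": load state, run HASH_LOOPS iterations, store state back. */
static void kernel_loop(work_item_t *buf, size_t n)
{
	for (size_t gid = 0; gid < n; gid++) {
		uint32_t x = buf[gid].state;
		for (unsigned i = 0; i < HASH_LOOPS; i++)
			x = mix(x);
		buf[gid].state = x;
	}
}

/* "kernel_finish": turn the stored state into the final output. */
static void kernel_finish(const work_item_t *buf, uint32_t *out, size_t n)
{
	for (size_t gid = 0; gid < n; gid++)
		out[gid] = buf[gid].state ^ 0x5c5c5c5cu;
}

/* Host side of the split version: init, many short loop calls, finish. */
void run_split(work_item_t *buf, uint32_t *out, size_t n,
               uint32_t seed, unsigned total_iterations)
{
	/* This sketch only handles totals that divide evenly. */
	assert(total_iterations % HASH_LOOPS == 0);

	kernel_init(buf, n, seed);
	for (unsigned i = 0; i < total_iterations / HASH_LOOPS; i++)
		kernel_loop(buf, n);
	kernel_finish(buf, out, n);
}

/* Reference: the same work as a single long loop per work item. */
void run_monolithic(uint32_t *out, size_t n,
                    uint32_t seed, unsigned total_iterations)
{
	for (size_t gid = 0; gid < n; gid++) {
		uint32_t x = seed + (uint32_t)gid + 1u;
		for (unsigned i = 0; i < total_iterations; i++)
			x = mix(x);
		out[gid] = x ^ 0x5c5c5c5cu;
	}
}
```

Calling run_split(buf, a, 4, 42u, 100000u) and run_monolithic(b, 4, 42u, 100000u) should leave a and b identical. In a real format the three functions would be separate OpenCL kernels sharing one global buffer, and HASH_LOOPS would be tuned so that a single loop call lands near the target duration.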

We need to do this to all pbkdf2 kernels, and some others too.

magnum

