Date: Wed, 25 Apr 2012 23:43:32 +0200
From: magnum <>
Subject: Re: New RAR OpenCL kernel

On 04/25/2012 11:11 PM, Milen Rangelov wrote:
> The values for LWS (worksize) and KPC (ndrange) seem quite too unoptimal to
> me. Worksize of 256 is just not healthy for such a kernel, for a number of
> reasons. I would stick to 64 (or 32 for nvidia), even hardcoded, no need to
> customize that at all...but well that's just my opinion :)

Thanks, I'll hardcode it as 64 then - I got worse speed with 32 on
nvidia. I spent some time in nvidia's profiler today, and it showed that
I was not using all SMs and threads (or whatever they call them) unless
I used LWS <= 64 and GWS >= 1536. The funny thing is I got the same
4400 c/s anyway. It got better in theory (fewer suggestions from the
tool) but in practice it stayed the same. For the GTX580 I'm using a GWS
of 8192 now, since higher figures don't make any difference.
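
For illustration, here is a minimal sketch of the kind of GWS tuning
described above: keep LWS fixed, double GWS, and stop once c/s no longer
improves. It is not the code from the actual format. The names bench_fn,
round_up_gws, tune_gws and fake_bench are hypothetical, and fake_bench is
a made-up model that saturates at 8192 work-items, not a real measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: runs the kernel once with the given sizes and
 * returns the measured candidates per second (c/s). */
typedef double (*bench_fn)(size_t lws, size_t gws);

/* Round gws up to the next multiple of lws. OpenCL 1.x rejects a global
 * size that is not evenly divisible by the local size. */
static size_t round_up_gws(size_t gws, size_t lws)
{
    assert(lws > 0);
    return (gws + lws - 1) / lws * lws;
}

/* Double GWS from start_gws until the speed improves by less than
 * min_gain (e.g. 0.01 = 1%) or max_gws is exceeded. Returns the last
 * GWS that still gave a worthwhile speedup. */
static size_t tune_gws(bench_fn bench, size_t lws, size_t start_gws,
                       size_t max_gws, double min_gain)
{
    size_t best_gws = round_up_gws(start_gws, lws);
    double best_cps = bench(lws, best_gws);

    for (size_t gws = best_gws * 2; gws <= max_gws; gws *= 2) {
        double cps = bench(lws, gws);
        if (cps < best_cps * (1.0 + min_gain))
            break;  /* plateau: a larger GWS only lengthens each run */
        best_cps = cps;
        best_gws = gws;
    }
    return best_gws;
}

/* Stand-in for a real benchmark, for testing the loop only: speed grows
 * linearly with GWS and saturates at 8192 work-items (assumed figure). */
static double fake_bench(size_t lws, size_t gws)
{
    (void)lws;
    return (double)(gws < 8192 ? gws : 8192);
}
```

With the stand-in model, tune_gws(fake_bench, 64, 256, 1 << 20, 0.01)
settles on 8192. A real bench_fn would enqueue the kernel and time it.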

> NDRange of just 256 is just bad though. It's not enough to keep the CUs
> busy enough to 'hide' memory access latencies. It needs to be at least
> several thousands. You are underusing the GPU that way.

With the older cards I've tested (like 9600GT and Cedar) it did not
matter: past a certain point I'm just seeing longer and longer runtime
at the same c/s (the best GWS was at lws<<5 or something like that, with
LWS being very low). Does this imply any specific problem with my kernel?

