Date: Wed, 25 Apr 2012 23:43:32 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: New RAR OpenCL kernel

On 04/25/2012 11:11 PM, Milen Rangelov wrote:
> The values for LWS (worksize) and KPC (ndrange) seem quite too unoptimal to
> me. Worksize of 256 is just not healthy for such a kernel, for a number of
> reasons. I would stick to 64 (or 32 for nvidia), even hardcoded, no need to
> customize that at all...but well that's just my opinion :)

Thanks, I'll hardcode it as 64 then - I got worse speed with 32 on nvidia.

I spent some time in nvidia's profiler today, and it showed that I was not
using all SMs and threads (or whatever they call them) unless I used
LWS <= 64 and GWS >= 1536. The funny thing is that I got the same 4400 c/s
anyway. It got better in theory (fewer suggestions from the tool), but in
practice the speed stayed the same. For GTX580 I'm using 8192 now, since
higher figures don't make any difference.

> NDRange of just 256 is just bad though. It's not enough to keep the CUs
> busy enough to 'hide' memory access latencies. It needs to be at least
> several thousands. You are underusing the GPU that way.

With the older cards I've tested (like 9600GT and Cedar) it did not matter:
past a best GWS of about LWS<<5 (with LWS being very low), I just see longer
and longer runtimes and the same c/s. Does this imply any specific problem
with my kernel?

magnum