Date: Thu, 26 Apr 2012 00:11:32 +0300
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: New RAR OpenCL kernel

Hello (and sorry for hijacking the thread).

The values for LWS (worksize) and KPC (ndrange) seem quite suboptimal to me. A worksize of 256 is just not healthy for such a kernel, for a number of reasons. I would stick to 64 (or 32 for nvidia), even hardcoded, no need to customize that at all...but well, that's just my opinion :)

An NDRange of just 256 is plain bad though. It's not enough to keep the CUs busy enough to 'hide' memory access latencies. It needs to be at least several thousand. You are underusing the GPU that way. OTOH, yes, I know a higher NDRange with the RAR kernel could be disastrous, and could even lead to ASIC hangs and so on. IMO the RAR kernel is all about a (very fragile) balance between several factors.

Also, I just noticed the ALUBusy and ALUPacking expectations. I guess they are too optimistic, especially the ALUBusy one: it would never get anywhere near 100%. In fact I think even 50-60% would be an excellent achievement...yeah, that bad. Well, unless you think of some clever way to reduce branching and/or loops, and some clever way to reduce GPR usage without resorting to moving variables to __local memory.

Writing a well-performing RAR kernel is indeed a very complex task (I am not exaggerating at all: it took me weeks of coding and profiling and I am still not happy with the results). The only thing that comes close is the sha512-crypt kernel for AMD, but RAR is still more complex.

On Wed, Apr 25, 2012 at 11:30 PM, magnum <john.magnum@...hmail.com> wrote:

> On 04/25/2012 10:26 PM, magnum wrote:
> > On 04/25/2012 02:38 PM, SAYANTAN DATTA wrote:
> >> I tested your rar format on my 4890.Here's the result:
> >>
> >> Local worksize (LWS) 256, Global worksize (KPC) 256
> >> Benchmarking: RAR3 (6 characters) [OpenCL]... DONE
> >> Raw: 64.2 c/s real, 64.2 c/s virtual
> >>
> >> Is it okay to have KPC 256? Seems a bit low..
> >
> > I forgot that last question. No, I do not think 256 is too low. I can
> > get max speed on GTX580 using any LWS >= 64.
>
> Sorry I misread this. LWS 256 is OK but KPC at the same can't be good. I
> wonder how it ended up like that.
>
> Try explicitly saying KPC=0 for a benchmark output. Maybe also try
> setting a lower LWS (and KPC=0) and see what happens.
>
> magnum
>
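
To make the LWS/NDRange point above concrete, here is a minimal sketch in plain C of how a host program might derive a global work size from a small, fixed LWS. It is not taken from the John the Ripper sources: the function names, the groups-per-CU factor and the upper cap are all hypothetical, and the right values would have to come from profiling. The only hard rule it encodes is the OpenCL 1.x requirement that the global work size be a multiple of the local work size.

```c
#include <stddef.h>

/* Round n up to the next multiple of lws (lws must be non-zero). */
size_t round_up_to_multiple(size_t n, size_t lws)
{
    return ((n + lws - 1) / lws) * lws;
}

/*
 * Hypothetical helper, not from any real code base.
 * Picks a global work size (NDRange) that gives each compute unit
 * several work-groups, so there is other work to schedule while some
 * work-items wait on memory. The result is capped at max_global
 * (rounded down to a multiple of lws) because a very large NDRange
 * means a very long-running kernel. At least one work-group is
 * always returned.
 */
size_t pick_global_size(size_t lws, size_t compute_units,
                        size_t groups_per_cu, size_t max_global)
{
    size_t gws = lws * compute_units * groups_per_cu;

    if (gws > max_global)
        gws = (max_global / lws) * lws;  /* stay under the cap */
    if (gws < lws)
        gws = lws;                       /* never fewer than one group */
    return gws;
}
```

For example, with LWS 64, an assumed 10 compute units and 8 work-groups per unit, `pick_global_size(64, 10, 8, 1 << 20)` gives 5120 work-items, which is in the "several thousand" range rather than 256. The cap is the knob for the other side of the balance: lowering it keeps each kernel invocation short.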