Date: Wed, 10 Oct 2012 09:49:17 -0300
From: Claudio André <claudioandre.br@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Shared find_best_workgroup

On 10-10-2012 05:20, magnum wrote:
> On 9 Oct, 2012, at 13:48 , Claudio André <claudioandre.br@...il.com> wrote:
>> On 08-10-2012 19:06, magnum wrote:
>>> On 8 Oct, 2012, at 23:35 , Claudio André <claudioandre.br@...il.com> wrote:
>>> You might be able to get away with just this: Change all "profilingEvent" to "NULL" except the two (per format) that enqueue crypt_kernel. This will measure the most important kernel so it might do the trick. I just tried it and it seems to work fine.
>> Good strategy. I will try it.
> For what it's worth I now implemented support for split kernels in the shared opencl_find_best_workgroup(). You can now either use profilingEvent for a single kernel, or use firstEvent and lastEvent for the first and last kernel of your split ones. Still, this did not give satisfactory results with my formats so while I committed this support I do not currently use it.
>
> In your case, you'd use firstEvent when enqueuing prepare_kernel and lastEvent when enqueuing final_kernel. And NULL for the looped kernel. In the else clause of your crypt_all, just use profilingEvent. But this would need to be tested on many devices: In my case, it was very beneficial for some GPUs and very detrimental for others - and besides, it takes time, sometimes lots of time. So I opted to stay with semi-fixed LWS.

Just my opinion here:
- find_best_workgroup has to find something acceptable. It does not have to produce an optimal result. And the end user can try some values if he/she thinks our strategy is not good enough.
- LWS=32, LWS=64 or LWS=128 usually gives a usable result.

> The current find_best function seems to work just fine for most other formats. For my slow kernels, I'm wondering if I should home in on GWS first, using a low LWS (like 32). Then, using that GWS, home in on LWS.
> I bet that might work better but it will still take too long. The only way to do it quicker is to only run the loop kernel during this testing and not a full crypt_all(). But we can't easily use that approach in a shared function - there will be varying requirements for preparation.
>
> I'll do some googling (again). Someone ought to have this figured out already.
>

If you do find something, all the better for JtR, but my opinion (for now) is that:
- we have something good enough.
- the results may not be "the best possible", but the difference is small and acceptable.
- testing LWS by hand is easy. If someone really needs to experiment, JtR offers:
  LWS=32 GWS=0 john -t ...
  LWS=64 GWS=0 john -t ...

So, I am happy with it (as is). Have you found some weakness that needs to be fixed?

Claudio
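
For illustration only, here is a minimal sketch of the two-phase search magnum describes in the quoted text: fix a low LWS and home in on GWS, then keep that GWS and home in on LWS. It is not the code in opencl_find_best_workgroup(). The names tune_gws_then_lws, time_run and fake_time are hypothetical, and the cost model is invented purely so the example runs without a GPU. In a real format, time_run would be replaced by a timed crypt_all() (or a timed loop kernel).

```python
from typing import Callable, Sequence, Tuple


def tune_gws_then_lws(
    time_run: Callable[[int, int], float],
    gws_candidates: Sequence[int],
    lws_candidates: Sequence[int],
    start_lws: int = 32,
) -> Tuple[int, int]:
    """Pick (gws, lws) with a two-phase search.

    time_run(gws, lws) is assumed to return the seconds taken by one run
    that processes `gws` work items (hypothetical callback).
    """

    def throughput(gws: int, lws: int) -> float:
        # Work items processed per second for this configuration.
        return gws / time_run(gws, lws)

    # OpenCL 1.x requires GWS to be a multiple of LWS, so filter first.
    valid_gws = [g for g in gws_candidates if g % start_lws == 0]
    if not valid_gws:
        raise ValueError("no GWS candidate is a multiple of start_lws")

    # Phase 1: low fixed LWS, choose the GWS with the best throughput.
    best_gws = max(valid_gws, key=lambda g: throughput(g, start_lws))

    # Phase 2: keep that GWS, choose the LWS with the best throughput.
    # start_lws is always included, so this list is never empty.
    valid_lws = [
        l for l in sorted(set(lws_candidates) | {start_lws})
        if best_gws % l == 0
    ]
    best_lws = max(valid_lws, key=lambda l: throughput(best_gws, l))
    return best_gws, best_lws


def fake_time(gws: int, lws: int) -> float:
    """Invented cost model: fixed launch overhead plus a per-item cost
    that happens to be lowest at LWS=64. Not measured on any device."""
    per_item = 1e-6 * (1 + abs(lws - 64) / 64)
    return 0.01 + gws * per_item


if __name__ == "__main__":
    print(tune_gws_then_lws(fake_time, [1024, 4096, 16384], [32, 64, 128, 256]))
```

The point of the two phases is cost: the sketch times len(gws_candidates) + len(lws_candidates) configurations instead of every (GWS, LWS) pair. It does not address the concern raised in the thread that each timed run of a slow format is itself expensive.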