Message-ID: <5075713C.2080109@gmail.com>
Date: Wed, 10 Oct 2012 09:59:40 -0300
From: Claudio André <claudioandre.br@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Shared find_best_workgroup

BTW: looking at your commits, I can see you are running into more
unpleasant situations than I am. I only saw very slow processing when I
tried something weird (LWS=16, for example).

Claudio

On 10-10-2012 09:49, Claudio André wrote:
> On 10-10-2012 05:20, magnum wrote:
>> On 9 Oct, 2012, at 13:48 , Claudio André <claudioandre.br@...il.com>
>> wrote:
>>> On 08-10-2012 19:06, magnum wrote:
>>>> On 8 Oct, 2012, at 23:35 , Claudio André
>>>> <claudioandre.br@...il.com> wrote:
>>>> You might be able to get away with just this: Change all
>>>> "profilingEvent" to "NULL" except the two (per format) that
>>>> enqueue crypt_kernel. This will measure the most important kernel
>>>> so it might do the trick. I just tried it and it seems to work fine.
>>> Good strategy. I will try it.
>> For what it's worth I now implemented support for split kernels in
>> the shared opencl_find_best_workgroup(). You can now either use
>> profilingEvent for a single kernel, or use firstEvent and lastEvent
>> for the first and last kernel of your split ones. Still, this did not
>> give satisfactory results with my formats so while I committed this
>> support I do not currently use it.
>>
>> In your case, you'd use firstEvent when enqueuing prepare_kernel and
>> lastEvent when enqueuing final_kernel. And NULL for the looped kernel.
>> In the else clause of your crypt_all, just use profilingEvent. But
>> this would need to be tested on many devices: In my case, it was very
>> beneficial for some GPUs and very detrimental for others - and
>> besides, it takes time, sometimes lots of time. So I opted to stay
>> with semi-fixed LWS.
> Just my opinion here:
> - find_best_workgroup has to get something acceptable. It does not
> have to produce an optimal result. And the end user can try some
> values if he/she thinks our strategy is not good enough.
> - LWS=32 or LWS=64 or LWS=128 usually gives something usable.
>
>
>> The current find_best function seems to work just fine for most other
>> formats. For my slow kernels, I'm wondering if I should home in on
>> GWS first, using a low LWS (like 32). Then, using that GWS, home in
>> on LWS. I bet that might work better but it will still take too long.
>> The only way to do it quicker is to only run the loop kernel during
>> this testing and not a full crypt_all(). But we can't easily use that
>> approach in a shared function - there will be varying requirements
>> for preparation.
>>
>> I'll do some googling (again). Someone ought to have this figured out
>> already.
>>
>
> If you do find something, all the better for JtR, but my opinion (for
> now) is that:
> - we have something good enough.
> - the results might not be "the best possible". But the difference is
> small and acceptable.
> - testing LWS by hand is easy. If someone really needs to experiment,
> JtR offers:
>     LWS=32 GWS=0 john -t ...
>     LWS=64 GWS=0 john -t ...
>
> So, I am happy with it (as is). Have you found some weakness that needs
> to be fixed?
>
> Claudio
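
The two-pass idea magnum floats in the quoted text (home in on GWS first
with a low LWS such as 32, then tune LWS at that GWS) could be sketched
roughly as below. This is only an illustration under stated assumptions,
not JtR code: time_fn, fake_time_run, tune_two_pass and the candidate
ranges are all hypothetical names and values, and fake_time_run is a
synthetic stand-in for a real benchmark of crypt_all() or of the loop
kernel alone.

```c
#include <stddef.h>

/* Hypothetical callback: returns a cost (e.g. seconds per candidate)
 * for one benchmark run at the given local/global work sizes. */
typedef double (*time_fn)(size_t lws, size_t gws);

/* Synthetic stand-in for a real measurement (assumption, demo only):
 * the cost is lowest at LWS=64 and GWS=65536 and grows as either
 * value moves away from that point. */
static double fake_time_run(size_t lws, size_t gws)
{
    double lws_pen = (lws > 64) ? (double)lws / 64.0 : 64.0 / (double)lws;
    double gws_pen = (gws > 65536) ? (double)gws / 65536.0
                                   : 65536.0 / (double)gws;
    return lws_pen + gws_pen;
}

/* Pass 1: hold LWS at start_lws and pick the GWS with the lowest cost.
 * Pass 2: hold that GWS and pick the LWS with the lowest cost.
 * All sizes are assumed to be non-zero; candidates are tried by doubling. */
static void tune_two_pass(time_fn run, size_t start_lws, size_t max_lws,
                          size_t min_gws, size_t max_gws,
                          size_t *best_lws, size_t *best_gws)
{
    double best = -1.0;   /* negative means "no measurement yet" */
    size_t lws, gws;

    *best_lws = start_lws;
    *best_gws = min_gws;

    for (gws = min_gws; gws <= max_gws; gws *= 2) {
        double t = run(start_lws, gws);
        if (best < 0.0 || t < best) {
            best = t;
            *best_gws = gws;
        }
    }

    for (lws = start_lws; lws <= max_lws; lws *= 2) {
        double t = run(lws, *best_gws);
        if (t < best) {
            best = t;
            *best_lws = lws;
        }
    }
}
```

A caller would pass its own timing function in place of fake_time_run,
for example tune_two_pass(my_timer, 32, 256, 1024, 1048576, &lws, &gws),
where my_timer is again hypothetical. Each step still costs one full
benchmark run, so this does nothing about the "it will still take too
long" concern raised above; it only fixes the order of the search.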