john-dev - Re: Shared find_best

Date: Wed, 10 Oct 2012 09:59:40 -0300
From: Claudio André <claudioandre.br@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: Shared find_best_workgroup

BTW: looking at your commits, I see you are running into more 
unpleasant situations than I have. I only saw very slow processing when I 
tried something unusual (LWS=16, for example).

Claudio


On 10-10-2012 09:49, Claudio André wrote:
> On 10-10-2012 05:20, magnum wrote:
>> On 9 Oct, 2012, at 13:48 , Claudio André <claudioandre.br@...il.com> 
>> wrote:
>>> On 08-10-2012 19:06, magnum wrote:
>>>> On 8 Oct, 2012, at 23:35 , Claudio André 
>>>> <claudioandre.br@...il.com> wrote:
>>>> You might be able to get away with just this: Change all 
>>>> "profilingEvent" to "NULL" except the two ones (per format) that 
>>>> enqueue crypt_kernel. This will measure the most important kernel 
>>>> so it might do the trick. I just tried it and it seems to work fine.
>>> Good strategy. I will try it.
>> For what it's worth I now implemented support for split kernels in 
>> the shared opencl_find_best_workgroup(). You can now either use 
>> profilingEvent for a single kernel, or use firstEvent and lastEvent 
>> for the first and last kernel of your split ones. Still, this did not 
>> give satisfactory results with my formats so while I committed this 
>> support I do not currently use it.
>>
>> In your case, you'd use firstEvent when enqueuing prepare_kernel and 
>> lastEvent when enqueuing final_kernel. And NULL for the looped kernel. 
>> In the else clause of your crypt_all, just use profilingEvent. But 
>> this would need to be tested on many devices: In my case, it was very 
>> beneficial for some GPUs and very detrimental for others - and 
>> besides, it takes time, sometimes lots of time. So I opted to stay 
>> with semi-fixed LWS.
> Just my opinion here:
> - find_best_workgroup has to find something acceptable. It does not 
> have to produce an optimal result. And the end user can try other 
> values if he/she thinks our strategy is not good enough.
> - LWS=32, LWS=64 or LWS=128 usually gives acceptable results.
>
>
>> The current find_best function seems to work just fine for most other 
>> formats. For my slow kernels, I'm wondering if I should home in on 
>> GWS first, using a low LWS (like 32). Then, using that GWS, home in 
>> on LWS. I bet that might work better but it will still take too long. 
>> The only way to do it quicker is to only run the loop kernel during 
>> this testing and not a full crypt_all(). But we can't easily use that 
>> approach in a shared function - there will be varying requirements 
>> for preparation.
>>
>> I'll do some googling (again). Someone ought to have this figured out 
>> already.
>>
>
> If you find something, all the better for JtR, but my opinion (for 
> now) is that:
> - we have something good enough.
> - the results may not be "the best possible", but the difference is 
> small and acceptable.
> - testing LWS by hand is easy. If someone really needs to experiment, JtR 
> offers:
> LWS=32 GWS=0 john -t ...
> LWS=64 GWS=0 john -t ...
>
> So I am happy with it as is. Have you found some weakness that needs 
> to be fixed?
>
> Claudio
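
Editorial sketch (not from the thread): the two-stage idea magnum describes in the quoted text, homing in on GWS first with a low LWS and then homing in on LWS at that GWS, could look roughly like the C below. Every name here (bench_fn, tune_gws, tune_lws, fake_bench) is hypothetical and is not part of JtR's shared opencl_find_best_workgroup(). The benchmark callback stands in for whatever timing the format would do, and fake_bench is a made-up cost model so the sketch runs without a GPU.

```c
#include <stddef.h>

/* Hypothetical benchmark callback: returns the measured time of one run
 * (any unit, lower is better, assumed > 0) for the given LWS and GWS. */
typedef double (*bench_fn)(size_t lws, size_t gws);

/* Stage 1: with LWS pinned to a low value, keep doubling GWS for as long
 * as throughput (work items per unit of time) improves. */
size_t tune_gws(bench_fn bench, size_t lws, size_t max_gws)
{
    size_t best = lws;
    double best_rate = (double)lws / bench(lws, lws);

    for (size_t gws = lws * 2; gws <= max_gws; gws *= 2) {
        double rate = (double)gws / bench(lws, gws);
        if (rate <= best_rate)
            break;              /* no further gain: stop early */
        best = gws;
        best_rate = rate;
    }
    return best;
}

/* Stage 2: with GWS fixed, try each power-of-two LWS in range and keep
 * the one with the shortest run time. */
size_t tune_lws(bench_fn bench, size_t gws, size_t min_lws, size_t max_lws)
{
    size_t best = min_lws;
    double best_time = bench(min_lws, gws);

    for (size_t lws = min_lws * 2; lws <= max_lws && lws <= gws; lws *= 2) {
        double t = bench(lws, gws);
        if (t < best_time) {
            best = lws;
            best_time = t;
        }
    }
    return best;
}

/* Synthetic cost model (an assumption for illustration only): a fixed
 * launch overhead, a per-item cost that is lowest at LWS=64, and a
 * penalty once GWS exceeds 4096. */
double fake_bench(size_t lws, size_t gws)
{
    double launch = 100.0;
    double per_item = (lws == 64) ? 1.0 : 1.5;
    double penalty = gws > 4096 ? (double)(gws - 4096) * 2.0 : 0.0;

    return launch + per_item * (double)gws + penalty;
}
```

With this made-up model, calling tune_gws(fake_bench, 32, 65536) and then tune_lws(fake_bench, that_gws, 32, 256) settles on GWS=4096 and LWS=64. As the thread notes, a real version would still be slow if each benchmark call runs a full crypt_all(), and the early stop in stage 1 assumes throughput does not recover after its first dip.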