Date: Wed, 10 Oct 2012 15:25:37 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Shared find_best_workgroup

On 10 Oct, 2012, at 14:49 , Claudio André <claudioandre.br@...il.com> wrote:
> On 10-10-2012 05:20, magnum wrote:
>> The current find_best function seems to work just fine for most other formats. For my slow kernels, I'm wondering if I should home in on GWS first, using a low LWS (like 32), and then, using that GWS, home in on LWS. I bet that might work better, but it will still take too long. The only way to do it quicker is to run only the loop kernel during this testing and not a full crypt_all(). But we can't easily use that approach in a shared function - there will be varying requirements for preparation.
>> 
>> I'll do some googling (again). Someone ought to have this figured out already.
>> 
> 
> Just in case you find something, all the better for JtR, but my opinion (for now) is that:
> - we have something good enough.
> - the results may not be "the best possible", but the difference is small and acceptable.
> - testing LWS by hand is easy. If someone really needs to experiment, JtR offers:
> LWS=32 GWS=0 john -t ...
> LWS=64 GWS=0 john -t ...
> 
> So, I am happy with it (as is). Have you found some weakness that needs to be fixed?
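
(The manual LWS experiments quoted above can be scripted as a small sweep. A sketch only: `<format>` is a placeholder for the OpenCL format being tuned, and LWS/GWS are assumed to behave as described above.)

```shell
# Sketch of a manual LWS sweep; <format> is a placeholder.
# Printing the command lines keeps this safe to run as-is;
# pipe the output through sh to actually benchmark each size.
for lws in 32 64 128 256 512; do
    echo "LWS=$lws GWS=0 ./john -t --format=<format>-opencl"
done
```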

I mostly agree. But if we can find a better and/or quicker way to enumerate it, all the better. The random luser will just run it with no tweaking and think JtR is slow.
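The two-pass idea quoted above (fix a low LWS, scan GWS, then scan LWS at the winning GWS) could look roughly like this. This is a hypothetical sketch, not JtR's shared find_best code; `bench()` stands in for a timing run returning crypts/sec, and `demo_bench()` is a synthetic benchmark used only to exercise the loop:

```c
#include <stddef.h>

/* Hypothetical two-pass tuner: pass 1 scans GWS with a small fixed
 * LWS; pass 2 scans LWS at the GWS found.  bench() is assumed to
 * return a speed figure (crypts/sec) for a given LWS/GWS pair. */
typedef double (*bench_fn)(size_t lws, size_t gws);

static void autotune_two_pass(bench_fn bench,
                              size_t *best_lws, size_t *best_gws)
{
    double best = 0.0;
    *best_lws = 32;                 /* low fixed LWS for pass 1 */
    *best_gws = 1024;
    for (size_t gws = 1024; gws <= 1 << 20; gws *= 2) {  /* pass 1 */
        double s = bench(*best_lws, gws);
        if (s > best) { best = s; *best_gws = gws; }
    }
    best = 0.0;
    for (size_t lws = 8; lws <= 1024; lws *= 2) {        /* pass 2 */
        double s = bench(lws, *best_gws);
        if (s > best) { best = s; *best_lws = lws; }
    }
}

/* Synthetic benchmark for demonstration: peaks at LWS=128, GWS=65536. */
static double demo_bench(size_t lws, size_t gws)
{
    double dl = lws > 128 ? (double)(lws - 128) : (double)(128 - lws);
    double dg = gws > 65536 ? (double)(gws - 65536)
                            : (double)(65536 - gws);
    return 1000.0 - dl - dg / 1000.0;
}
```

Whether this beats the shared function in practice depends on how independent the two parameters really are on a given device.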

Programs like cRARk seem to do fine with whatever method they use. Hashcat uses very fine-grained device-specific kernels, so I guess good figures are hard-coded per device.


> BTW: looking at your commits, I realize you are seeing more unpleasant situations than I did. I only saw very slow processing if I tried something weird (LWS=16, for example).

I have seen that some cards run Office 2007/2010 significantly faster with a much higher LWS than the hard-coded one (many cards probably run best at 1024), while others don't like higher values at all. But when I try using the shared function, it takes a lot of time, misses the optimum a lot, or both.

But it's not the end of the world. I'll give this up for a while and concentrate on other run-time tweaks: I'm currently experimenting with a way for the formats to pass stuff like "-DHASH_LOOP=1024" when building the kernel. This way things can be fully unrolled or otherwise optimised even when parameters are chosen at run-time.
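A minimal sketch of that mechanism (a hypothetical helper, not the actual JtR code): format a `-D` option string at run time, so the chosen parameters become compile-time constants inside the kernel when the string is handed to the OpenCL compiler:

```c
#include <stdio.h>

/* Hypothetical helper: builds the options string that would be passed
 * as the 4th argument of clBuildProgram().  With HASH_LOOP known at
 * kernel compile time, the hash loop can be fully unrolled; with LWS
 * known, __local buffers can be sized exactly. */
static void format_build_opts(char *buf, size_t len,
                              unsigned hash_loop, unsigned lws)
{
    snprintf(buf, len, "-DHASH_LOOP=%u -DLWS=%u", hash_loop, lws);
}
```

The resulting string would then go straight into `clBuildProgram(prog, 1, &dev, opts, NULL, NULL)`, exactly as if the defines had been hard-coded in the kernel source.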

magnum
