Date: Tue, 15 Sep 2015 19:32:06 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Judy array

On 2015-09-15 17:22, Solar Designer wrote:
> On Tue, Sep 15, 2015 at 09:45:51AM +0200, magnum wrote:
>> On 2015-09-15 03:03, Solar Designer wrote:
>>> magnum - testing this stuff, I see that pot sync is a major bottleneck.
>>
>> Does it look like pot sync is slower than processing "our own" cracks?
>
> Yes, it's a lot slower than processing of the process' own cracks.
> However, I didn't compare it against processing of john.pot records on
> loading.
>
>> It shouldn't, it uses all the same gears in the box. Of course, with
>> -fork=8 it will spend 8x more time processing cracks,
>
> It appeared way worse than 8x in my testing.  With a large pot file
> buffer, the processes would become unresponsive for tens of seconds and
> would move way down in "top", leaving idle CPU time (weird).  Oh, maybe
> that's because you acquire a lock on the pot file while reading it?
> You shouldn't need to: you may instead lock, save the file size in a
> variable, unlock, and then read up to the recorded size (it'd be a whole
> number of pot file records then).  Or you may read without locking at
> all, as long as you ignore the last partial line (easy to detect: if
> it's not containing a colon, it's not complete enough for pot sync).

This crossed my mind too today. Acquiring a read lock is a Bad Idea[tm]
if I ever had one. I will look into that.
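
Something like this is what I have in mind for the lock-free variant
(read with no lock at all and skip the trailing partial record); the
function and variable names below are just for illustration, not the
actual cracker.c/loader.c code:

/* Sketch only: lock-free pot sync read.  Remember how far we have
   processed and stop at any record that does not yet contain a colon,
   since another process may still be writing it. */
#include <stdio.h>
#include <string.h>

#define LINE_BUFFER_SIZE 0x400

static long pot_sync_pos;   /* offset of the first unprocessed byte */

static void pot_sync_read(FILE *pot)
{
    char line[LINE_BUFFER_SIZE];

    if (fseek(pot, pot_sync_pos, SEEK_SET))
        return;

    while (fgets(line, sizeof(line), pot)) {
        /* A complete-enough record is "ciphertext:...".  No colon yet
           means a truncated tail; leave it for the next sync. */
        if (!strchr(line, ':'))
            break;

        /* process_pot_line(line);  (hypothetical hand-off to the same
           code that handles john.pot records on loading) */

        pot_sync_pos = ftell(pot);
    }
}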

>> but in real-life it's not a problem. This is a contrived test.
>
> A contrived test, yes, but if we manage to handle even contrived cases
> like this well, we might end up improving handling of more common cases
> as well.

I did not really mean otherwise. But *completely* disabling pot sync for
unsalted hashes would have some serious drawbacks (e.g. 14 processes
chugging along in incremental mode forever even though all hashes were
already cracked).

Anyway: I was probably using too good a computer for my tests so far,
but one thing I saw clearly: the single option "ReloadAtCrack = Y" is by
far the worst offender here. I have committed a change so it now
defaults to disabled. Even if fixing the read locks and so on mitigates
some of the regression, that option is overkill: the remaining options
are the important ones.
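
For reference, the knob lives in the [Options] section of john.conf;
after that commit the relevant bit looks roughly like this (comment
wording paraphrased here, not the exact committed text):

[Options]
# Re-read the pot file every time another node (e.g. under --fork or
# MPI) cracks something.  Very expensive with many unsalted hashes, so
# this now defaults to disabled; set it to Y to get the old behaviour.
ReloadAtCrack = N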

Another thing is that -enc:raw (which you did not mention using at all)
speeds things up considerably. Maybe I should give up and comment out
the default "DefaultInternalEncoding" line in john.conf. That would have
to come bundled with a bunch of documentation changes too, though.
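
To spell it out: for tests like yours, just adding -enc:raw (i.e.
--encoding=raw) to the command line avoids the conversions. The
john.conf change I'm contemplating is simply this (value and comment
below are from memory, so take them with a grain of salt):

[Options]
# Default internal encoding, used e.g. for case conversion in rules.
# Commented out = no conversion unless explicitly requested (faster):
#DefaultInternalEncoding = CP1252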

I also committed all your changes (except john.conf), with the prefetch 
stuff wrapped in "#ifdef CRACKER_PREFETCH" (which is not defined).
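
In other words, hunks roughly of this shape (a made-up example just to
show the guard, not the actual cracker.c code; build with
-DCRACKER_PREFETCH to enable the hint):

/* Illustration of the CRACKER_PREFETCH guard only. */
#include <stdint.h>

#define HASH_SIZE 0x10000

static uint32_t hash_table[HASH_SIZE];

static unsigned int count_hits(const uint32_t *idx, int n)
{
    unsigned int hits = 0;
    int i;

    for (i = 0; i < n; i++) {
#ifdef CRACKER_PREFETCH
        /* GCC/Clang builtin: hint the next slot while handling this one
           (read access, low temporal locality) */
        if (i + 1 < n)
            __builtin_prefetch(&hash_table[idx[i + 1] & (HASH_SIZE - 1)],
                               0, 1);
#endif
        hits += hash_table[idx[i] & (HASH_SIZE - 1)] != 0;
    }
    return hits;
}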

Furthermore, I committed a "best64" ruleset from Hashcat. The rules
using unsupported commands were commented out, and one "f" rule was
added to bring the count to exactly 64.
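
For anyone who wants a peek without pulling, the section looks roughly
like this; the first few lines are only meant to show the syntax, and
the 'f' (reflect) rule is the one added to reach 64:

[List.Rules:best64]
# try the word as-is
:
# reverse the word
r
# duplicate the word
d
# ... (the rest of the imported rules) ...
# append the reversed word ("reflect"): the rule added to make it 64
f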

Finally, I re-introduced a source() method in the MD4 and NT formats. I
will do the same for the SHA formats too.
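
For context: source() reconstructs the ciphertext string from the
stored binary, so the original source strings need not be kept in
memory. Shape-wise it is something like the sketch below for an NT-like
format; the tag, sizes and names are illustrative, the signature is the
format interface as I recall it, and it glosses over any byte-order
tricks the real binary() applies:

/* Illustrative sketch of a source() method: rebuild "$NT$<hex>" from
   the 16-byte binary.  Not the committed code. */
#include <stdio.h>
#include <string.h>

#define FORMAT_TAG  "$NT$"
#define TAG_LENGTH  (sizeof(FORMAT_TAG) - 1)
#define BINARY_SIZE 16

static char *get_source(char *ciphertext, void *binary)
{
    static char out[TAG_LENGTH + 2 * BINARY_SIZE + 1];
    unsigned char *b = (unsigned char *)binary;
    char *p = out;
    int i;

    (void)ciphertext;  /* kept only to mirror the method's signature */

    memcpy(p, FORMAT_TAG, TAG_LENGTH);
    p += TAG_LENGTH;
    for (i = 0; i < BINARY_SIZE; i++)
        p += sprintf(p, "%02x", b[i]);

    return out;
}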

All the above just passed the build-bots' tests, so I'm committing to 
bleeding right now.

magnum
