john-dev - Re: 1.7.9-jumbo

Message-ID: <4EEE0B60.4040507@hushmail.com>
Date: Sun, 18 Dec 2011 16:48:48 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: 1.7.9-jumbo

On 12/18/2011 04:19 PM, Solar Designer wrote:
> On Sun, Dec 18, 2011 at 06:48:33PM +0400, Solar Designer wrote:
>> Minimum:                        0.81602 real, 0.81602 virtual
> ...
>> Also, someone could want to identify the format that became 18% slower
>> and see if this is reproducible and if it can be avoided (in a future
>> version).

I added functionality to relbench so that it shows which formats have the
worst and best performance change (corresponding to the minimum and
maximum); maybe that is just what you did too. My patch is not good code,
but I suggest you add such functionality. It's very handy.
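
To illustrate the idea (this is not the relbench patch, just a minimal
sketch in Python), the following parses two "john --test" style outputs
and reports which format has the lowest and highest new/old ratio. The
names parse() and extremes() are hypothetical, and the sample data is
limited to the CRC-32 and NT MD4 figures quoted in this thread.

```python
import re

# Patterns for the benchmark output format quoted in this thread.
HEADER = re.compile(r"^Benchmarking: (.+?)\.\.\. DONE")
RESULT = re.compile(
    r"^(.+?):\s+([\d.]+)([KM]?) c/s real, ([\d.]+)([KM]?) c/s virtual"
)
SCALE = {"": 1, "K": 1_000, "M": 1_000_000}


def parse(text):
    """Map (format, label) -> (real c/s, virtual c/s)."""
    results = {}
    fmt = None
    for line in text.splitlines():
        line = line.strip()
        m = HEADER.match(line)
        if m:
            fmt = m.group(1)
            continue
        m = RESULT.match(line)
        if m and fmt is not None:
            label, real, rs, virt, vs = m.groups()
            results[(fmt, label)] = (
                float(real) * SCALE[rs],
                float(virt) * SCALE[vs],
            )
    return results


def extremes(old, new):
    """Return ((key, ratio) for the worst, (key, ratio) for the best),
    using the "real" c/s figure and only benchmarks present in both runs."""
    ratios = {
        k: new[k][0] / old[k][0]
        for k in old.keys() & new.keys()
        if old[k][0] > 0
    }
    worst = min(ratios, key=ratios.get)
    best = max(ratios, key=ratios.get)
    return (worst, ratios[worst]), (best, ratios[best])


# Sample data: only the figures quoted in this thread.
OLD = """\
Benchmarking: CRC-32 [32/64]... DONE
Many salts:     63225K c/s real, 63225K c/s virtual
Only one salt:  28983K c/s real, 28696K c/s virtual
Benchmarking: NT MD4 [128/128 X2 SSE2-16]... DONE
Raw:    26734K c/s real, 26469K c/s virtual
"""
NEW = """\
Benchmarking: CRC-32 [32/64]... DONE
Many salts:     51593K c/s real, 51593K c/s virtual
Only one salt:  27557K c/s real, 27557K c/s virtual
Benchmarking: NT MD4 [128/128 X2 SSE2-16]... DONE
Raw:    23450K c/s real, 23218K c/s virtual
"""

worst, best = extremes(parse(OLD), parse(NEW))
print("Minimum: %.5f real (%s, %s)" % (worst[1], *worst[0]))
print("Maximum: %.5f real (%s, %s)" % (best[1], *best[0]))
```

On this sample the minimum belongs to the CRC-32 "Many salts" line, and
the NT MD4 ratio can be read from the same table once CRC-32 is left out.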

> It's CRC-32.  On 1.7.8-jumbo-8 we had:
>
> Benchmarking: CRC-32 [32/64]... DONE
> Many salts:     63225K c/s real, 63225K c/s virtual
> Only one salt:  28983K c/s real, 28696K c/s virtual
>
> 1.7.9-jumbo-5 gives only:
>
> Benchmarking: CRC-32 [32/64]... DONE
> Many salts:     51593K c/s real, 51593K c/s virtual
> Only one salt:  27557K c/s real, 27557K c/s virtual
>
> (same machine, same compiler, same make target, no load).
>
> With CRC-32 excluded, the minimum improves to:
>
> Minimum:                        0.87716 real, 0.87718 virtual
>
> and now it's NTLM, which is worrisome.  Old:
>
> Benchmarking: NT MD4 [128/128 X2 SSE2-16]... DONE
> Raw:    26734K c/s real, 26469K c/s virtual
>
> New:
>
> Benchmarking: NT MD4 [128/128 X2 SSE2-16]... DONE
> Raw:    23450K c/s real, 23218K c/s virtual
>
> Looks bad to me.  Unlike CRC-32, NTLM's performance actually matters.
> And yes, this is reproduced on my second pair of benchmark runs as well.

> Can someone look into the NTLM performance regression?  And maybe into
> others as well, but NTLM is the important one.

Reproducible or not, I enclose the complete diff between the two
versions. The new code should really be faster, and anything else is just
"compiler randomness" that we can't do much about. This holds for all
versions of set_key() (they are different for normal/UTF-8/codepage).

Unless this performance drop comes from another part of John?

> We gained NT2, which is slightly faster:
>
> Benchmarking: NT v2 [SSE2i 12x]... DONE
> Raw:    25731K c/s real, 25731K c/s virtual
>
> but still slower than what NT used to achieve in 1.7.8-jumbo-8.  And I
> am not familiar with other possible differences between these two (e.g.,
> are they the same or different in handling of non-ASCII)?

They should act exactly the same. NT2 is much faster than NT for 32-bit
sse2i, and much slower for non-mmx/sse (but it will handle lengths up to
125, if memory serves me).

magnum

View attachment "NT_178j8-179j5.diff" of type "text/x-patch" (3162 bytes)
