Date: Sun, 18 Dec 2011 19:19:45 +0400
From: Solar Designer <>
Subject: Re: 1.7.9-jumbo

On Sun, Dec 18, 2011 at 06:48:33PM +0400, Solar Designer wrote:
> Minimum:                        0.81602 real, 0.81602 virtual
> Also, someone could want to identify the format that became 18% slower
> and see if this is reproducible and if it can be avoided (in a future
> version).

It's CRC-32.  On 1.7.8-jumbo-8 we had:

Benchmarking: CRC-32 [32/64]... DONE
Many salts:     63225K c/s real, 63225K c/s virtual
Only one salt:  28983K c/s real, 28696K c/s virtual

1.7.9-jumbo-5 gives only:

Benchmarking: CRC-32 [32/64]... DONE
Many salts:     51593K c/s real, 51593K c/s virtual
Only one salt:  27557K c/s real, 27557K c/s virtual

(same machine, same compiler, same make target, no load).

With CRC-32 excluded, the minimum improves to:

Minimum:                        0.87716 real, 0.87718 virtual

and now it's NTLM, which is worrisome.  Old:

Benchmarking: NT MD4 [128/128 X2 SSE2-16]... DONE
Raw:    26734K c/s real, 26469K c/s virtual

New:
Benchmarking: NT MD4 [128/128 X2 SSE2-16]... DONE
Raw:    23450K c/s real, 23218K c/s virtual

Looks bad to me.  Unlike CRC-32, NTLM's performance actually matters.
And yes, this is reproduced on my second pair of benchmark runs as well.
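
The "Minimum" figures quoted above are consistent with taking new c/s divided by old c/s for each benchmark line and reporting the lowest ratio.  Here's a minimal Python sketch under that assumption.  It uses only the real-time figures quoted in this message, and the names RESULTS, ratios() and minimum() are hypothetical:

```python
# Sketch: relative speed of 1.7.9-jumbo-5 vs 1.7.8-jumbo-8, computed as
# new c/s / old c/s (real time), using only the benchmark lines quoted above.
# Assumption: the "Minimum" line is the lowest such ratio across benchmarks.

# (format, benchmark line) -> (old K c/s, new K c/s)
RESULTS = {
    ("CRC-32", "Many salts"): (63225, 51593),
    ("CRC-32", "Only one salt"): (28983, 27557),
    ("NT MD4", "Raw"): (26734, 23450),
}


def ratios(results, exclude=()):
    """Return {(format, line): new/old} for formats not listed in `exclude`."""
    return {
        key: new / old
        for key, (old, new) in results.items()
        if key[0] not in exclude
    }


def minimum(results, exclude=()):
    """Return ((format, line), ratio) for the benchmark that slowed down most."""
    r = ratios(results, exclude)
    key = min(r, key=r.get)
    return key, r[key]


if __name__ == "__main__":
    for excl in ((), ("CRC-32",)):
        (fmt, line), ratio = minimum(RESULTS, excl)
        print(f"excluding {list(excl)}: {fmt} {line}: {ratio:.5f} "
              f"({(1 - ratio) * 100:.1f}% slower)")
```

With nothing excluded this picks CRC-32 "Many salts" at 0.81602 (the ~18% slowdown).  With CRC-32 excluded it picks NT MD4 at 0.87716, matching the minimum above.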

With NTLM excluded as well, the new minimum is:

Minimum:                        0.94607 real, 0.94937 virtual

And it is for "dynamic_10: md5($s.md5($s.$p)) :Many salts", which I
don't care much about.  And it's only about 5% slower.

Can someone look into the NTLM performance regression?  And maybe into
others as well, but NTLM is the important one.

We gained NT2, which is slightly faster:

Benchmarking: NT v2 [SSE2i 12x]... DONE
Raw:    25731K c/s real, 25731K c/s virtual

but still slower than what NT used to achieve in 1.7.8-jumbo-8.  And I
am not familiar with any other differences between these two (e.g., do
they handle non-ASCII the same way?).
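
To put "slightly faster" and "still slower" into numbers, here's a small Python sketch comparing NT2's raw real-time speed against NT MD4 in both versions.  The constant names and the relative() helper are hypothetical, and only the figures quoted above are used:

```python
# Sketch: NT2 (1.7.9-jumbo-5) vs NT MD4 in both versions, real-time K c/s
# as quoted above. Names here are illustrative, not from John the Ripper.
NT_OLD = 26734  # NT MD4, 1.7.8-jumbo-8
NT_NEW = 23450  # NT MD4, 1.7.9-jumbo-5
NT2 = 25731     # NT v2, 1.7.9-jumbo-5


def relative(speed, baseline):
    """Return speed/baseline; values below 1.0 mean slower than baseline."""
    return speed / baseline


if __name__ == "__main__":
    for name, base in (("new NT", NT_NEW), ("old NT", NT_OLD)):
        r = relative(NT2, base)
        print(f"NT2 vs {name}: {r:.3f} ({(r - 1) * 100:+.1f}%)")
```

By these figures NT2 is roughly 9.7% faster than the new NT but still about 3.8% behind the old NT.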


