Date: Thu, 31 Jan 2013 18:36:22 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: NetNTLMv1

On 31 Jan, 2013, at 10:37 , Solar Designer <solar@...nwall.com> wrote:
> Attached is a quick and still dirty implementation of the above approach
> for JtR.  Compared to the approach with maintaining a lookup table per
> challenge, this has lower memory needs and higher cracking speed, but
> (as currently implemented) it does the ~32k DES computations per C/R
> pair rather than per challenge.  It is possible to improve it to only do
> those computations per challenge, by temporarily maintaining a lookup
> table for each challenge (during loading only, and maybe only for the
> current challenge).
> 
> New speeds:
> 
> Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... DONE
> Many salts:     882291K c/s real, 882291K c/s virtual
> Only one salt:  7647K c/s real, 7647K c/s virtual
> 
> Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... (8xOMP) DONE
> Many salts:     910901K c/s real, 114005K c/s virtual
> Only one salt:  13025K c/s real, 1626K c/s virtual
> 
> Alexander
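
To illustrate the trade-off described in the quoted text, here is a minimal Python sketch, not the JtR code (which is C). In NetNTLMv1 the third DES key is derived from the last two bytes of the NT hash, so there are 65536 candidates, about 32k tried on average before a hit. The function names are hypothetical, and toy_des() is only a stand-in built from MD5, not real DES. The first search repeats the work for every C/R pair; the second builds a table once per challenge and reuses it for every response to that challenge.

```python
import hashlib

KEYSPACE = 1 << 16  # two unknown bytes of the NT hash


def toy_des(key16: int, challenge: bytes) -> bytes:
    """Stand-in for DES keyed by a 16-bit value. NOT real DES."""
    return hashlib.md5(key16.to_bytes(2, "little") + challenge).digest()[:8]


def recover_per_pair(challenge: bytes, third_block: bytes):
    """Brute-force the 16-bit key for a single C/R pair (~32k tries on average)."""
    for key16 in range(KEYSPACE):
        if toy_des(key16, challenge) == third_block:
            return key16
    return None


def build_challenge_table(challenge: bytes) -> dict:
    """Encrypt one challenge under every 16-bit key, once, for later lookups."""
    return {toy_des(key16, challenge): key16 for key16 in range(KEYSPACE)}


if __name__ == "__main__":
    challenge = bytes.fromhex("1122334455667788")
    responses = [toy_des(k, challenge) for k in (0x0001, 0xBEEF, 0xFFFF)]

    # Per-pair: one search per response.
    per_pair = [recover_per_pair(challenge, r) for r in responses]

    # Per-challenge: one table, then a dictionary lookup per response.
    table = build_challenge_table(challenge)
    per_challenge = [table.get(r) for r in responses]

    print(per_pair == per_challenge)
```

The table only needs to live while hashes for that challenge are being loaded, which matches the "during loading only" remark in the quoted text.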

This is now committed, as well as SIMD support. New Bull figures:

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [128/128 XOP intrinsics 8x]... DONE
Many salts:     315806K c/s real, 315806K c/s virtual
Only one salt:  28196K c/s real, 28196K c/s virtual

That's a poor many-salts figure (see below), and the non-SIMD OMP speed got a lot worse:

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... DONE
Many salts:     880116K c/s real, 887389K c/s virtual
Only one salt:  7557K c/s real, 7557K c/s virtual

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... (8xOMP) DONE
Many salts:     223689K c/s real, 39416K c/s virtual
Only one salt:  904681 c/s real, 159323 c/s virtual

This performance regression must be caused by my tweaking of OMP_SCALE and base MAX_KEYS_PER_CRYPT for an i7-3820. Benchmarks for that one at 3.60GHz:

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [128/128 AVX intrinsics 12x]... DONE
Many salts:     914527K c/s real, 917585K c/s virtual
Only one salt:  44938K c/s real, 44938K c/s virtual

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... DONE
Many salts:     1251M c/s real, 1255M c/s virtual
Only one salt:  9564K c/s real, 9564K c/s virtual

Note that the below is *four* cores, not eight.
Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... (4xOMP) DONE
Many salts:     1734M c/s real, 434310K c/s virtual
Only one salt:  28072K c/s real, 7035K c/s virtual

That's a remarkable difference. Not sure how to tweak it for both. Maybe just lower MAX_KEYS_PER_CRYPT until we see a sweet spot on Bull, and hope it does not make too much difference on the Intel.

Oh btw, here's a figure for Bull at four cores:

Benchmarking: NTLMv1 C/R MD4 DES (ESS MD5) [32/64]... (4xOMP) DONE
Many salts:     976125K c/s real, 244587K c/s virtual
Only one salt:  14336K c/s real, 3591K c/s virtual

That's better. Does this suggest any specific change I should make?
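
To put numbers on the scaling, here is a small Python sketch that divides each OMP "real" figure above by the single-thread non-SIMD figure from the same machine. The dictionary layout and function name are just for illustration; the c/s values are copied from the benchmarks in this message (K = 1e3, M = 1e6).

```python
# Real c/s figures from this message.
# Layout: machine -> thread count -> (many salts, only one salt)
REAL_CPS = {
    "Bull": {
        1: (880116e3, 7557e3),
        4: (976125e3, 14336e3),
        8: (223689e3, 904681.0),
    },
    "i7-3820": {
        1: (1251e6, 9564e3),
        4: (1734e6, 28072e3),
    },
}


def omp_speedup(machine: str, threads: int):
    """Return (many-salts, one-salt) speedup over the 1-thread build."""
    base_many, base_one = REAL_CPS[machine][1]
    many, one = REAL_CPS[machine][threads]
    return many / base_many, one / base_one


if __name__ == "__main__":
    for machine, runs in REAL_CPS.items():
        for threads in sorted(runs):
            if threads == 1:
                continue
            many, one = omp_speedup(machine, threads)
            print(f"{machine} {threads}xOMP: "
                  f"many salts {many:.2f}x, one salt {one:.2f}x")
```

By this measure Bull at 8 threads runs at roughly 0.25x (many salts) and 0.12x (one salt) of its own single-thread speed, while Bull at 4 threads gets about 1.11x and 1.90x, and the i7-3820 at 4 threads about 1.39x and 2.94x.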

I haven't even tried doing OMP for SIMD. I reckon it can't do any good with the current core. The NT2 format has the code needed, but it's defined out (BLOCK_LOOPS). Maybe it can be tweaked to do a little good.

magnum