Date: Sat, 2 Feb 2013 20:30:44 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: NetNTLMv1

On 2 Feb, 2013, at 16:25 , Solar Designer <solar@...nwall.com> wrote:
> On Fri, Feb 01, 2013 at 07:45:12AM +0400, Solar Designer wrote:
>> With a generic+OpenMP build, it is ~3150M c/s for one process (8
>> threads).  This puzzles me, because generic's MD4 computations are
>> slower, whereas the comparisons are not supposed to be faster since
>> OpenMP is only being made use of for the MD4s, not for comparisons, in
>> that code version.  So I would have expected its performance to be
>> around ~850M at "many salts" - same as I'm getting for one process with
>> the XOP build (on otherwise idle system).  I don't understand where a
>> further 4x speedup comes from.
> 
> I think I figured this out: generic+OpenMP uses much higher
> max_keys_per_crypt than SIMD-enabled non-OpenMP builds do.  Can you
> rework the latter to allow for increasing their max_keys_per_crypt?
> My gut feeling is that a value of around 0x100 will be optimal (need to
> make it a multiple of MMX_COEF and maybe MD4_SSE_PARA as appropriate for
> a given build, of course).

Yes, I figured I should try that. In NT2 there is a BLOCK_LOOPS macro that acts as a multiplier for the SIMD number of keys. It was added for OMP experiments, but the same code can be used for a single-threaded loop. BTW, we can actually get up to ~80M for NT2 with 2xOMP, but I haven't had any success in making it ready for production use: only hardcoded values work. As soon as I turn any of them into run-time variables, the overhead eats the gain. This can probably be worked out, and it should apply to NTLMv1 and MSCHAPv2 too.
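
To make the idea concrete, here is a rough sketch of how a BLOCK_LOOPS-style multiplier could size max_keys_per_crypt near 0x100 while keeping it a multiple of MMX_COEF * MD4_SSE_PARA. This is not the actual NT2 code. The values chosen for MMX_COEF and MD4_SSE_PARA, and the names SIMD_KEYS, keys_per_crypt(), md4_simd_block() and crypt_all_sketch(), are assumptions made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; a real build sets these per arch. */
#define MMX_COEF           4                        /* keys per SIMD vector */
#define MD4_SSE_PARA       3                        /* interleaved vectors */
#define SIMD_KEYS          (MMX_COEF * MD4_SSE_PARA)
#define BLOCK_LOOPS        (0x100 / SIMD_KEYS)      /* compile-time multiplier */
#define MAX_KEYS_PER_CRYPT (SIMD_KEYS * BLOCK_LOOPS)

/* Largest multiple of coef * para not exceeding target (at least one block).
 * Run-time variant of the macro arithmetic above. */
static size_t keys_per_crypt(size_t target, size_t coef, size_t para)
{
    size_t simd_keys = coef * para;
    size_t loops;

    assert(coef > 0 && para > 0);
    loops = target / simd_keys;
    if (loops == 0)
        loops = 1;
    return simd_keys * loops;
}

/* Hypothetical stand-in for one SIMD MD4 call over SIMD_KEYS candidates.
 * It only counts invocations; no hashing is done here. */
static void md4_simd_block(size_t first_key, unsigned *calls)
{
    (void)first_key;
    ++*calls;
}

/* crypt_all-style loop: one SIMD call per block of SIMD_KEYS keys.
 * Returns the number of SIMD calls made for count keys. */
static unsigned crypt_all_sketch(size_t count)
{
    unsigned calls = 0;
    size_t i;

    for (i = 0; i < count; i += SIMD_KEYS)
        md4_simd_block(i, &calls);
    return calls;
}
```

With these assumed values, SIMD_KEYS is 12, BLOCK_LOOPS is 21 and MAX_KEYS_PER_CRYPT is 252, so one crypt_all_sketch(MAX_KEYS_PER_CRYPT) call makes 21 SIMD calls. Since the loop step is a macro, the compiler sees a constant there, which may be related to why the hardcoded variant performs better than one driven by run-time variables.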

magnum
