Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 20 Jul 2011 23:32:18 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: cryptmd5cuda

Lukas -

Any comments on the below?  I proposed some trivial changes there, which
should result in some speedup - quite desirable for uses of your patches
in the KoreLogic contest (not necessarily by our team, but by others as
well).  So please make those changes sooner rather than later.  Also,
similar changes might apply to other patches of yours - please review
them for the same issues and apply the changes as necessary.

Have you tested your patches with lengthy cracking runs, on lots of
hashes loaded for cracking?  For cryptmd5cuda, please use KoreLogic's
last year contest's hashes list:

http://contest-2010.korelogic.com

Compare your results against those produced by a clean JtR build (CPU).

Thanks,

Alexander

On Wed, Jul 13, 2011 at 08:55:41PM +0400, Solar Designer wrote:
> I just took a look at john-1.7.7-cryptmd5cuda-2.diff (didn't try running
> it yet, though).  Here are some comments:
> 
> When convenient, please start to base your patches on the latest main
> JtR version, which would be 1.7.8 now (not 1.7.7 anymore).  This should
> be trivial since your patches are not invasive.
> 
> For 32-bit systems, you recommend the linux-x86-mmx make target, but it
> should be linux-x86-sse2 instead.  (Of course, this only affects other
> hash types, not the one you provide GPU support for.)
> 
> +#define MIN(a,b) (a)<(b)?(a):(b)
> +#define MAX(a,b) (a)>(b)?(a):(b)
> 
> I suggest that you add braces around the entire expressions as well, to
> avoid any surprises later.
> 
> +static const char md5_salt_prefix[] = "$1$";
> 
> This reminds me - how about supporting "$apr1$" as well?  (You may get
> the test vectors from MD5_fmt.c.)
> 
> +#define F(x, y, z) (x&y | ~x&z)
> +#define G(x, y, z) (x&z | y&~z)
> 
> These two may be optimized to:
> 
> #define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
> #define G(x, y, z) ((y) ^ ((z) & ((x) ^ (y))))
> 
> If the GPU doesn't have a single AND-NOT instruction, and I think it
> does not, this reduces the instruction count from 4 to 3.
> 
> +#define PLAINTEXT_LENGTH       15+1
> 
> Why +1?  You don't actually support passwords of length 16, do you?
> 
> In md5_block(), you don't take advantage of x[15] always being 0 (for
> password length up to 15 inclusive) - you can just substitute 0 there.
> Also, rather than maintain an MD5 context, you can load initial
> constants right into registers, and you can even do some precomputation.
> 
> ...you also have all those "i % 7" and similar checks, which as we had
> discussed you'd need to get rid of.  And you wouldn't need md5_digest(),
> where you compute the length in bits - all of those different lengths
> would be precomputed outside of the 1000 iteration loop.
> 
> Please take a closer look at JtR's MD5_std.c.  It has all of these
> optimizations I mentioned.  It looks like you started with unoptimized
> MD5 code and didn't look at JtR's more optimal code closely enough to
> spot and borrow these.
> 
> Start by replacing F() and G(), and replacing x[15] with 0.  Trivial
> changes, but should give some speedup - I'd expect 5% to 10%.
> 
> Thanks,
> 
> Alexander

Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ