Date: Wed, 13 Jul 2011 20:55:41 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: cryptmd5cuda

Lukas,

I just took a look at john-1.7.7-cryptmd5cuda-2.diff (didn't try running it yet, though). Here are some comments:

When convenient, please start to base your patches on the latest main JtR version, which would be 1.7.8 now (not 1.7.7 anymore). This should be trivial since your patches are not invasive.

For 32-bit systems, you recommend the linux-x86-mmx make target, but it should be linux-x86-sse2 instead. (Of course, this only affects other hash types, not the one you provide GPU support for.)

+#define MIN(a,b) (a)<(b)?(a):(b)
+#define MAX(a,b) (a)>(b)?(a):(b)

I suggest that you add parentheses around the entire expressions as well, to avoid any surprises later.

+static const char md5_salt_prefix = "$1$";

This reminds me - how about supporting "$apr1$" as well? (You may get the test vectors from MD5_fmt.c.)

+#define F(x, y, z) (x&y | ~x&z)
+#define G(x, y, z) (x&z | y&~z)

These two may be optimized to:

#define F(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#define G(x, y, z) ((y) ^ ((z) & ((x) ^ (y))))

If the GPU doesn't have a single AND-NOT instruction, and I think it does not, this reduces the instruction count for each of them from 4 to 3.

+#define PLAINTEXT_LENGTH 15+1

Why +1? You don't actually support passwords of length 16, do you?

In md5_block(), you don't take advantage of x always being 0 (for password length up to 15 inclusive) - you can just substitute 0 there. Also, rather than maintain an MD5 context, you can load initial constants right into registers, and you can even do some precomputation.

...you also have all those "i % 7" and similar checks, which as we had discussed you'd need to get rid of. And you wouldn't need md5_digest(), where you compute the length in bits - all of those different lengths would be precomputed outside of the 1000-iteration loop.

Please take a closer look at JtR's MD5_std.c.
It has all of these optimizations I mentioned. It looks like you started with unoptimized MD5 code and didn't look at JtR's more optimal code closely enough to spot and borrow these.

Start by replacing F() and G(), and replacing x with 0. These are trivial changes, but they should give some speedup - I'd expect 5% to 10%.

Thanks,

Alexander