Date: Sun, 5 Aug 2012 02:08:40 +0200
From: Lukas Odzioba <lukas.odzioba@...il.com>
To: john-dev@...ts.openwall.com
Subject: Lukas status report #14

Last week:
I spent some time testing code before the contest to avoid any
"problems", and also helped a bit with password cracking.
I worked on making the md5crypt-opencl code faster. On my 5850 it is now
more than 2x faster (150k -> 360k c/s; still slow, it should be ~1M c/s).
For gtx460: ~500k c/s
For gtx570: ~1M c/s
For 7970: ~2.2M c/s

There is still room for improvement, especially for AMD (50% of
hashcat's speed); speed on NVIDIA cards is quite OK (90% of
hashcat's speed).
Vectorization might be tricky, but it should add a lot of speed on AMD cards.
I got rid of char in the kernels. This solved the
cl_khr_byte_addressable_store problem for good, and using only uints
also had a good impact on performance.
Adding data to buffers is now done like this:
__constant uint M0[] = { 0, 8, 16, 24 };	/* left shift into the current word */
__constant uint M1[] = { 31, 23, 15, 7 };	/* right shift (minus 1) for the spill into the next word */
__constant uint MSK[] = { 0x000000ff, 0x0000ffff, 0x00ffffff, 0xffffffff };
void ctx_updateconstant(__private md5_ctx * ctx, __constant uint *str,
    uint slen, uint * ctx_buflen)
{
	uint *t = ctx->buffer;
	uint *len = ctx_buflen;
	int i, k, w = (slen + 3) / 4;	/* number of input words */
	for (i = 0; i < w; i++) {	/* stuff from Hogwarts, be careful */
		k = slen < 4 ? slen : 4;	/* bytes taken from this word */
		t[*len / 4] |= (str[i] & MSK[k - 1]) << M0[*len & 3];
		/* shift in two steps: we do not want to hurt kitties by x >> 32 */
		t[(*len / 4) + 1] |= ((str[i] & MSK[k - 1]) >> M1[*len & 3]) >> 1;
		*len += k;
		slen -= k;
	}
}
This simply appends str to ctx->buffer, but everything is done on 32-bit
uints, taking care of splitting input uints when needed (for "abc" + "de",
d is added to the word holding abc, and e starts a new uint). Nothing big.
If anyone knows how we could do it faster, comments are welcome.
Masking (MSK) is optional, and the for loop could be replaced by a
while loop on slen.
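
For reference, here is a minimal host-side sketch in plain C of the while-loop variant mentioned above. It is not the kernel code: the name append_words and its signature are hypothetical, it assumes input bytes are packed little-endian into 32-bit words, and it computes shifts and masks arithmetically instead of using the M0/M1/MSK tables. The spill into the next word only happens when bytes actually cross a word boundary, so no shift by 32 can occur.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model: append slen bytes, packed little-endian
 * in src[], to a zero-initialized word buffer that already holds *buflen
 * bytes. The caller must make sure buf is large enough. */
static void append_words(uint32_t *buf, uint32_t *buflen,
                         const uint32_t *src, uint32_t slen)
{
    uint32_t i = 0;

    assert(buf != 0 && buflen != 0 && src != 0);

    while (slen > 0) {
        uint32_t k = slen < 4 ? slen : 4;   /* bytes taken from this word */
        uint32_t mask = (k == 4) ? 0xffffffffu : ((1u << (8 * k)) - 1u);
        uint32_t v = src[i++] & mask;       /* drop bytes past the string end */
        uint32_t off = *buflen & 3;         /* byte offset in the current word */

        /* low part goes into the current word */
        buf[*buflen / 4] |= v << (8 * off);

        /* high part spills into the next word only when the bytes cross
         * a word boundary; off >= 1 here, so the shift is 8, 16 or 24 */
        if (off + k > 4)
            buf[*buflen / 4 + 1] |= v >> (32 - 8 * off);

        *buflen += k;
        slen -= k;
    }
}
```

With a zeroed buffer, appending "abc" (0x00636261, length 3) and then "de" (0x00006564, length 2) gives buf[0] = 0x64636261 and buf[1] = 0x00000065 with a length of 5, which matches the "abc" + "de" case described above. Whether this form is faster than the table lookups on a GPU would have to be measured.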


There is also bad news. I have a bug that is pretty hard to locate.
The code works only for passwords shorter than 8 chars.
I do not see a reason for that; I must compare its execution step by
step with other code.
I'll make proper fixes before the end of GSoC.

This week:
des-crypt on GPU.
Any ideas, links, filenames, or PDFs to read are welcome.
