Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Mon, 16 May 2011 15:49:47 +0200
From: magnum <rawsmooth@...dband.net>
To: john-dev@...ts.openwall.com
Subject: Re: binary hashes and BINARY_SIZE

On 2011-03-29 21:11, Alain Espinosa wrote:
> On 3/29/11, Solar Designer<solar@...nwall.com>  wrote:
>> ... moving some code from cmp_one() to cmp_exact().  Alain?  Many
>> other "formats" are similarly not optimized in this respect.  Not that I
>> care much (with modern RAM sizes), but Alain started to compare memory
>> usage of different tools here:
>> http://www.hashsuite.webuda.com/index.php?p=1_6 ;-)
>> JtR's memory usage at NTLM will grow by 4 MB or 8 MB with the large hash
>> tables patch, but we can reduce it by 8 MB or 12 MB with the BINARY_SIZE
>> optimization.
>
> I make a rapid read and do not fully understand how john works. When i
> have some free time i read again and change the NTLM behavior. I think
> this 'john internal documentation' is very useful for john developers
> when starting a new format. Add it to the wiki or a 'How to support a
> new format'?
>
> I think memory usage/managment make an appreciable influence in
> performance which a large number of hashes.

As the current code only use 4 bytes for cmp_all() and cmp_one() I'm 
pretty sure a BINARY_SIZE of just 4 is plenty enough. cmp_exact() would 
have to be re-written so it creates a full NT hash from scratch and 
compares with that. This can use a copy of the (full) generic code as it 
doesn't need to be fast.

I had a go at re-arranging the current cmp_one() and cmp_exact in a way 
I believe is "more proper" (for the current binary size). I thought this 
wouldn't affect performance in any direction (as cmp_all() was not 
affected) but for some reason my benchmark dropped 5%. It must be one of 
those random how-stuff-ends-up cases.

Anyway, I enclose the code in case it helps someone think. I started to 
look at x86-64.S but that's way over my head. I could do it for 
crypt_all_generic as an example. This would be dead simple, just store b 
(instead of a, b, c, d) in the output array, making it a quarter of its 
current size. The SSE and MMX versions would need more work I guess, 
re-designing the buffers.

BTW I was experimenting the other day with defaulting to the old 
"non-fully-Unicode" crypt_all assembly code, and only use the new code 
when needed, but that too just made a performance drop, coincidental I 
presume.

magnum

static int cmp_all(void *binary, int count)
{
	unsigned int b = ((unsigned int *)binary)[1];
	unsigned int i = 0;

#if defined(NT_X86_64)
	for(; i < (NT_NUM_KEYS / 8); i++)
		if(b == output8x[i * 32 +  8] || b == output8x[i * 32 +  9] ||
		   b == output8x[i * 32 + 10] || b == output8x[i * 32 + 11] ||
		   b == output8x[i * 32 + 12] || b == output8x[i * 32 + 13] ||
		   b == output8x[i * 32 + 14] || b == output8x[i * 32 + 15])
			return 1;

#elif defined(NT_SSE2)
	unsigned int pos=4;
	
	for(; i < NT_NUM_KEYS1; i++, pos += 16)
		if(b == output4x[pos]     || b == output4x[pos + 1] ||
		   b == output4x[pos + 2] || b == output4x[pos + 3])
			return 1;
	i = 1;
	for(; i < NT_NUM_KEYS4; i += 4)
		if(b == output1x[i])
			return 1;
#else
	for(; i < NT_NUM_KEYS; i++)
		if(b == output1x[i * 4 + 1])
			return 1;
#endif
	
	return 0;
}

static int cmp_one(void * binary, int index)
{
	unsigned int b = ((unsigned int *)binary)[1];
#if defined(NT_X86_64)
	return (b == output8x[32 * (index >> 3) + 8 + index % 8]);
#elif defined(NT_SSE2)
	if(index<NT_NUM_KEYS4)
		return (b == output4x[16 * (index >> 2) + 4 + index % 4]);
	else
		return (b == output1x[(index - NT_NUM_KEYS4) * 4 + 1]);
#else
	return (b == output1x[(index << 2) + 1]);
#endif
}

static int cmp_exact(char *source, int index)
{
	unsigned int *t=(unsigned int *)get_binary(source);
	unsigned int a;
	unsigned int b;
	unsigned int c;
	unsigned int d;
	
	unsigned int * buffer;
	int pos1;
	int pos2;
	int pos3;
	
#if defined(NT_X86_64)
	int temp;
	buffer=nt_buffer8x;
	
	temp=32*(index>>3)+index%8;
	
	a=output8x[temp];
	b=output8x[temp+8];
	c=output8x[temp+16];
	d=output8x[temp+24];
	
	pos1=24+index%8+128*(index>>3);
	pos2=64+pos1;
	pos3=32+pos1;
#elif defined(NT_SSE2)
	int temp;
	
	if(index<NT_NUM_KEYS4)
	{
		buffer=nt_buffer4x;
		
		temp=16*(index>>2)+index%4;
		
		a=output4x[temp];
		b=output4x[temp+4];
		c=output4x[temp+8];
		d=output4x[temp+12];
		
		pos1=12+index%4+64*(index>>2);
		pos2=32+pos1;
		pos3=16+pos1;
	}
	else
	{
		buffer=nt_buffer1x;
		
		temp=4*(index-NT_NUM_KEYS4);
		
		a=output1x[temp];
		b=output1x[temp+1];
		c=output1x[temp+2];
		d=output1x[temp+3];
		
		pos1=3+4*temp;
		pos2=8+pos1;
		pos3=4+pos1;
	}
#else
	buffer=nt_buffer1x;
	
	a=output1x[(index<<2)];
	b=output1x[(index<<2)+1];
	c=output1x[(index<<2)+2];
	d=output1x[(index<<2)+3];
	
	pos1=(index<<4)+3;
	pos2=8+pos1;
	pos3=4+pos1;
#endif
	if(b!=t[1])
		return 0;
	b += SQRT_3;b = (b << 15) | (b >> 17);
	
	a += (b ^ c ^ d) + buffer[pos1] + SQRT_3; a = (a << 3 ) | (a >> 29);
	if(a!=t[0])
		return 0;
	
	d += (a ^ b ^ c) + buffer[pos2] + SQRT_3; d = (d << 9 ) | (d >> 23);
	if(d!=t[3])
		return 0;
	
	c += (d ^ a ^ b) + buffer[pos3] + SQRT_3; c = (c << 11) | (c >> 21);	
	return c==t[2];
}

Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ