john-dev - Re: Formats using non-SIMD SHA2 implementations

Message-ID: <op.x3i6zq1kzz6j51@1pqhgq1.dtn.com>
Date: Mon, 17 Aug 2015 16:22:28 -0500
From: JimF <jfoug@....net>
To: john-dev@...ts.openwall.com
Subject: Re: Formats using non-SIMD SHA2 implementations

On Mon, 17 Aug 2015 15:57:26 -0500, magnum <john.magnum@...hmail.com>  
wrote:

> On 2015-08-17 09:40, Lei Zhang wrote:
>> On Aug 17, 2015, at 2:26 PM, magnum <john.magnum@...hmail.com> wrote:
>>> On 2015-08-17 05:07, Lei Zhang wrote:
>>>> I finally got 7z to work correctly with SIMD :)
>>
>>> Are you sorting lengths, like Jim hinted? Or are you handling  
>>> diverging lengths like in SAP F/G?
>>
>> No, I haven't done that yet. I may give that a try too. I hope it's not  
>> too tricky to implement. The code already looks ugly enough...
>
> Are you saying you do neither? That can't work. If it seems to work,  
> it's only because all test vectors are same length. The same applies to  
> RAR3.

Dynamic is another format that works like SAP F/G (assuming that one keeps
iterating until all inputs have been completed).  You can look at any of
the low-level SIMD functions in dynamic_big_crypt.c to see how dynamic
does it.

Here is an example showing SHA256:

static void DoSHA256_crypt_sse(void *in, uint32_t ilen[SHA256_LOOPS],
                               void *out[SHA256_LOOPS], uint32_t *tot_len,
                               uint32_t tid) {
	JTR_ALIGN(MEM_ALIGN_SIMD) ARCH_WORD_32 a[(32*SHA256_LOOPS)/sizeof(ARCH_WORD_32)];
	union yy { unsigned char u[32]; ARCH_WORD_32 a[32/sizeof(ARCH_WORD_32)]; } y;
	uint32_t i, j, loops[SHA256_LOOPS], bMore, cnt;
	unsigned char *cp = (unsigned char*)in;
	/* Fix up each flat buffer; loops[i] is the number of SHA256 calls input i needs. */
	for (i = 0; i < SHA256_LOOPS; ++i) {
		loops[i] = Do_FixBufferLen32(cp, ilen[i], 1);
		cp += 64*4;
	}
	cp = (unsigned char*)in; bMore = 1; cnt = 1;
	/* Keep running all inputs in lock-step until the longest one is done. */
	while (bMore) {
		SIMDSHA256body(cp, a, a, SSEi_FLAT_IN|SSEi_4BUF_INPUT_FIRST_BLK|(cnt==1?0:SSEi_RELOAD));
		bMore = 0;
		for (i = 0; i < SHA256_LOOPS; ++i) {
			if (cnt == loops[i]) {
				/* Input i finished on this pass: copy its digest out of the interleaved output. */
				uint32_t offx = ((i/SIMD_COEF_32)*32/sizeof(ARCH_WORD_32)*SIMD_COEF_32)+(i&(SIMD_COEF_32-1));
				for (j = 0; j < 32/sizeof(ARCH_WORD_32); ++j) {
					y.a[j] = JOHNSWAP(a[(j*SIMD_COEF_32)+offx]);
				}
				*(tot_len+i) += large_hash_output(y.u, &(((unsigned char*)out[i])[*(tot_len+i)]), 32, tid);
			} else if (cnt < loops[i]) bMore = 1; /* at least one input needs another pass */
		}
		cp += 32*2; ++cnt;
	}
}


What we do here is put the data into our buffers (we are using flat
buffers); the function that does this tells us how many SHA256 calls a
specific string takes.  Then we loop until ALL the values have been
completed (i.e. until cnt reaches the max of loops[x]).  Ignore some of
the strangeness, such as cp += 32*2;  there is some weird-looking code
because this file is auto-generated, and the same template for this
function is used for ALL SIMD formats.
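
To show the lock-step idea in isolation, here is a minimal scalar sketch.
It is not the dynamic code: LANES, MAX_BLOCKS, pad_lane(), toy_compress(),
hash_single(), hash_lockstep() and lockstep_selfcheck() are all
hypothetical names, and toy_compress() is a trivial mixing step standing
in for a real SIMD SHA256 body.  Only the padding arithmetic follows
SHA256 (a 0x80 byte plus an 8-byte bit count).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES      4   /* hypothetical SIMD width */
#define BLOCK      64  /* SHA-256 block size in bytes */
#define MAX_BLOCKS 4   /* hypothetical per-lane flat buffer size, in blocks */

/* Blocks needed for len message bytes plus the 0x80 byte and 8-byte length. */
static uint32_t blocks_needed(uint32_t len)
{
	return (len + 8) / BLOCK + 1;
}

/* Pad one lane's flat buffer; returns how many block calls that lane needs. */
static uint32_t pad_lane(unsigned char *buf, const unsigned char *msg, uint32_t len)
{
	uint32_t n = blocks_needed(len);
	uint64_t bits = (uint64_t)len * 8;
	int i;

	assert(n <= MAX_BLOCKS);
	memset(buf, 0, MAX_BLOCKS * BLOCK);
	memcpy(buf, msg, len);
	buf[len] = 0x80;
	for (i = 0; i < 8; i++)  /* big-endian bit count ends the last block */
		buf[n * BLOCK - 1 - i] = (unsigned char)(bits >> (8 * i));
	return n;
}

/* Toy mixing step standing in for one compression call. NOT SHA-256. */
static void toy_compress(uint32_t *state, const unsigned char *block)
{
	int i;
	for (i = 0; i < BLOCK; i++)
		*state = *state * 31u + block[i];
}

/* Reference: one lane hashed on its own, using exactly its own block count. */
static uint32_t hash_single(const unsigned char *buf, uint32_t nblocks)
{
	uint32_t b, s = 0x12345678u;
	for (b = 0; b < nblocks; b++)
		toy_compress(&s, buf + b * BLOCK);
	return s;
}

/* Lock-step: every pass runs all lanes. A lane's result is captured on the
 * pass where cnt == loops[i]; whatever it computes afterwards is ignored. */
static void hash_lockstep(unsigned char bufs[LANES][MAX_BLOCKS * BLOCK],
                          const uint32_t loops[LANES], uint32_t out[LANES])
{
	uint32_t state[LANES], i, cnt = 1;
	int more = 1;

	for (i = 0; i < LANES; i++)
		state[i] = 0x12345678u;
	while (more) {
		for (i = 0; i < LANES; i++)  /* stands in for one SIMD call */
			toy_compress(&state[i], bufs[i] + (cnt - 1) * BLOCK);
		more = 0;
		for (i = 0; i < LANES; i++) {
			if (cnt == loops[i])
				out[i] = state[i];  /* this lane just finished */
			else if (cnt < loops[i])
				more = 1;           /* some lane needs another pass */
		}
		++cnt;
	}
}

/* Returns 1 if lock-step results match per-lane results for mixed lengths. */
static int lockstep_selfcheck(void)
{
	static const uint32_t lens[LANES] = { 3, 55, 56, 130 };
	static unsigned char bufs[LANES][MAX_BLOCKS * BLOCK];
	unsigned char msg[130];
	uint32_t loops[LANES], out[LANES], i;

	memset(msg, 'a', sizeof(msg));
	for (i = 0; i < LANES; i++)
		loops[i] = pad_lane(bufs[i], msg, lens[i]);
	hash_lockstep(bufs, loops, out);
	for (i = 0; i < LANES; i++)
		if (out[i] != hash_single(bufs[i], loops[i]))
			return 0;
	return 1;
}
```

The lengths 3, 55, 56 and 130 need 1, 1, 2 and 3 block calls, so the loop
makes three passes and the short lanes keep running past their own end.
Their result is still right because it was saved on the pass where they
finished.  If all inputs had the same length, the capture logic would
never be exercised, which is why same-length test vectors can hide the
problem.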

This could also be done by loading the buffers within the "while (bMore)
{}" loop, and it SHOULD be done there, with only 1 buffer, if the number
of crypt calls is large or unknown.  In the dynamic format, I do have 4
crypt limbs (or 2 limbs for 64-bit SIMD), and use them 'intact'.  But I
do that because there is a lot of reading and writing in these buffers,
and no real way to know how or what will be written to them (it is
dynamic, btw).
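
A rough sketch of that single-buffer variant, under the same caveats: all
names here (load_block(), toy_compress(), hash_lockstep_stream(),
hash_single_stream(), stream_selfcheck()) are hypothetical, and
toy_compress() is a toy stand-in rather than SHA256.  Each pass builds
just the next 64-byte block per lane, so the working buffer stays one
block per lane no matter how long the inputs are.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4   /* hypothetical SIMD width */
#define BLOCK 64  /* SHA-256 block size in bytes */

/* Blocks needed for len message bytes plus the 0x80 byte and 8-byte length. */
static uint32_t blocks_needed(uint32_t len)
{
	return (len + 8) / BLOCK + 1;
}

/* Build padded block number b of a len-byte message into blk. Blocks past
 * the end come out as zeros (dummy input for lanes that already finished). */
static void load_block(unsigned char blk[BLOCK], const unsigned char *msg,
                       uint32_t len, uint32_t b)
{
	uint32_t n = blocks_needed(len), off = b * BLOCK, i;
	uint64_t bits = (uint64_t)len * 8;

	memset(blk, 0, BLOCK);
	if (b >= n)
		return;
	if (off < len)
		memcpy(blk, msg + off, len - off < BLOCK ? len - off : BLOCK);
	if (len >= off && len - off < BLOCK)
		blk[len - off] = 0x80;
	if (b == n - 1)
		for (i = 0; i < 8; i++)  /* big-endian bit count */
			blk[BLOCK - 1 - i] = (unsigned char)(bits >> (8 * i));
}

/* Toy mixing step standing in for one compression call. NOT SHA-256. */
static void toy_compress(uint32_t *state, const unsigned char *block)
{
	int i;
	for (i = 0; i < BLOCK; i++)
		*state = *state * 31u + block[i];
}

/* Lock-step loop that refills a one-block-per-lane buffer on every pass. */
static void hash_lockstep_stream(const unsigned char *msgs[LANES],
                                 const uint32_t lens[LANES], uint32_t out[LANES])
{
	unsigned char blk[LANES][BLOCK];
	uint32_t state[LANES], i, cnt = 1;
	int more = 1;

	for (i = 0; i < LANES; i++)
		state[i] = 0x12345678u;
	while (more) {
		for (i = 0; i < LANES; i++) {
			load_block(blk[i], msgs[i], lens[i], cnt - 1);
			toy_compress(&state[i], blk[i]);
		}
		more = 0;
		for (i = 0; i < LANES; i++) {
			uint32_t loops = blocks_needed(lens[i]);
			if (cnt == loops)
				out[i] = state[i];  /* this lane just finished */
			else if (cnt < loops)
				more = 1;           /* some lane needs another pass */
		}
		++cnt;
	}
}

/* Reference: one message hashed on its own. */
static uint32_t hash_single_stream(const unsigned char *msg, uint32_t len)
{
	unsigned char blk[BLOCK];
	uint32_t b, s = 0x12345678u;
	for (b = 0; b < blocks_needed(len); b++) {
		load_block(blk, msg, len, b);
		toy_compress(&s, blk);
	}
	return s;
}

/* Returns 1 if padding looks right and lock-step matches per-lane results. */
static int stream_selfcheck(void)
{
	static const uint32_t lens[LANES] = { 3, 55, 56, 130 };
	unsigned char msg[130], blk[BLOCK];
	const unsigned char *msgs[LANES];
	uint32_t out[LANES], i;

	memset(msg, 'a', sizeof(msg));
	load_block(blk, msg, 56, 0);
	if (blk[56] != 0x80)
		return 0;
	load_block(blk, msg, 56, 1);  /* 56 bytes = 448 bits = 0x01C0 */
	if (blk[0] != 0 || blk[62] != 0x01 || blk[63] != 0xC0)
		return 0;
	for (i = 0; i < LANES; i++)
		msgs[i] = msg;
	hash_lockstep_stream(msgs, lens, out);
	for (i = 0; i < LANES; i++)
		if (out[i] != hash_single_stream(msgs[i], lens[i]))
			return 0;
	return 1;
}
```

The trade-off is extra per-pass work building blocks versus memory that
no longer grows with the longest input.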

BUT as magnum has stated, you MUST handle this, in some way.