Date: Mon, 6 Apr 2015 21:38:03 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: New SIMD generations, code layout


> On Apr 6, 2015, at 8:33 PM, magnum <john.magnum@...hmail.com> wrote:
> 
> It looks good. What failure do you get? Your version fails even with
> AVX2, with "FAILED (cmp_all(5))". That message means keys 0..4 were set,
> crypt_all(5) was called and then cmp_all(5), which did not indicate
> anything was cracked. So everything worked correctly up to the 4th key,
> but the 5th failed.

I traced the problem to this code snippet in sha1_fmt_cmp_all():
------------------------------------------------------------
        for (i = 0; i < count; i += 64) {
                int32_t R = 0;
#if __MIC__ || __AVX512__
                R |= vtesteq_epi32(B, vload(&MD[i +  0]));
                R |= vtesteq_epi32(B, vload(&MD[i + 16]));
                R |= vtesteq_epi32(B, vload(&MD[i + 32]));
                R |= vtesteq_epi32(B, vload(&MD[i + 48]));
#elif __AVX2__
                R |= vtesteq_epi32(B, vload(&MD[i +  0]));
                R |= vtesteq_epi32(B, vload(&MD[i +  8]));
                R |= vtesteq_epi32(B, vload(&MD[i + 16]));
                R |= vtesteq_epi32(B, vload(&MD[i + 24]));
                R |= vtesteq_epi32(B, vload(&MD[i + 32]));
                R |= vtesteq_epi32(B, vload(&MD[i + 40]));
                R |= vtesteq_epi32(B, vload(&MD[i + 48]));
                R |= vtesteq_epi32(B, vload(&MD[i + 56]));
#else
                R |= vtesteq_epi32(B, vload(&MD[i +  0]));
                R |= vtesteq_epi32(B, vload(&MD[i +  4]));
                R |= vtesteq_epi32(B, vload(&MD[i +  8]));
                R |= vtesteq_epi32(B, vload(&MD[i + 12]));
                R |= vtesteq_epi32(B, vload(&MD[i + 16]));
                R |= vtesteq_epi32(B, vload(&MD[i + 20]));
                R |= vtesteq_epi32(B, vload(&MD[i + 24]));
                R |= vtesteq_epi32(B, vload(&MD[i + 28]));
                R |= vtesteq_epi32(B, vload(&MD[i + 32]));
                R |= vtesteq_epi32(B, vload(&MD[i + 36]));
                R |= vtesteq_epi32(B, vload(&MD[i + 40]));
                R |= vtesteq_epi32(B, vload(&MD[i + 44]));
                R |= vtesteq_epi32(B, vload(&MD[i + 48]));
                R |= vtesteq_epi32(B, vload(&MD[i + 52]));
                R |= vtesteq_epi32(B, vload(&MD[i + 56]));
                R |= vtesteq_epi32(B, vload(&MD[i + 60]));
#endif
                M |= R;
        }
------------------------------------------------------------

In the original code, the stride between consecutive vtesteq_epi32() calls is fixed at 4. I thought the stride should follow the SIMD width, so I changed the code to what you see above. As you mentioned, the new code fails even on AVX2. I just tried reverting to the fixed stride of 4, and then it passed the self-test on AVX2, which is strange: I don't see why the stride can't be adjusted. I can't try the fixed stride on MIC, though, because it would not guarantee the 64-byte alignment that MIC loads require.

Any thoughts?


Lei
