john-dev - Re: SHA-1 H()



Message-ID: <3ed25904298d05c7294801fd1325a5ba@smtp.hushmail.com>
Date: Thu, 03 Sep 2015 21:29:37 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

On 2015-09-03 20:40, Solar Designer wrote:
> On Thu, Sep 03, 2015 at 11:52:47AM +0200, magnum wrote:
>> On 2015-09-03 06:56, Solar Designer wrote:
>>> On Wed, Sep 02, 2015 at 09:31:34PM +0200, magnum wrote:
>>>> #define Ch(x, y, z) (z ^ (x & (y ^ z)))
>>>> #define Ch(x, y, z) ((x & y) ^ ( (~x) & z))
>>>>
>>>> This is 3 vs. 4 ops, right?
>>>
>>> On archs without AND-NOT, yes.  So it's a good find, and I'm happy you
>>> patched these.

>> Apparently GCN has ANDN and NAND.
>
> I need to take a fresh look at the arch manual, but in the generated
> code I only see scalar ANDN, and never vector ANDN (nor NAND).  They
> defined scalar ANDN presumably because it's so useful for exec masks.
>
> I see you've committed this:
>
> +#if cpu(DEVICE_INFO) || amd_gcn(DEVICE_INFO)
> +#define HAVE_ANDNOT 1
> +#endif
>
> but I think the check for amd_gcn(DEVICE_INFO) is wrong.

We currently never run vectorized on GCN anyway, unless forced by the
user, and only if the format supports it at all. But perhaps the check
should be (amd_gcn(DEVICE_INFO) && (V_WIDTH < 2)) then?
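
For reference, here is a minimal sketch in plain C (not the committed
code) of how a flag like HAVE_ANDNOT could pick between the two Ch
forms quoted above. The helper names ch_3op, ch_4op, ch_forms_agree and
ch_self_check are made up for illustration, and uint32_t stands in for
whatever word type the kernel actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the build defines HAVE_ANDNOT only where AND-NOT is one instruction. */
#if defined(HAVE_ANDNOT)
/* 4 logical ops as written, but (~x & z) can fuse into a single AND-NOT. */
#define Ch(x, y, z) (((x) & (y)) ^ (~(x) & (z)))
#else
/* 3 ops: XOR, AND, XOR. */
#define Ch(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#endif

/* Both forms spelled out as functions so they can be compared directly. */
static uint32_t ch_3op(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

static uint32_t ch_4op(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (~x & z);
}

/*
 * Ch is purely bitwise, so the 8 combinations of one input bit each,
 * replicated across the whole word, cover every possible case.
 * Returns 1 if all forms agree, 0 otherwise.
 */
static int ch_forms_agree(void)
{
    for (unsigned i = 0; i < 8; i++) {
        uint32_t x = (i & 4) ? 0xffffffffu : 0u;
        uint32_t y = (i & 2) ? 0xffffffffu : 0u;
        uint32_t z = (i & 1) ? 0xffffffffu : 0u;

        if (ch_3op(x, y, z) != ch_4op(x, y, z))
            return 0;
        if (Ch(x, y, z) != ch_3op(x, y, z))
            return 0;
    }
    return 1;
}

/* Aborts if the two forms ever disagree. */
void ch_self_check(void)
{
    assert(ch_forms_agree());
}
```

Whichever branch is compiled in, the result is identical; only the
instruction count differs, which is why the choice can safely depend on
the target.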

> And why this change? -
>
> -#if !gpu_nvidia(DEVICE_INFO) || nvidia_sm_5x(DEVICE_INFO)
> +#if !gpu_nvidia(DEVICE_INFO)
>   #define USE_BITSELECT 1
>   #elif gpu_nvidia(DEVICE_INFO)
>   #define OLD_NVIDIA 1
>   #endif

I saw a definite speedup for PBKDF2 and RAR IIRC, and perhaps md5crypt.
But later I saw contradictory figures for other formats, so I'm not
sure about this and things are in a state of flux. We may be better off
reverting to setting it initially (for Maxwell) in opencl_misc.h, and
then conditionally undefining it in certain formats.

Is bitselect() expected to always generate a LOP3.LUT? Even if it is, I 
figure the optimizer just might be able to do better when given 
bitselect-free code.
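
To make the bitselect question concrete, here is a small plain-C sketch
of what a bitselect-based Ch computes. bitselect32 is a hypothetical
host-side stand-in that follows the OpenCL definition of
bitselect(a, b, c) (take the bit from b where c is set, else from a).
It says nothing about which instructions a given compiler emits for the
real built-in.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for OpenCL bitselect(a, b, c): bit of b where c is 1, else bit of a. */
static uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Ch via bitselect: x is the selector, choosing y where set and z where clear. */
static uint32_t ch_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(z, y, x);
}

/* The bitselect-free 3-op form, for comparison. */
static uint32_t ch_plain(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

/* Aborts if the two formulations disagree on a sample input. */
void ch_bitselect_self_check(void)
{
    uint32_t x = 0xffff0000u, y = 0x12345678u, z = 0x9abcdef0u;

    assert(ch_bitselect(x, y, z) == ch_plain(x, y, z));
}
```

Both functions describe the same truth table, so any speed difference
comes down to how the compiler lowers each spelling.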

Besides all this, I see I introduced a bug: OLD_NVIDIA is now defined
for Maxwell, which was not the intention. I'll fix that right away.

>> BTW early tests indicate that 5916a57 made SHA-512 very slightly worse
>> (but almost hidden by normal variations).
>
> On what hardware?

AVX and AVX2. My overall feeling is that SHA256 got a slight boost
while SHA512 did not, and sometimes the latter got a very slight
regression. But I haven't really been systematic yet. All my tests are
inconclusive so far: the fluctuations are larger than the
boosts/regressions.

magnum
