Date: Thu, 03 Sep 2015 11:52:47 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

On 2015-09-03 06:56, Solar Designer wrote:
> On Wed, Sep 02, 2015 at 09:31:34PM +0200, magnum wrote:
>> On 2015-09-02 17:52, Solar Designer wrote:
>>> On Wed, Sep 02, 2015 at 06:20:25PM +0300, Solar Designer wrote:
>>>> SHA-1's H() aka F3() is the same as SHA-2's Maj()
>>>
>>> And it turns out that while we appear to be optimally using bitselect()
>>> or vcmov() for Maj(), the fallback expressions that we use vary across
>>> source files and are not always optimal:
>>
>> Perhaps Ch() too:
>>
>> #define Ch(x, y, z) (z ^ (x & (y ^ z)))
>> #define Ch(x, y, z) ((x & y) ^ ( (~x) & z))
>>
>> This is 3 vs. 4 ops, right?
>
> On archs without AND-NOT, yes.  So it's a good find, and I'm happy you
> patched these.
>
> However, on archs with AND-NOT either is 3 ops, and the one with AND-NOT
> has some parallelism.

Maybe the AND-NOT one is better on some GPUs then? I need to test.
Apparently GCN has ANDN and NAND. Not sure about NVIDIA. I really hope
we don't need a '(~x) & z' and a 'z & (~x)' version too? Optimizers are
usually fascinating but sometimes very disappointing.
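
For reference, here is a small sketch of the forms in question, so they can be compared side by side and fed to a compiler to inspect the generated code. The macro names CH_XOR, CH_ANDN and CH_ANDN_SWAPPED and the helper ch_forms_agree() are made up for this illustration and are not in the tree. It only shows that the expressions are equivalent; it says nothing about which one a given optimizer handles best.

```c
#include <assert.h>
#include <stdint.h>

/* The two Ch() fallbacks quoted above, renamed (hypothetical names)
 * so that both can coexist in one translation unit. */
#define CH_XOR(x, y, z)  ((z) ^ ((x) & ((y) ^ (z))))   /* XOR, AND, XOR */
#define CH_ANDN(x, y, z) (((x) & (y)) ^ (~(x) & (z)))  /* AND, AND-NOT, XOR */

/* Same as CH_ANDN, with the AND-NOT operands swapped in the source. */
#define CH_ANDN_SWAPPED(x, y, z) (((x) & (y)) ^ ((z) & ~(x)))

/* Returns 1 if all three forms agree for every combination of input bits.
 * Ch() is purely bitwise, so one word whose bit positions cover all eight
 * (x, y, z) bit combinations is enough. */
static int ch_forms_agree(void)
{
    const uint32_t x = 0xF0F0F0F0u;
    const uint32_t y = 0xCCCCCCCCu;
    const uint32_t z = 0xAAAAAAAAu;

    uint32_t a = CH_XOR(x, y, z);
    uint32_t b = CH_ANDN(x, y, z);
    uint32_t c = CH_ANDN_SWAPPED(x, y, z);

    /* Where x is 1 the result takes y's bit, elsewhere z's bit. */
    assert(a == 0xCACACACAu);

    return a == b && b == c;
}
```

Whether a given compiler emits an AND-NOT instruction for one operand order, both, or neither has to be checked in the generated code for each target.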

> Maybe both forms of emulation need to be kept in pseudo_intrinsics.h
> with a way for us to choose one or the other.  It might happen that the
> optimal choice will vary by arch, CPU, compiler, format.

But if it varies by format, we need to decide outside pseudo_intrinsics.h.
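
Something like the following could work, as a rough sketch only. The knob FMT_CH_USE_ANDN is a hypothetical name, not anything that exists in pseudo_intrinsics.h today. The idea is that a format defines it before including the shared header, and the header only supplies a default.

```c
#include <assert.h>
#include <stdint.h>

/* --- would live in the format's source file (hypothetical knob) --- */
/* #define FMT_CH_USE_ANDN 1 */

/* --- would live in the shared header --- */
#ifndef FMT_CH_USE_ANDN
#define FMT_CH_USE_ANDN 0   /* default: the form without AND-NOT */
#endif

#if FMT_CH_USE_ANDN
#define Ch(x, y, z) (((x) & (y)) ^ (~(x) & (z)))
#else
#define Ch(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#endif

/* Scalar wrapper, only so the selected form can be checked or disassembled. */
static uint32_t ch32(uint32_t x, uint32_t y, uint32_t z)
{
    return Ch(x, y, z);
}

/* Either selection must give the same result. */
static void ch32_selfcheck(void)
{
    assert(ch32(0xF0F0F0F0u, 0xCCCCCCCCu, 0xAAAAAAAAu) == 0xCACACACAu);
}
```

That keeps both emulations in one place while leaving the choice to whoever includes the header. The default could in turn depend on arch or compiler.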

BTW early tests indicate that 5916a57 made SHA-512 very slightly worse 
(but almost hidden by normal variations).

magnum
