Date: Thu, 3 Sep 2015 07:56:53 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

On Wed, Sep 02, 2015 at 09:31:34PM +0200, magnum wrote:
> On 2015-09-02 17:52, Solar Designer wrote:
> >On Wed, Sep 02, 2015 at 06:20:25PM +0300, Solar Designer wrote:
> >>SHA-1's H() aka F3() is the same as SHA-2's Maj()
> >
> >And it turns out that while we appear to be optimally using bitselect()
> >or vcmov() for Maj(), the fallback expressions that we use vary across
> >source files and are not always optimal:
> 
> Perhaps Ch() too:
> 
> #define Ch(x, y, z) (z ^ (x & (y ^ z)))
> #define Ch(x, y, z) ((x & y) ^ ( (~x) & z))
> 
> This is 3 vs. 4 ops, right?

On archs without AND-NOT, yes.  So it's a good find, and I'm happy you
patched these.
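
To make the op count concrete, here is a minimal scalar sketch of the two
Ch() forms quoted above.  The names ch_xor, ch_andnot and ch_check are
made up for illustration and are not taken from our tree.

```c
#include <assert.h>
#include <stdint.h>

/* XOR form: XOR, AND, XOR = 3 ops, each one depending on the previous. */
static inline uint32_t ch_xor(uint32_t x, uint32_t y, uint32_t z)
{
	return z ^ (x & (y ^ z));
}

/*
 * AND-NOT form: AND, NOT, AND, XOR = 4 ops when NOT and AND are separate
 * instructions, or AND, AND-NOT, XOR = 3 ops when AND-NOT is available.
 */
static inline uint32_t ch_andnot(uint32_t x, uint32_t y, uint32_t z)
{
	return (x & y) ^ (~x & z);
}

/*
 * Both forms are purely bitwise, so checking all-zeroes and all-ones for
 * each of the three inputs covers the whole 8-row truth table.
 */
static void ch_check(void)
{
	static const uint32_t bit[2] = { 0, 0xFFFFFFFFu };
	int i;

	for (i = 0; i < 8; i++) {
		uint32_t x = bit[i & 1];
		uint32_t y = bit[(i >> 1) & 1];
		uint32_t z = bit[(i >> 2) & 1];

		assert(ch_xor(x, y, z) == ch_andnot(x, y, z));
	}
}
```

Calling ch_check() once is enough to confirm that the two expressions
compute the same function, so swapping one for the other only changes
the op count and scheduling, not the result.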

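However, on archs with AND-NOT either is 3 ops, and the one with AND-NOT
has some parallelism: its AND and AND-NOT are independent of each other,
whereas the other form is a chain of 3 dependent ops.  This brings us to: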

Maybe we need to adjust our emulation of vcmov() to use the form with
AND-NOT when we know that AND-NOT is available - and since we're dealing
with intrinsics, we do know and it usually is.  Not only for Ch(), but
in general.
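
A rough sketch of what the two emulations could look like, and of how
Ch() and Maj() can both sit on top of either one.  This is portable C
with a 64-bit integer standing in for a SIMD register, so it only shows
the logic.  The names vtype, vcmov_xor, vcmov_andnot, Ch_v and Maj_v are
assumptions for this example and need not match pseudo_intrinsics.h.

```c
#include <stdint.h>

/* Stand-in for a SIMD register type (assumption for illustration). */
typedef uint64_t vtype;

/*
 * Same operand order as SSE2's _mm_andnot_si128(a, b), which computes
 * (~a) & b as one instruction.
 */
static inline vtype vandnot(vtype a, vtype b)
{
	return ~a & b;
}

/*
 * vcmov(y, z, x): take the bits of y where x is 1, and of z where x is 0.
 * XOR emulation: 3 dependent ops.
 */
static inline vtype vcmov_xor(vtype y, vtype z, vtype x)
{
	return z ^ (x & (y ^ z));
}

/*
 * AND-NOT emulation: also 3 ops when AND-NOT is one instruction, but the
 * AND and the AND-NOT do not depend on each other.
 */
static inline vtype vcmov_andnot(vtype y, vtype z, vtype x)
{
	return (x & y) ^ vandnot(x, z);
}

/* Ch(x, y, z) is a select between y and z controlled by x. */
static inline vtype Ch_v(vtype x, vtype y, vtype z)
{
	return vcmov_andnot(y, z, x);
}

/*
 * Maj(x, y, z): where y and z differ, x decides the majority;
 * where they agree, the result is y.
 */
static inline vtype Maj_v(vtype x, vtype y, vtype z)
{
	return vcmov_andnot(x, y, y ^ z);
}
```

With real intrinsics, vandnot() would map to the arch's AND-NOT and the
rest to the corresponding AND/XOR intrinsics.  Whether the AND-NOT form
is actually faster is exactly what would have to be measured.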

This will need to be benchmarked.  Along with higher parallelism comes
higher register pressure.  It is possible that optimal interleaving
factors will become lower than they are now.

Maybe both forms of emulation need to be kept in pseudo_intrinsics.h
with a way for us to choose one or the other.  It might happen that the
optimal choice will vary by arch, CPU, compiler, format.
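
One possible shape for that choice, purely as a sketch: a compile-time
switch that a build or an individual format could set before including
the header.  The macro name VCMOV_EMU_ANDNOT is hypothetical, and a
64-bit integer again stands in for the vector type.

```c
#include <stdint.h>

/* Stand-in for a SIMD register type (assumption for illustration). */
typedef uint64_t vtype;

/*
 * Hypothetical knob: 1 selects the AND-NOT emulation, 0 the XOR one.
 * A format or a per-arch build could define it before this point to
 * override the default after benchmarking.
 */
#ifndef VCMOV_EMU_ANDNOT
#define VCMOV_EMU_ANDNOT 1
#endif

#if VCMOV_EMU_ANDNOT
/* AND and AND-NOT are independent: more parallelism, more live values. */
#define vcmov(y, z, x) ((vtype)(((x) & (y)) ^ (~(vtype)(x) & (z))))
#else
/* Chain of 3 dependent ops. */
#define vcmov(y, z, x) ((vtype)((z) ^ ((x) & ((y) ^ (z)))))
#endif

/* Users of vcmov() do not need to know which emulation is active. */
#define Ch(x, y, z) vcmov(y, z, x)
```

Keeping the selection in one place means the interleaving factors can
be re-tuned for each setting without touching the code that uses
vcmov().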

Alexander
