Date: Thu, 3 Sep 2015 07:56:53 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: SHA-1 H()

On Wed, Sep 02, 2015 at 09:31:34PM +0200, magnum wrote:
> On 2015-09-02 17:52, Solar Designer wrote:
> >On Wed, Sep 02, 2015 at 06:20:25PM +0300, Solar Designer wrote:
> >>SHA-1's H() aka F3() is the same as SHA-2's Maj()
> >
> >And it turns out that while we appear to be optimally using bitselect()
> >or vcmov() for Maj(), the fallback expressions that we use vary across
> >source files and are not always optimal:
>
> Perhaps Ch() too:
>
> #define Ch(x, y, z) (z ^ (x & (y ^ z)))
> #define Ch(x, y, z) ((x & y) ^ ( (~x) & z))
>
> This is 3 vs. 4 ops, right?

On archs without AND-NOT, yes.  So it's a good find, and I'm happy you
patched these.  However, on archs with AND-NOT either is 3 ops, and the
one with AND-NOT has some parallelism.  This brings us to:

Maybe we need to adjust our emulation of vcmov() to use the form with
AND-NOT when we know that AND-NOT is available - and since we're dealing
with intrinsics, we do know and it usually is.  Not only for Ch(), but in
general.

This will need to be benchmarked.  Along with higher parallelism comes
higher register pressure.  It is possible that optimal interleaving
factors will become lower than they are now.

Maybe both forms of emulation need to be kept in pseudo_intrinsics.h with
a way for us to choose one or the other.  It might happen that the
optimal choice will vary by arch, CPU, compiler, format.

Alexander
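To make the op counts in this exchange concrete, here is a minimal scalar C sketch of the three Ch() forms. The function names are hypothetical and are not taken from John the Ripper's pseudo_intrinsics.h. `andnot()` is likewise a hypothetical stand-in for a native AND-NOT instruction such as SSE2's `_mm_andnot_si128(a, b)`, which computes `(~a) & b`. At source level, the XOR form is 3 ops in a serial chain of depth 3. The plain AND/OR form is 4 ops without AND-NOT. With AND-NOT it drops to 3 ops, and because the AND and the AND-NOT are independent, the chain depth is only 2; this is the "parallelism" referred to above. A compiler may rewrite any of these, so the counts are indicative only.

```c
#include <stdint.h>

/* Ch(x, y, z): for each bit, take y where x is 1, else z. */

/* XOR form: 3 ops (XOR, AND, XOR), each depending on the previous one. */
static uint32_t ch_xor(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

/* Select form without AND-NOT: 4 ops (AND, NOT, AND, XOR). */
static uint32_t ch_4op(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) ^ (~x & z);
}

/* Hypothetical stand-in for a native AND-NOT, e.g. SSE2 _mm_andnot_si128,
   which computes (~a) & b as a single instruction. */
static uint32_t andnot(uint32_t a, uint32_t b)
{
    return ~a & b;
}

/* Select form with AND-NOT: 3 ops (AND, AND-NOT, OR). The first two do not
   depend on each other, so they can execute in parallel. */
static uint32_t ch_andnot(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | andnot(x, z);
}

/* Returns nonzero if all three forms agree on the given inputs. */
static int ch_forms_agree(uint32_t x, uint32_t y, uint32_t z)
{
    uint32_t r = ch_xor(x, y, z);
    return r == ch_4op(x, y, z) && r == ch_andnot(x, y, z);
}
```

The trade-off raised above still applies in the vector case. Keeping both `(x & y)` and `andnot(x, z)` live at once needs one more register than the XOR chain, which is why the choice would need benchmarking per arch, CPU, compiler, and format.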