```
Date: Fri, 29 May 2015 10:56:09 +0300
From: Solar Designer <solar@...nwall.com>
To: Alain Espinosa <alainesp@...ta.cu>
Cc: john-dev@...ts.openwall.com
Subject: Re: bitslice SHA-256

On Fri, May 29, 2015 at 01:22:10AM -0400, Alain Espinosa wrote:
> ...I briefly experimented with merged ADDs in this md5slice.c revision
>
> I will take a look.
>
> ...add32c() is a 3-input ADD where one of the inputs is a constant
>
> I checked this code while looking for ways to reduce the instruction count of the sums. If I understand it correctly, you use more than 5 for one add (more than 10 for 2; if I recall correctly, you use 11).

My add32() appears to use 5 operations per bit (not counting the loads
and the store):

a = *x++;
b = *y++;
*z++ = (p = a ^ b) ^ c;
c = (p & c) | (a & b);

But you're right - my add32c()'s code path when the constant has a 1 bit
uses 11 (with XNOR) or 12 (without).  This feels wrong, and there has got
to be a way to optimize this to 10 or fewer within the same instruction
set.  Its code path for when the current constant bit is 0 has only 7
operations, though - so this demonstrates how adding a constant can be
cheaper than adding a variable:

a = *x++;
b = *y++;
if (c & 1) {
	*z++ = ~(a ^ b) ^ c1 ^ c2;
	c2 = (a & b & (p = c1 | c2)) | (c1 & c2 & (q = a | b));
	c1 = p | q;
} else {
	*z++ = (q = (p = a ^ b) ^ c1) ^ c2;
	c1 = (p & c1) | (a & b);
	c2 &= q;
}

Alexander
```
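
The two fragments in the message show only the loop bodies. As a rough way to convince oneself that they compute what is claimed, the sketch below wraps them in complete loops and compares the results with ordinary 32-bit addition. It is not the md5slice.c code: the 64-lane `slice_t` type, the loop structure, the helpers `to_slices()`, `from_slices()` and `count_mismatches()`, and the parameter name `k` for the constant are all assumptions made for illustration. The nested assignments are unrolled for readability, so the operation counts per bit stay at 5 for add32(), and at 7 or 12 (11 with XNOR) for add32c().

```c
#include <assert.h>
#include <stdint.h>

#define BITS 32            /* width of each bitsliced integer */
#define LANES 64           /* assumed: one uint64_t holds 64 instances */
typedef uint64_t slice_t;  /* one bit-plane: bit i of all 64 instances */

/* z = x + y (mod 2^32): 5 bitwise operations per bit position */
void add32(const slice_t *x, const slice_t *y, slice_t *z)
{
    slice_t a, b, p, c = 0;
    for (int i = 0; i < BITS; i++) {
        a = x[i];
        b = y[i];
        p = a ^ b;
        z[i] = p ^ c;
        c = (p & c) | (a & b);      /* carry into the next bit */
    }
}

/* z = x + y + k (mod 2^32), k being a constant shared by all lanes.
 * c1 and c2 are two carries of equal weight; their sum is 0, 1 or 2. */
void add32c(const slice_t *x, const slice_t *y, uint32_t k, slice_t *z)
{
    slice_t a, b, p, q, c1 = 0, c2 = 0;
    for (int i = 0; i < BITS; i++, k >>= 1) {
        a = x[i];
        b = y[i];
        if (k & 1) {                /* constant bit is 1 */
            p = c1 | c2;
            q = a | b;
            z[i] = ~(a ^ b) ^ c1 ^ c2;
            c2 = (a & b & p) | (c1 & c2 & q);  /* 3+ of a,b,c1,c2 set */
            c1 = p | q;                        /* 1+ of a,b,c1,c2 set */
        } else {                    /* constant bit is 0 */
            p = a ^ b;
            q = p ^ c1;
            z[i] = q ^ c2;
            c1 = (p & c1) | (a & b);
            c2 &= q;
        }
    }
}

/* Transpose LANES ordinary 32-bit values into BITS bit-planes. */
void to_slices(const uint32_t *v, slice_t *s)
{
    for (int i = 0; i < BITS; i++) {
        s[i] = 0;
        for (int j = 0; j < LANES; j++)
            s[i] |= (slice_t)((v[j] >> i) & 1) << j;
    }
}

/* Extract the 32-bit value held in lane j. */
uint32_t from_slices(const slice_t *s, int j)
{
    uint32_t v = 0;
    assert(j >= 0 && j < LANES);
    for (int i = 0; i < BITS; i++)
        v |= (uint32_t)((s[i] >> j) & 1) << i;
    return v;
}

/* Count lanes where a bitsliced sum differs from plain addition. */
int count_mismatches(uint32_t k)
{
    uint32_t vx[LANES], vy[LANES], seed = 12345;
    slice_t x[BITS], y[BITS], z[BITS], zc[BITS];
    int bad = 0;
    for (int j = 0; j < LANES; j++) {   /* simple LCG test inputs */
        seed = seed * 1664525u + 1013904223u;
        vx[j] = seed;
        seed = seed * 1664525u + 1013904223u;
        vy[j] = seed;
    }
    to_slices(vx, x);
    to_slices(vy, y);
    add32(x, y, z);
    add32c(x, y, k, zc);
    for (int j = 0; j < LANES; j++) {
        bad += from_slices(z, j) != (uint32_t)(vx[j] + vy[j]);
        bad += from_slices(zc, j) != (uint32_t)(vx[j] + vy[j] + k);
    }
    return bad;
}
```

Calling `count_mismatches()` with a few constants (for example 0, 0xffffffff, and a mixed bit pattern) exercises both branches of add32c(). The branch on `k & 1` depends only on the constant, so in a real implementation it could be resolved at compile time by unrolling the loop.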

