
Date: Fri, 29 May 2015 10:56:09 +0300
From: Solar Designer <solar@...nwall.com>
To: Alain Espinosa <alainesp@...ta.cu>
Cc: john-dev@...ts.openwall.com
Subject: Re: bitslice SHA256

On Fri, May 29, 2015 at 01:22:10AM -0400, Alain Espinosa wrote:
> ...I briefly experimented with merged ADDs in this md5slice.c revision
>
> I will take a look.
>
> ...add32c() is a 3-input ADD where one of the inputs is a constant
>
> I check this code searching how to reduce sum instructions count. If I
> understand it correctly you use more than 5 for one add (more than 10
> for 2, if I recall correctly you use 11).

My add32() appears to use 5 (not counting the loads and the store):

	a = *x++;
	b = *y++;
	*z++ = (p = a ^ b) ^ c;
	c = (p & c) | (a & b);

But you're right - my add32c()'s code path when the constant has a 1 bit
uses 11 (with XNOR) or 12 (without). This feels wrong, and there's got
to be a way to optimize this to 10 or less within the same instruction
set. Its code path for when the current constant bit is 0 has only 7
operations, though - so this demonstrates how the addition of a constant
can be cheaper than of a variable:

	a = *x++;
	b = *y++;
	if (c & 1) {
		*z++ = ~(a ^ b) ^ c1 ^ c2;
		c2 = (a & b & (p = c1 | c2)) | (c1 & c2 & (q = a | b));
		c1 = p | q;
	} else {
		*z++ = (q = (p = a ^ b) ^ c1) ^ c2;
		c1 = (p & c1) | (a & b);
		c2 &= q;
	}

Alexander
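The two per-bit bodies quoted in the message can be checked against ordinary 32-bit addition with a small self-test. What follows is only a minimal sketch under stated assumptions: `vtype` is taken to be `uint32_t` (32 lanes per vector), the constant is assumed to be shifted right by one bit per iteration, and the helpers `to_slices`, `from_slices`, `xorshift32` and `self_check` are hypothetical names that are not part of md5slice.c. Only the loop bodies come from the message.

```c
#include <stdint.h>

#define BITS 32   /* word width: one slice per bit position */
#define LANES 32  /* assumption: a vtype holds 32 independent lanes */

typedef uint32_t vtype;

/* z = x + y: 5 logic operations per bit (2 XOR, 2 AND, 1 OR). */
static void add32(vtype *z, const vtype *x, const vtype *y)
{
	vtype a, b, p, c = 0;
	for (int i = 0; i < BITS; i++) {
		a = *x++;
		b = *y++;
		*z++ = (p = a ^ b) ^ c;
		c = (p & c) | (a & b);
	}
}

/* z = x + y + c, with two carry slices c1 and c2 of equal weight.
 * Assumption: the constant is consumed LSB first via c >>= 1. */
static void add32c(vtype *z, const vtype *x, const vtype *y, uint32_t c)
{
	vtype a, b, p, q, c1 = 0, c2 = 0;
	for (int i = 0; i < BITS; i++, c >>= 1) {
		a = *x++;
		b = *y++;
		if (c & 1) {	/* 12 operations, or 11 with XNOR */
			*z++ = ~(a ^ b) ^ c1 ^ c2;
			c2 = (a & b & (p = c1 | c2)) | (c1 & c2 & (q = a | b));
			c1 = p | q;
		} else {	/* 7 operations */
			*z++ = (q = (p = a ^ b) ^ c1) ^ c2;
			c1 = (p & c1) | (a & b);
			c2 &= q;
		}
	}
}

/* Hypothetical helper: transpose LANES ordinary words into BITS slices. */
static void to_slices(vtype *s, const uint32_t *w)
{
	for (int bit = 0; bit < BITS; bit++) {
		s[bit] = 0;
		for (int lane = 0; lane < LANES; lane++)
			s[bit] |= (vtype)((w[lane] >> bit) & 1) << lane;
	}
}

/* Hypothetical helper: gather one lane back into an ordinary word. */
static uint32_t from_slices(const vtype *s, int lane)
{
	uint32_t w = 0;
	for (int bit = 0; bit < BITS; bit++)
		w |= (uint32_t)((s[bit] >> lane) & 1) << bit;
	return w;
}

/* Small deterministic generator so the check is reproducible. */
static uint32_t rng_state = 2463534242u;
static uint32_t xorshift32(void)
{
	rng_state ^= rng_state << 13;
	rng_state ^= rng_state >> 17;
	rng_state ^= rng_state << 5;
	return rng_state;
}

/* Returns the number of lanes that disagree with plain addition. */
int self_check(void)
{
	static const uint32_t consts[] = { 0, 1, 0xd76aa478u, 0xffffffffu };
	uint32_t wx[LANES], wy[LANES];
	vtype sx[BITS], sy[BITS], sz[BITS];
	int bad = 0;

	for (unsigned k = 0; k < sizeof(consts) / sizeof(consts[0]); k++) {
		for (int lane = 0; lane < LANES; lane++) {
			wx[lane] = xorshift32();
			wy[lane] = xorshift32();
		}
		to_slices(sx, wx);
		to_slices(sy, wy);

		add32(sz, sx, sy);
		for (int lane = 0; lane < LANES; lane++)
			bad += from_slices(sz, lane) != (uint32_t)(wx[lane] + wy[lane]);

		add32c(sz, sx, sy, consts[k]);
		for (int lane = 0; lane < LANES; lane++)
			bad += from_slices(sz, lane) !=
			    (uint32_t)(wx[lane] + wy[lane] + consts[k]);
	}
	return bad;
}
```

Calling `self_check()` from a `main()` and treating a nonzero return as failure is enough to exercise both branches of add32c(), since the test constants mix 0 and 1 bits. The two carries work because, with a constant 1 bit, the total carry out of a position is 0, 1 or 2: `c1` records "at least one" and `c2` records "two".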