Date: Fri, 29 May 2015 22:52:41 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Interleaving of intrinsics

Solar,

Here's a GitHub issue where we discuss interleaving and present some 
benchmarks:
https://github.com/magnumripper/JohnTheRipper/issues/1217

I think you'll have some educated thoughts on this. Here's part of
our current SHA-1 code:


#define SHA1_PARA_DO(x)	for((x)=0;(x)<SIMD_PARA_SHA1;(x)++)

#define SHA1_ROUND2(a,b,c,d,e,F,t)                      \
     SHA1_PARA_DO(i) tmp3[i] = tmpR[i*16+(t&0xF)];       \
     SHA1_EXPAND2(t+16)                                  \
     F(b,c,d)                                            \
     SHA1_PARA_DO(i) e[i] = vadd_epi32( e[i], tmp[i] );  \
     SHA1_PARA_DO(i) tmp[i] = vroti_epi32(a[i], 5);      \
     SHA1_PARA_DO(i) e[i] = vadd_epi32( e[i], tmp[i] );  \
     SHA1_PARA_DO(i) e[i] = vadd_epi32( e[i], cst );     \
     SHA1_PARA_DO(i) e[i] = vadd_epi32( e[i], tmp3[i] ); \
     SHA1_PARA_DO(i) b[i] = vroti_epi32(b[i], 30);
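For reference, here is a minimal scalar sketch of what SHA1_ROUND2 computes in each 32-bit lane. It leaves out the SHA1_EXPAND2 message expansion. The helper names (rol32, sha1_ch, sha1_step_lane) are made up for illustration. It also assumes F(b,c,d) leaves its result in tmp, which is how the macro reads.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* SHA-1 "choose" function, the F used in rounds 0..19. */
static uint32_t sha1_ch(uint32_t b, uint32_t c, uint32_t d)
{
    return (b & c) | (~b & d);
}

/* One lane of SHA1_ROUND2, in the same in-place form: the new value
 * accumulates in e and b is rotated; the caller permutes the roles of
 * a..e between steps instead of moving data around. */
static void sha1_step_lane(uint32_t a, uint32_t *b, uint32_t c, uint32_t d,
                           uint32_t *e,
                           uint32_t (*f)(uint32_t, uint32_t, uint32_t),
                           uint32_t k, uint32_t w)
{
    *e += f(*b, c, d);    /* F(b,c,d), presumably left in tmp */
    *e += rol32(a, 5);    /* vroti_epi32(a, 5) */
    *e += k;              /* cst */
    *e += w;              /* tmp3, the schedule word W[t] */
    *b = rol32(*b, 30);   /* vroti_epi32(b, 30) */
}
```

Fed the first step of the well-known "abc" test vector (initial state, W[0] = 0x61626380, K = 0x5A827999 with the Ch function), one step leaves 0x0116FC33 in e and 0x7BF36AE2 in b. Those are the new A and C of the standard worked example.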


And here's the corresponding part of our SHA-256:

#define SHA256_STEP0(a,b,c,d,e,f,g,h,x,K)                    \
{                                                            \
     SHA256_PARA_DO(i)                                        \
     {                                                        \
         w = _w[i].w;                                         \
         tmp1[i] = vadd_epi32(h[i],    S1(e[i]));             \
         tmp1[i] = vadd_epi32(tmp1[i], Ch(e[i],f[i],g[i]));   \
         tmp1[i] = vadd_epi32(tmp1[i], vset1_epi32(K));       \
         tmp1[i] = vadd_epi32(tmp1[i], w[x]);                 \
         tmp2[i] = vadd_epi32(S0(a[i]),Maj(a[i],b[i],c[i]));  \
         d[i]    = vadd_epi32(tmp1[i], d[i]);                 \
         h[i]    = vadd_epi32(tmp1[i], tmp2[i]);              \
     }                                                        \
}
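Likewise, here is a scalar sketch of SHA256_STEP0 for one lane. It assumes S0 and S1 are the FIPS 180-4 big-Sigma functions and that Ch and Maj are the usual ones. The function names here are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word right by n bits. */
static uint32_t ror32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x >> n) | (x << (32 - n));
}

/* FIPS 180-4 SHA-256 functions (assumed to match S0, S1, Ch and Maj). */
static uint32_t big_s0(uint32_t x) { return ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22); }
static uint32_t big_s1(uint32_t x) { return ror32(x, 6) ^ ror32(x, 11) ^ ror32(x, 25); }
static uint32_t ch(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (~x & z); }
static uint32_t maj(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (x & z) ^ (y & z); }

/* One lane of SHA256_STEP0: d becomes the new e and h the new a; the
 * caller permutes the roles of a..h between steps. */
static void sha256_step_lane(uint32_t a, uint32_t b, uint32_t c, uint32_t *d,
                             uint32_t e, uint32_t f, uint32_t g, uint32_t *h,
                             uint32_t w, uint32_t k)
{
    uint32_t tmp1 = *h + big_s1(e) + ch(e, f, g) + k + w;
    uint32_t tmp2 = big_s0(a) + maj(a, b, c);
    *d += tmp1;
    *h = tmp1 + tmp2;
}
```

For the first step of the "abc" vector (W[0] = 0x61626380, K[0] = 0x428A2F98), d ends up as 0xFA2A4622 and h as 0x5D6AEBCD. Those are the new e and a.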

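This file is compiled with -O3 (set by a pragma), so I guess both cases
will be unrolled. Even after unrolling, though, the instruction order is
very different. The SHA-1 macro issues each operation for all
SIMD_PARA_SHA1 vectors before moving on to the next operation. The
SHA-256 macro completes a whole step for one vector before starting on
the next. With a perfect optimizer it wouldn't matter, but with an
imperfect one, is the former better? I'm guessing SHA-1 was written
that way for a reason?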
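To make the question concrete, here is a portable sketch of the two orderings applied to the tail of the SHA-1 round. The names vec, vadd and vrol are hypothetical scalar stand-ins for the vector type and the vadd_epi32/vroti_epi32 wrappers. PARA = 2 stands in for SIMD_PARA_SHA1. Both functions compute the same result and differ only in the order the operations are written.

```c
#include <assert.h>
#include <stdint.h>

#define PARA  2  /* hypothetical stand-in for SIMD_PARA_SHA1 */
#define LANES 4  /* 32-bit lanes in an emulated 128-bit vector */

/* Hypothetical portable stand-in for a 128-bit SIMD register. */
typedef struct { uint32_t v[LANES]; } vec;

/* Stand-in for vadd_epi32: lane-wise 32-bit add. */
static vec vadd(vec x, vec y)
{
    vec r;
    for (int l = 0; l < LANES; l++)
        r.v[l] = x.v[l] + y.v[l];
    return r;
}

/* Stand-in for vroti_epi32 with a positive count: lane-wise rotate left. */
static vec vrol(vec x, unsigned n)
{
    assert(n > 0 && n < 32);
    vec r;
    for (int l = 0; l < LANES; l++)
        r.v[l] = (x.v[l] << n) | (x.v[l] >> (32 - n));
    return r;
}

/* SHA-1 style (operation-major): each operation is issued for all PARA
 * vectors before the next one, so neighbouring operations are independent. */
static void tail_op_major(const vec a[PARA], vec b[PARA], vec e[PARA],
                          vec k, const vec w[PARA])
{
    vec tmp[PARA];
    for (int i = 0; i < PARA; i++) tmp[i] = vrol(a[i], 5);
    for (int i = 0; i < PARA; i++) e[i] = vadd(e[i], tmp[i]);
    for (int i = 0; i < PARA; i++) e[i] = vadd(e[i], k);
    for (int i = 0; i < PARA; i++) e[i] = vadd(e[i], w[i]);
    for (int i = 0; i < PARA; i++) b[i] = vrol(b[i], 30);
}

/* SHA-256 style (vector-major): the whole dependent chain for vector i
 * is written out before vector i + 1 is touched. */
static void tail_vec_major(const vec a[PARA], vec b[PARA], vec e[PARA],
                           vec k, const vec w[PARA])
{
    for (int i = 0; i < PARA; i++) {
        e[i] = vadd(e[i], vrol(a[i], 5));
        e[i] = vadd(e[i], k);
        e[i] = vadd(e[i], w[i]);
        b[i] = vrol(b[i], 30);
    }
}
```

In the operation-major version, adjacent instructions belong to independent dependency chains. So even if the compiler emits them in source order, they don't wait on each other. In the vector-major version, each vector's chain is serial. Overlapping the PARA chains is then left to the compiler's scheduler or, presumably, the CPU's out-of-order engine. How much that matters in practice is what the benchmarks in the issue should show.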

magnum
