Date: Sun, 16 Aug 2015 16:10:26 +0800
From: Lei Zhang <>
Subject: Re: Formats using non-SIMD SHA2 implementations

On Aug 14, 2015, at 5:31 PM, magnum <> wrote:
> On 2015-08-14 04:35, Lei Zhang wrote:
>> I traced the execution of 7z's encryption: the size of the hashed message can be really big, far beyond even 4 SHA2 input blocks. I don't think it's possible to do the hashing with a single call to SIMDSHA256body().
>> Is there a way to repeatedly invoke SIMDSHA256body() just like SHA256_Update()?
> Sure, you just have to do the job yourself. The last (or single) block holds at most 55 bytes of input; all others can hold 64 bytes.
> Say you need to do 189 bytes. You take the first 64 bytes (no 0x80, no length) and call SIMDSHA256body(). Then the next 64 bytes and call it again. Now you have 61 bytes left. You put them in the buffer, add a 0x80 and zero the rest. And call SIMDSHA256body() again. Finally, in this case, you take a block of all zeros, just add the length in bits (189*8) and make a final call.
> The problem is when you have inputs of different lengths in one vector. Say one of them requires 4 limbs, another just 3, and the rest only one. This is doable (we do it in e.g. SAP F/G) but tedious - and it reduces the benefit of SIMD much like diverging threads in OpenCL do. So we usually don't do SIMD with such formats.

I might need a little help here... I wrote a small snippet of code to experiment with SIMDSHA256body(), but somehow I can't get the output I expect from it. Here's the code:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>
#include "simd-intrinsics.h"

#define BIN_SIZE 32
#define BUF_SIZE 64
#define MSG_SIZE 8
#define HASH_IDX ((index&(SIMD_COEF_32 - 1)) + index/SIMD_COEF_32*BUF_SIZE/4*SIMD_COEF_32)

int main() {
    /* use OpenSSL */
    static uint32_t msg[MSG_SIZE/4] = {-1,-1}, // test input
                    out[BIN_SIZE/4];           // digest from OpenSSL

    SHA256((unsigned char*)msg, sizeof(msg), (unsigned char*)out);

    /* use SIMD */
    static uint32_t vec_in [BUF_SIZE/4*SIMD_COEF_32],
                    vec_out[BIN_SIZE/4*SIMD_COEF_32];
    memset(vec_in, 0, sizeof(vec_in));

    int i, index;
    for (index = 0; index < SIMD_COEF_32; ++index) {
        // copy the message into this lane, one interleaved word at a time
        for (i = 0; i < MSG_SIZE/4; ++i)
            vec_in[HASH_IDX + i*SIMD_COEF_32] = __builtin_bswap32(msg[i]);
        // padding
        vec_in[HASH_IDX + i*SIMD_COEF_32] = (0x80U << 24);
        vec_in[HASH_IDX + 15*SIMD_COEF_32] = MSG_SIZE*8; // length in bits
    }

    SIMDSHA256body(vec_in, vec_out, NULL, SSEi_MIXED_IN);

    // compare results
    printf("0x%x == 0x%x ?\n", out[0], vec_out[0]);
    return 0;
}

I tweaked it for a while but couldn't find out what's wrong. I think the copying of message and the padding are fine. Maybe I used the wrong flag for SIMDSHA256body()?


