Date: Fri, 14 Aug 2015 11:31:00 +0200
From: magnum <>
Subject: Re: Formats using non-SIMD SHA2 implementations

On 2015-08-14 04:35, Lei Zhang wrote:
>> On Aug 13, 2015, at 5:42 AM, magnum <> wrote:
>> On 2015-08-12 15:26, Lei Zhang wrote:
>>> Now I just finished episerver. I took a close look at the rest of the formats in that list, and found a few 'technical' issues.
>>> - For 7z, keepass and pdf, there's AES encryption involved at some step of hashing (and also RC4 in pdf). But so far we don't have a SIMD implementation of AES (or RC4). I'm not sure how to handle this.
>> Just do it in scalar code (a loop) after running SIMD for producing the keys! For example, the sevenzip_decrypt() function probably needs no change (but if you change it, be sure not to break non-SIMD builds).
> I traced the execution of 7z's encryption: the size of the hashed message can be really big, far beyond even 4 SHA2 input blocks. I don't think it's possible to do the hashing with a single call to SIMDSHA256body().
> Is there a way to invoke SIMDSHA256body() repeatedly, just like SHA256_Update()?

Sure, you just have to do the job yourself. The last (or only) block, 
which carries the 0x80 and the length, holds at most 55 bytes of input; 
all others can hold a full 64 bytes.

Say you need to do 189 bytes. You take the first 64 bytes (no 0x80, no 
length) and call SIMDSHA256body(). Then the next 64 bytes and call it 
again. Now you have 61 bytes left. That is more than 55, so the 0x80 
fits in this block but the length does not. You put the 61 bytes in the 
buffer, add a 0x80, zero the rest, and call SIMDSHA256body() again. 
Finally, in this case, you take a block of all zeros, just add the 
length in bits (189*8) and make a final call.
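
To make the block layout above concrete, here is a rough sketch in 
plain scalar C. The helper names (sha256_block_count, 
sha256_build_block) are made up for illustration and are not in the 
tree. It only builds the flat 64-byte blocks for one input; the 
interleaving and byte order that the real SIMD buffers need are left 
out, so treat it as an outline of the padding logic, not as code to 
drop into a format.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA256_BLOCK 64

/* Blocks needed for len bytes of input, counting the 0x80 byte and
 * the 8-byte length field. Hypothetical helper. */
static size_t sha256_block_count(size_t len)
{
    return (len + 1 + 8 + SHA256_BLOCK - 1) / SHA256_BLOCK;
}

/* Fill out[] with block number idx (0-based) of the padded message.
 * Hypothetical helper; produces a flat, non-interleaved block. */
static void sha256_build_block(const uint8_t *msg, size_t len,
                               size_t idx, uint8_t out[SHA256_BLOCK])
{
    size_t off = idx * SHA256_BLOCK;
    size_t avail = off < len ? len - off : 0;

    if (avail > SHA256_BLOCK)
        avail = SHA256_BLOCK;

    memset(out, 0, SHA256_BLOCK);
    if (avail)
        memcpy(out, msg + off, avail);

    /* The 0x80 goes right after the last input byte, in whichever
     * block that position falls. */
    if (len >= off && len < off + SHA256_BLOCK)
        out[len - off] = 0x80;

    /* The last block ends with the input length in bits, big-endian. */
    if (idx == sha256_block_count(len) - 1) {
        uint64_t bits = (uint64_t)len * 8;
        for (int i = 0; i < 8; i++)
            out[SHA256_BLOCK - 1 - i] = (uint8_t)(bits >> (8 * i));
    }
}
```

For 189 bytes, sha256_block_count() gives 4: blocks 0 and 1 are pure 
input, block 2 has 61 bytes plus the 0x80, and block 3 is zeros plus 
the length 1512 (0x05E8). A caller would loop idx from 0 to count-1, 
build each block and feed it to the compression function, carrying 
the state from one call to the next.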

The problem is when you have different input lengths in one vector. Say 
one of them requires 4 limbs, another just 3, and the rest only one. 
This is doable (we do it in e.g. SAP F/G) but tedious, and it reduces 
the benefit of SIMD much like diverging threads do in OpenCL. So we 
usually don't do SIMD with such formats.
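
A back-of-the-envelope sketch of that cost, assuming a 4-lane vector 
(the LANES value and both function names are made up for 
illustration). Every lane has to ride along until the longest input 
is done, so the number of SIMD calls is the maximum limb count and 
the rest is wasted work:

```c
#include <stddef.h>

#define LANES 4 /* assumed vector width, e.g. 4 x 32-bit lanes */

/* Limbs (64-byte SHA-256 blocks) needed for len bytes of input,
 * counting the 0x80 byte and the 8-byte length field. */
static size_t limbs_for(size_t len)
{
    return (len + 1 + 8 + 63) / 64;
}

/* Returns how many SIMD calls the vector needs (the largest limb
 * count among the lanes) and stores in *wasted the number of
 * lane-blocks computed for lanes that were already finished. */
static size_t simd_calls(const size_t len[LANES], size_t *wasted)
{
    size_t max = 0, sum = 0;

    for (int i = 0; i < LANES; i++) {
        size_t n = limbs_for(len[i]);
        sum += n;
        if (n > max)
            max = n;
    }
    *wasted = max * LANES - sum;
    return max;
}
```

With input lengths of 200, 150, 10 and 10 bytes the lanes need 4, 3, 
1 and 1 limbs. That is 4 SIMD calls, or 16 lane-blocks, of which only 
9 do useful work. On top of that, the code has to save each lane's 
result at the point where that lane finished, which is the tedious 
part.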


