Date: Tue, 18 Aug 2015 08:08:10 +0800
From: Lei Zhang <>
Subject: Re: Formats using non-SIMD SHA2 implementations

On Aug 18, 2015, at 5:22 AM, JimF <> wrote:
> On Mon, 17 Aug 2015 15:57:26 -0500, magnum <> wrote:
>> On 2015-08-17 09:40, Lei Zhang wrote:
>>> On Aug 17, 2015, at 2:26 PM, magnum <> wrote:
>>>> On 2015-08-17 05:07, Lei Zhang wrote:
>>>>> I finally got 7z to work correctly with SIMD :)
>>>> Are you sorting lengths, like Jim hinted? Or are you handling diverging lengths like in SAP F/G?
>>> No, I haven't done that yet. I may give that a try too. I hope it's not too tricky to implement. The code already looks ugly enough...
>> Are you saying you do neither? That can't work. If it seems to work, it's only because all test vectors are same length. The same applies to RAR3.
> What we do here is put the data into our buffers (we are using flat buffers), and the function that does this tells us that a specific string takes X sha256 calls. Then we loop until ALL the values have been completed (i.e. when the max of loops[x] == cnt). Ignore some of the strangeness, such as cp += 32*2; there is some weird-looking code because this file is auto-generated, and the same template for this function is used for ALL simd formats.

I think I'm doing it more or less the same way you do, except for the relatively large buffer used in my code.
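A minimal sketch of the scheme Jim describes, written as a hypothetical model rather than the actual JtR code: each SIMD lane (one candidate) needs its own number of SHA-256 compression calls, and the vector loop keeps running until the longest lane is done, so lanes that finish early sit idle. `LANES`, `run_batch` and `total_passes` are illustrative names, the SHA-256 work itself is left as a comment, and every lane is assumed to need at least one block. Optional sorting by block count before batching, as magnum hinted, is included to show how it cuts idle lane slots.

```c
#include <stddef.h>
#include <stdlib.h>

#define LANES 8 /* assumption: 8 x 32-bit lanes, e.g. AVX2 */

/* Mirrors the "loop until every lane is done" structure: on each pass,
   every lane that still has blocks gets one compression; finished lanes
   idle. Returns the number of passes (vector compression calls) and
   stores the count of wasted lane slots in *idle. */
static unsigned run_batch(const unsigned *blocks, unsigned long *idle)
{
    unsigned pass = 0;
    int more = 1;

    *idle = 0;
    while (more) {
        more = 0;
        for (int lane = 0; lane < LANES; lane++) {
            if (pass < blocks[lane]) {
                /* real code: load block `pass` of this lane's message here */
                if (pass + 1 < blocks[lane])
                    more = 1;
            } else {
                (*idle)++; /* lane already finished: wasted slot */
            }
        }
        pass++;
    }
    return pass;
}

static int cmp_uint(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a, y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Runs n candidates (n a multiple of LANES) in batches of LANES,
   optionally sorting by block count first so similar lengths share a
   vector. Returns total passes; total idle lane slots go to *idle. */
static unsigned long total_passes(unsigned *blocks, size_t n, int sort_first,
                                  unsigned long *idle)
{
    unsigned long passes = 0, batch_idle;

    *idle = 0;
    if (sort_first)
        qsort(blocks, n, sizeof *blocks, cmp_uint);
    for (size_t i = 0; i + LANES <= n; i += LANES) {
        passes += run_batch(blocks + i, &batch_idle);
        *idle += batch_idle;
    }
    return passes;
}
```

With 16 candidates alternating between 1 and 3 blocks, the unsorted batches take 6 passes with 16 idle lane slots, while sorting first brings that down to 4 passes with none idle.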

> This could also be done by loading the buffers within the "while (bMore) {}" loop, and SHOULD be done there with only 1 buffer if the number of multiple crypt calls is large or unknown. In the dynamic format, I do have 4 crypt limbs (or 2 limbs for 64-bit SIMD), and use them 'intact'. But I do that because there is a lot of reading and writing in these buffers, and no real way to know how or what will be written to them (it is dynamic, btw).

The problem in 7z is that the message to be hashed needs to be constructed first.

The original scalar code (simplified):

SHA256_Init(&ctx);
for (round = 0; round < rounds; ++round) {
    SHA256_Update(&ctx, password, len);          /* password bytes */
    SHA256_Update(&ctx, &round, sizeof(round));  /* round counter */
}
SHA256_Final(key, &ctx);

'rounds' is a really big number, so a small difference in 'len' can result in a very different length for the whole message. As a result, I cannot tell in advance how many limbs are needed.
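To make the length problem concrete, here is a small sketch that counts the SHA-256 compression calls for the whole key-derivation message. It assumes that each round appends the password plus an 8-byte round counter (i.e. sizeof(round) == 8) and that `len` is the password length in bytes (UTF-16, so 2 bytes per character); the function names are hypothetical.

```c
#include <stdint.h>

/* Total bytes fed to SHA-256: `rounds` copies of (password || counter).
   Assumption: the counter is 8 bytes wide. */
static uint64_t kdf_message_len(uint64_t len, uint64_t rounds)
{
    return rounds * (len + 8);
}

/* SHA-256 appends 0x80 plus a 64-bit bit length (at least 9 bytes),
   then rounds up to whole 64-byte blocks. */
static uint64_t sha256_blocks(uint64_t msg_len)
{
    return (msg_len + 9 + 63) / 64;
}
```

Assuming rounds = 2^19 (used here only for illustration), `sha256_blocks(kdf_message_len(8, 1 << 19))` gives 131073 blocks, while `len` = 10 gives 147457, so two more password bytes cost that lane 16384 extra compressions.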

One way is to construct the whole message first and then feed it on the fly into a small vector buffer; another is to use a large vector buffer and construct the message in place. Either way I cannot avoid using a large buffer. The optimal way might be to construct the message on the fly while feeding it to a small vector buffer. This is theoretically doable, but I found it too tricky to implement... So I chose to use a large vector buffer (~30MB for a non-OpenMP build), which works OK so far.
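The on-the-fly variant can be sketched by computing any 64-byte block of the padded message directly from its index, so a lane only ever needs one block of buffer space. This is a hypothetical scalar helper (`msg_byte` and `fill_block` are illustrative names), assuming an 8-byte little-endian round counter; it works byte by byte for clarity, whereas real SIMD code would copy whole runs and interleave the result into the lane layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte at absolute offset `pos` of the message, which is
   (password || 8-byte little-endian counter) repeated `rounds` times. */
static uint8_t msg_byte(const uint8_t *pw, size_t len, uint64_t pos)
{
    uint64_t unit = (uint64_t)len + 8;
    uint64_t ctr = pos / unit;   /* which round this byte belongs to */
    uint64_t off = pos % unit;   /* offset inside (password || counter) */

    if (off < len)
        return pw[off];
    return (uint8_t)(ctr >> (8 * (off - len)));
}

/* Fills block number `blk` of the SHA-256-padded message into `out`.
   Returns 1 if this is the final block, 0 otherwise. */
static int fill_block(uint8_t out[64], const uint8_t *pw, size_t len,
                      uint64_t rounds, uint64_t blk)
{
    uint64_t msg_len = rounds * ((uint64_t)len + 8);
    uint64_t nblocks = (msg_len + 9 + 63) / 64;
    uint64_t base = blk * 64;

    for (int i = 0; i < 64; i++) {
        uint64_t pos = base + i;
        if (pos < msg_len)
            out[i] = msg_byte(pw, len, pos);
        else if (pos == msg_len)
            out[i] = 0x80;       /* SHA-256 padding marker */
        else
            out[i] = 0;
    }
    if (blk == nblocks - 1) {
        /* last block ends with the big-endian message length in bits */
        uint64_t bits = msg_len * 8;
        for (int i = 0; i < 8; i++)
            out[56 + i] = (uint8_t)(bits >> (8 * (7 - i)));
    }
    return blk == nblocks - 1;
}
```

In a SIMD loop, each lane would call `fill_block` with its own block index on every pass, and the return value tells the loop when that lane has reached its final, padded block.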
