Date: Thu, 2 Apr 2015 23:47:14 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: New SIMD generations, code layout

Hi magnum,

> On Apr 1, 2015, at 4:13 PM, magnum <john.magnum@...hmail.com> wrote:
> 
> What you need to do:
> 1. Fix it so rawSHA512_ng builds at all (eg. change the top "#if
> __SSE2__" to something like "#if __SSE2__ || __MIC__" for a starter).
> 2. Fix whatever more is needed to make it build at all. For example,
> while the SWAP_ENDIAN macro is blindly added for AVX512, it's untested.
> And the GATHER macro doesn't even have a section for AVX512 yet, but it
> needs one. By the way, we should probably move those two macros to the
> pseudo-intrinsics.h file instead. Perhaps as vswap() and vgather().
> 3. Fix whatever more is needed to make it run correctly.
> 4. See if there are things that can be implemented better (faster).

I fixed the MIC intrinsics used in rawSHA256_ng and rawSHA512_ng. Both now build and pass the self-tests.

rawSHA1_ng seems a bit more troublesome because it uses a hardcoded lookup table. The table for AVX2 already looks cumbersome enough, and I can't imagine what the table for MIC would look like if defined the same way. I tried building the table entries with bit shifts instead, which looks like this:

#define X ((((uint128_t)0xFFFFFFFFFFFFFFFF)<<64) + 0xFFFFFFFFFFFFFFFF)
    static const __aligned_simd uint128_t kUsedBytesTable[][4] = {
        {X<<  0, X<<  0, X<<  0, X<<  0}, {X<<  8, X<<  0, X<<  0, X<<  0}, {X<< 16, X<<  0, X<<  0, X<<  0}, ... }

This looks more compact but still cumbersome. I don't know if there's a better way.
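
One possible alternative is to fill the table once at startup with a loop rather than spelling it out. This is only a minimal sketch, assuming a 64-byte (512-bit) vector, a little-endian target, and that row n is the all-ones vector shifted left by n bytes. The names VEC_BYTES, used_bytes_table, init_used_bytes_table and used_bytes_mask are hypothetical, the extra all-zero last row is an assumption, and the real table would also need the SIMD alignment that __aligned_simd provides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 64  /* assumed vector width in bytes (512 bits) */

/* Hypothetical runtime-built table: row n for n = 0..VEC_BYTES.
 * Alignment for SIMD loads is left out of this sketch. */
static uint8_t used_bytes_table[VEC_BYTES + 1][VEC_BYTES];

/* Fill row n so that bytes [0, n) are 0x00 and bytes [n, VEC_BYTES) are 0xFF.
 * On a little-endian target this equals an all-ones vector shifted left
 * by n bytes. */
static void init_used_bytes_table(void)
{
    for (size_t n = 0; n <= VEC_BYTES; n++) {
        memset(used_bytes_table[n], 0x00, n);
        memset(used_bytes_table[n] + n, 0xFF, VEC_BYTES - n);
    }
}

/* Return the row for a shift of n bytes. */
static const uint8_t *used_bytes_mask(size_t n)
{
    assert(n <= VEC_BYTES);
    return used_bytes_table[n];
}
```

The cost is a one-time initialization and a table that is no longer const, which may or may not be acceptable for this format.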

BTW, I have a question about how the lookup table is constructed. From what I can see in kUsedBytesTable, each subarray corresponds to one SIMD vector, and each vector is the previous one shifted left by one byte. But in the lower middle of the table, I found a "jump" that breaks this pattern:

// for SSE
static const __aligned_simd uint32_t kUsedBytesTable[][4] = {
	...
        { 0x00000000, 0x00000000, 0xFF000000, 0xFFFFFFFF },
        { 0x00000000, 0x00000000, 0x00000000, 0xFFFFFF00 },
	...
    };

The lower subarray is supposed to be the upper one shifted left by one byte, but it's actually shifted left by two bytes. I don't know whether this is a typo or intentional. Could you explain?
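
To make the expectation concrete, here is a minimal sketch of what each 32-bit lane would be if every row really were the 128-bit all-ones value shifted left by n bytes, with lane 0 holding the low 32 bits. The helper expected_lane is hypothetical and only encodes that assumption; it says nothing about what the table is meant to contain.

```c
#include <assert.h>
#include <stdint.h>

/* Lane `lane` (0..3) of the row for a shift of n bytes (0..16), assuming
 * the row is a 128-bit all-ones value shifted left by n bytes and lane 0
 * holds the low 32 bits. */
static uint32_t expected_lane(unsigned n, unsigned lane)
{
    assert(n <= 16 && lane < 4);
    unsigned first = 4 * lane;      /* index of the lane's lowest byte */
    if (n <= first)
        return 0xFFFFFFFFu;         /* lane lies entirely above the shift */
    if (n >= first + 4)
        return 0x00000000u;         /* lane is entirely shifted out */
    return (uint32_t)(0xFFFFFFFFu << (8 * (n - first)));
}
```

Under that assumption, n = 11 gives { 0x00000000, 0x00000000, 0xFF000000, 0xFFFFFFFF } and n = 13 gives { 0x00000000, 0x00000000, 0x00000000, 0xFFFFFF00 }, which are the two rows quoted above. The n = 12 row would be { 0x00000000, 0x00000000, 0x00000000, 0xFFFFFFFF }, and that is the one I don't see between them.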


Thanks,
Lei


