Date: Wed, 01 Apr 2015 10:13:30 +0200
From: magnum <>
Subject: Re: New SIMD generations, code layout

On 2015-04-01 06:15, Lei Zhang wrote:
>> On Mar 31, 2015, at 12:09 AM, magnum <>
>> wrote:
>> I just made a rough experimental version of raw-sha1-ng with AVX2 
>> support (not committed). It's definitely worth it. But to the
>> point, a question popped up. The code is now loaded with things
>> like this:
> I just tried to add MIC support to rawSHA256_ng, but the file seems a
> bit hardcoded for SSE and I have to write "#ifdef __MIC__ {...}"
> (like the code above) everywhere. It almost feels like I'm rewriting
> the whole file, copying the original code and then replacing every
> occurrence of "_mm256" with "_mm512". I don't feel this is the right
> way to go. I guess other files that use SSE intrinsics are more or
> less the same case. I'm curious how magnum handled this when adding
> AVX2 support. Is there a better way without using pseudo-intrinsics?

An odd thing with MIC compared to AVX2 and below is that MIC doesn't have
any "lower" support. E.g. you always know that an XOP-capable system also
supports (and defines) AVX, AVX also supports SSE4.1, SSE4.1 also
supports SSSE3, and all of them support SSE2. But MIC *only* supports
the _mm512* intrinsics.
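
To make that concrete, here is a minimal sketch of picking the integer
vector width at compile time. The function names are made up for
illustration, and treating __AVX512F__ like __MIC__ is my assumption,
not something the current code does. The checks go from widest to
narrowest, and MIC needs its own branch because nothing narrower is
defined there:

```c
/* Hypothetical helpers, not part of any existing header. */

/* Width in bits of the widest integer SIMD unit we were compiled for. */
static int simd_int_width_bits(void)
{
#if defined(__MIC__) || defined(__AVX512F__)
    return 512;   /* MIC: only the _mm512* family is available */
#elif defined(__AVX2__)
    return 256;   /* AVX2 adds 256-bit integer operations */
#elif defined(__SSE2__)
    return 128;   /* SSE2 through AVX/XOP: 128-bit integer operations */
#else
    return 0;     /* no SIMD support detected at compile time */
#endif
}

/* Name of the selected tier, mainly useful for debug output. */
static const char *simd_tier_name(void)
{
#if defined(__MIC__) || defined(__AVX512F__)
    return "512-bit (MIC/AVX-512)";
#elif defined(__AVX2__)
    return "256-bit (AVX2)";
#elif defined(__SSE2__)
    return "128-bit (SSE2 or better)";
#else
    return "scalar";
#endif
}
```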

But I'm fairly confident the pseudo intrinsics will make this a lot
easier to handle. Many of the #ifdefs in the format file will go away.

> Maybe we can start implementing the pseudo-intrinsics now. Those used
> in DES_bs_b.c make a good reference, but not comprehensive enough.
> What's your opinion? I may start doing this if it's appropriate.

Yes, I ended up using my own brew, as in vload() for _mm*load*() and
vadd_epi32() for _mm*add_epi32(). It also emulates some stuff, like
vcmov(y, z, x) (supported on XOP without emulation).
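
Roughly, the idea looks like the sketch below. This is not the header
from the patch: vloadu/vstoreu (unaligned variants), VEC_WORDS32, the
scalar fallback and the select_add() helper are all assumptions made
so the example stands alone. Only the vadd_epi32 and vcmov names come
from what I described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
#include <immintrin.h>
typedef __m256i vtype;
#define VEC_WORDS32      8
#define vloadu(p)        _mm256_loadu_si256((const __m256i *)(p))
#define vstoreu(p, v)    _mm256_storeu_si256((__m256i *)(p), (v))
#define vadd_epi32(a, b) _mm256_add_epi32((a), (b))
#define vand(a, b)       _mm256_and_si256((a), (b))
#define vxor(a, b)       _mm256_xor_si256((a), (b))
#elif defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vtype;
#define VEC_WORDS32      4
#define vloadu(p)        _mm_loadu_si128((const __m128i *)(p))
#define vstoreu(p, v)    _mm_storeu_si128((__m128i *)(p), (v))
#define vadd_epi32(a, b) _mm_add_epi32((a), (b))
#define vand(a, b)       _mm_and_si128((a), (b))
#define vxor(a, b)       _mm_xor_si128((a), (b))
#else
/* Scalar stand-in so the sketch builds anywhere (assumption). */
typedef uint32_t vtype;
#define VEC_WORDS32      1
#define vloadu(p)        (*(const uint32_t *)(p))
#define vstoreu(p, v)    (*(uint32_t *)(p) = (v))
#define vadd_epi32(a, b) ((a) + (b))
#define vand(a, b)       ((a) & (b))
#define vxor(a, b)       ((a) ^ (b))
#endif

/* Emulated conditional move: bits of y where x is set, bits of z elsewhere. */
#define vcmov(y, z, x)   vxor((z), vand((x), vxor((y), (z))))

/* Hypothetical caller: out[i] = ((y & x) | (z & ~x)) + k, n words at a time. */
static void select_add(uint32_t *out, const uint32_t *y, const uint32_t *z,
                       const uint32_t *x, const uint32_t *k, size_t n)
{
    size_t i;

    assert(n % VEC_WORDS32 == 0);   /* whole vectors only */
    for (i = 0; i < n; i += VEC_WORDS32) {
        vtype vy = vloadu(y + i), vz = vloadu(z + i), vx = vloadu(x + i);
        vtype r = vcmov(vy, vz, vx);

        vstoreu(out + i, vadd_epi32(r, vloadu(k + i)));
    }
}
```

The caller contains no width-specific #ifdef at all; only the block of
macro definitions changes per instruction set.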

I must confess I have already done much of the groundwork. I already
include a section for AVX512/MIC, but it's not tested and will surely
need tweaking.

Attached is a patch with my experimental Raw-SHA512-ng and the pseudo
header (which includes many intrinsics needed for other formats too,
including sse-intrinsics.c). It's just a rough start, mostly untested,
and may currently use some intrinsics that don't even exist and so need
emulation (or some change in the caller). The version in this patch does
work for SHA512 with anything from SSE2 to AVX2 (it passes self-test and
the Test Suite[1]) and I believe it's close to working on the MIC.

What you need to do:
1. Fix it so rawSHA512_ng builds at all (e.g. change the top "#if
__SSE2__" to something like "#if __SSE2__ || __MIC__" for starters).
2. Fix whatever more is needed to make it build at all. For example,
while the SWAP_ENDIAN macro is blindly added for AVX512, it's untested.
And the GATHER macro doesn't even have a section for AVX512 yet, but it
needs one. By the way, we should probably move those two macros to the
pseudo-intrinsics.h file instead. Perhaps as vswap() and vgather().
3. Fix whatever more is needed to make it run correctly.
4. See if there are things that can be implemented better (faster).
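
For the vswap()/vgather() idea in item 2, something along these lines
might work. It is only a sketch for 64-bit lanes (as SHA512 needs):
the names vswap64, vgather64, vstore64 and load_be_words, the flat
"keys + stride" buffer layout and the scalar fallback are assumptions
for illustration, not the macros currently in the format file. A
512-bit section would still have to be written and tested separately.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vtype;
#define VEC_LANES64 2

/* Byte-swap each 64-bit lane using SSE2 only: reverse the four 16-bit
 * words of each lane, then swap the two bytes inside every word. */
static inline vtype vswap64(vtype v)
{
    v = _mm_shufflehi_epi16(_mm_shufflelo_epi16(v, 0x1b), 0x1b);
    return _mm_or_si128(_mm_slli_epi16(v, 8), _mm_srli_epi16(v, 8));
}

/* Lane i receives word idx of candidate i (candidates are stride apart). */
static inline vtype vgather64(const uint64_t *keys, size_t stride, size_t idx)
{
    return _mm_set_epi64x((long long)keys[1 * stride + idx],
                          (long long)keys[0 * stride + idx]);
}

static inline void vstore64(uint64_t *out, vtype v)
{
    _mm_storeu_si128((__m128i *)out, v);
}
#else
/* Scalar stand-in (assumption) so the sketch builds without SSE2. */
typedef uint64_t vtype;
#define VEC_LANES64 1

static inline vtype vswap64(vtype v)
{
    v = ((v & 0x00ff00ff00ff00ffULL) << 8) | ((v >> 8) & 0x00ff00ff00ff00ffULL);
    v = ((v & 0x0000ffff0000ffffULL) << 16) | ((v >> 16) & 0x0000ffff0000ffffULL);
    return (v << 32) | (v >> 32);
}

static inline vtype vgather64(const uint64_t *keys, size_t stride, size_t idx)
{
    (void)stride;   /* only one lane, so only candidate 0 is read */
    return keys[idx];
}

static inline void vstore64(uint64_t *out, vtype v)
{
    *out = v;
}
#endif

/* Hypothetical caller: fetch word idx of each candidate, big-endian. */
static inline void load_be_words(uint64_t *out, const uint64_t *keys,
                                 size_t stride, size_t idx)
{
    vstore64(out, vswap64(vgather64(keys, stride, idx)));
}
```

With both operations behind vector-width-neutral names, the format
file would only need VEC_LANES64 to size its buffers.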

Just concentrate on the MIC, I will experiment with AVX2. We can
coordinate our changes later (I'm offline for like 10 hours now). When
SHA512 seems fine, try SHA256-ng or SHA1-ng. Once these three files seem
fine with AVX2 and MIC, we can move on to sse-intrinsics.c - and this
will give AVX2/AVX512 support to a huge number of formats! That last
change will likely result in LOTS of little regression problems though,
with formats that hardcode vector width and so on.



View attachment "0001-Add-a-pseudo-intrinsics-header-and-use-it-for-raw-sh.patch" of type "text/plain" (31262 bytes)
