Date: Fri, 3 Jul 2015 20:13:55 +0800
From: Lei Zhang <zhanglei.april@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: extend SIMD intrinsics


> On Jul 3, 2015, at 5:10 PM, magnum <john.magnum@...hmail.com> wrote:
> 
> I guess you mean eg. the "common" stuff in the end of the file. We could copy those macros to each section instead, which would be much easier to follow - but it would sometimes mean verbatim copies of macros, risking enhancements to be made to just one place of potentially several.

Actually, I mean we should make it clear which intrinsics are supported by all archs, e.g. with a list like:
------
vadd
vand
vload
vstore
vsll
vsrl
...
------

Those primitive intrinsics should be available in any decent SIMD arch, and can be used portably. 
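
To make the idea concrete, here is a minimal sketch of such a primitive set in C. The names (vtype, vadd_epi32, vand, vor, vslli_epi32, vsrli_epi32, vload, vstore) and the kernel add_rotl7_mask are hypothetical, modelled loosely on the list above rather than copied from pseudo_intrinsics.h. The set maps to SSE2 when __SSE2__ is defined and to a plain-C four-lane stand-in otherwise, so the kernel body is identical on both.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i vtype;                       /* 4 x 32-bit lanes */
#define vadd_epi32(a, b)   _mm_add_epi32((a), (b))
#define vand(a, b)         _mm_and_si128((a), (b))
#define vor(a, b)          _mm_or_si128((a), (b))
#define vslli_epi32(v, n)  _mm_slli_epi32((v), (n))
#define vsrli_epi32(v, n)  _mm_srli_epi32((v), (n))
#define vload(p)           _mm_load_si128((const __m128i *)(p))  /* p must be 16-byte aligned */
#define vstore(p, v)       _mm_store_si128((__m128i *)(p), (v))  /* p must be 16-byte aligned */
#else
/* Plain-C stand-in with the same names, so portable code still builds. */
typedef struct { uint32_t w[4]; } vtype;
static inline vtype vadd_epi32(vtype a, vtype b) { for (int i = 0; i < 4; i++) a.w[i] += b.w[i]; return a; }
static inline vtype vand(vtype a, vtype b)       { for (int i = 0; i < 4; i++) a.w[i] &= b.w[i]; return a; }
static inline vtype vor(vtype a, vtype b)        { for (int i = 0; i < 4; i++) a.w[i] |= b.w[i]; return a; }
static inline vtype vslli_epi32(vtype v, int n)  { for (int i = 0; i < 4; i++) v.w[i] <<= n; return v; }
static inline vtype vsrli_epi32(vtype v, int n)  { for (int i = 0; i < 4; i++) v.w[i] >>= n; return v; }
static inline vtype vload(const void *p)         { vtype v; memcpy(&v, p, sizeof v); return v; }
static inline void  vstore(void *p, vtype v)     { memcpy(p, &v, sizeof v); }
#endif

/* Portable kernel written only against the primitive set:
 * out[i] = rotl32(a[i] + b[i], 7) & mask[i].
 * All pointers must be 16-byte aligned (vload/vstore are aligned ops). */
static void add_rotl7_mask(uint32_t *out, const uint32_t *a,
                           const uint32_t *b, const uint32_t *mask)
{
    vtype s = vadd_epi32(vload(a), vload(b));
    vtype r = vor(vslli_epi32(s, 7), vsrli_epi32(s, 25));  /* rotate left by 7 */
    vstore(out, vand(r, vload(mask)));
}
```

Because add_rotl7_mask touches only the primitive set, porting it to another arch would mean supplying those seven operations and nothing else.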

The current situation is that, without such a list, we risk losing portability when writing intrinsics code. Imagine that I implement a format on an AVX2 laptop: I look at the AVX2 section in pseudo_intrinsics.h, find vloadu and vshuffle_epi8 in its long list of supported intrinsics, and use them in my code. Unfortunately, that code won't work when I port it to MIC, because those two intrinsics are not in MIC's list.

However, if given a minimal set of supported intrinsics, I'll notice that vloadu and vshuffle_epi8 aren't in that set, and I may choose some portable alternatives or use them with caution (e.g. wrap the code with #ifdef).
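
One way such #ifdef wrapping could look, sketched with raw x86 intrinsics for brevity instead of pseudo-intrinsic names: a per-lane 32-bit byte swap uses the "advanced" byte shuffle (_mm_shuffle_epi8, i.e. vshuffle_epi8 territory) when __SSSE3__ is defined, and otherwise falls back to shifts, ANDs and ORs that any decent SIMD arch offers. The function name bswap32x4 is hypothetical, and the plain-C branch only exists to keep the sketch buildable on non-x86 hosts.

```c
#include <stdint.h>

#if defined(__SSSE3__)
#include <tmmintrin.h>
/* "Advanced" path: one byte shuffle per vector. in/out must be 16-byte aligned. */
static void bswap32x4(uint32_t *out, const uint32_t *in)
{
    /* Destination byte i takes source byte (i ^ 3): reverse bytes in each lane. */
    const __m128i mask = _mm_set_epi8(12, 13, 14, 15, 8, 9, 10, 11,
                                      4, 5, 6, 7, 0, 1, 2, 3);
    __m128i v = _mm_load_si128((const __m128i *)in);
    _mm_store_si128((__m128i *)out, _mm_shuffle_epi8(v, mask));
}
#elif defined(__SSE2__)
#include <emmintrin.h>
/* Primitive-only path: shifts, AND, OR and broadcast constants. */
static void bswap32x4(uint32_t *out, const uint32_t *in)
{
    __m128i v  = _mm_load_si128((const __m128i *)in);
    __m128i m1 = _mm_set1_epi32(0x00FF0000);
    __m128i m2 = _mm_set1_epi32(0x0000FF00);
    __m128i r  = _mm_or_si128(_mm_slli_epi32(v, 24), _mm_srli_epi32(v, 24));
    r = _mm_or_si128(r, _mm_and_si128(_mm_slli_epi32(v, 8), m1));
    r = _mm_or_si128(r, _mm_and_si128(_mm_srli_epi32(v, 8), m2));
    _mm_store_si128((__m128i *)out, r);
}
#else
/* Scalar reference so the sketch also builds on non-x86 hosts. */
static void bswap32x4(uint32_t *out, const uint32_t *in)
{
    for (int i = 0; i < 4; i++) {
        uint32_t x = in[i];
        out[i] = (x << 24) | ((x << 8) & 0x00FF0000u) |
                 ((x >> 8) & 0x0000FF00u) | (x >> 24);
    }
}
#endif
```

All three branches compute the same result, so callers never need to know which one was compiled in.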

I think the easiest way to tackle this is, for each arch, to split the list of supported intrinsics into two parts: primitive intrinsics and more "advanced" intrinsics. The set of primitive intrinsics is the same for every arch and is always portable. The "advanced" intrinsics can be used for more optimized code, but need to be wrapped in #ifdefs in user code.
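
A sketch of how one arch section might be laid out under that split, assuming a hypothetical convention in which every advanced op is advertised by a feature macro (VHAVE_LOADU here) that user code tests. The vector type is a plain-C stand-in so the sketch builds anywhere; vtype, vload, vloadu, VHAVE_LOADU and load_any are all illustrative names, not existing ones.

```c
#include <stdint.h>
#include <string.h>

/* ===== arch section, part 1: primitive set (same names on every arch) ===== */
typedef struct { uint32_t w[4]; } vtype;      /* stand-in for a 128-bit vector */
static inline vtype vload(const void *p)      /* contract: p is 16-byte aligned */
{ vtype v; memcpy(&v, p, sizeof v); return v; }
static inline void vstore(void *p, vtype v)   /* contract: p is 16-byte aligned */
{ memcpy(p, &v, sizeof v); }

/* ===== arch section, part 2: advanced set, each op advertised by a macro ===== */
#define VHAVE_LOADU 1   /* this section provides vloadu; a MIC-style section would not */
static inline vtype vloadu(const void *p)     /* any alignment */
{ vtype v; memcpy(&v, p, sizeof v); return v; }

/* ===== user code: advanced op used only behind its feature macro ===== */
static vtype load_any(const uint32_t *p)
{
#ifdef VHAVE_LOADU
    return vloadu(p);
#else
    /* Primitive-only fallback: copy into an aligned buffer, then vload. */
    _Alignas(16) uint32_t tmp[4];
    memcpy(tmp, p, sizeof tmp);
    return vload(tmp);
#endif
}
```

Dropping the VHAVE_LOADU definition from a section switches load_any to the primitive-only path without touching the user code.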

>> Second, the interfaces exposed by pseudo_intrinsics.h are width
>> agnostic, but not platform agnostic enough. Currently they are too
>> tightly bound to Intel's intrinsics set. Some of them are
>> inconvenient or inefficient to implement with Power/ARM's native
>> intrinsics. OTOH, this may not be a big issue, if non-x86 platforms
>> are not our major concerns.
> 
> Can you see a way to make it better while still using pseudo-intrinsics? What is the difference, is it things like three-operand instructions?

I think using pseudo-intrinsics is no problem, but some interfaces need redesigning. For example, x86's __m512i is element-type-agnostic, but AltiVec's vector types are not: 'vector int' and 'vector long' are different types in AltiVec and cannot be used interchangeably without explicit casting. Currently we use the __m512i type pervasively in our code and don't distinguish element types when declaring variables. This already caused me headaches when incorporating AltiVec intrinsics. Maybe we can define two different types, e.g. vtype32 and vtype64.
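
A sketch of the vtype32/vtype64 idea, assuming hypothetical helper names (vadd_epi32, vadd_epi64, vcast_32to64, vcast_64to32). Each type is a distinct struct, so mixing element widths without an explicit cast fails to compile, much as it does with AltiVec's vector types. Plain arrays are used so the sketch builds anywhere; an x86 backend could instead wrap a single __m512i (or __m128i) in each struct, and an AltiVec backend the corresponding vector type.

```c
#include <stdint.h>
#include <string.h>

/* Distinct struct types: passing a vtype64 where a vtype32 is expected
 * is a compile-time error on every arch. */
typedef struct { uint32_t w[4]; } vtype32;   /* 4 x 32-bit lanes */
typedef struct { uint64_t q[2]; } vtype64;   /* 2 x 64-bit lanes */

static inline vtype32 vadd_epi32(vtype32 a, vtype32 b)
{
    for (int i = 0; i < 4; i++) a.w[i] += b.w[i];   /* wraps mod 2^32 per lane */
    return a;
}

static inline vtype64 vadd_epi64(vtype64 a, vtype64 b)
{
    for (int i = 0; i < 2; i++) a.q[i] += b.q[i];   /* wraps mod 2^64 per lane */
    return a;
}

/* Explicit bit-level reinterpretation: the only way to change element type. */
static inline vtype64 vcast_32to64(vtype32 v) { vtype64 r; memcpy(&r, &v, sizeof r); return r; }
static inline vtype32 vcast_64to32(vtype64 v) { vtype32 r; memcpy(&r, &v, sizeof r); return r; }
```

The struct wrapper is what provides the type checking: a plain typedef of __m512i would only be an alias, and the compiler would still accept a vtype64 where a vtype32 is expected.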


> The current sse-intrinsics.c is just using the pseudo-intrinsics and is almost free from alternatives. If we need to, we can have alternative implementations depending on what (pseudo) intrinsics are available, eg.

That's good. OTOH, I just found that some formats still use raw x86 intrinsics, and some use intrinsics too advanced to be found on non-x86 archs. All of those need handling before we can support non-x86 intrinsics.


Lei
