Date: Tue, 31 Mar 2015 08:51:11 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: New SIMD generations, code layout

On 2015-03-31 03:21, Lei Zhang wrote:
> 
>> On Mar 31, 2015, at 12:09 AM, magnum <john.magnum@...hmail.com> wrote:
>>
>> #if __AVX2__
>> typedef __m256i           v_uint;
>> #define jtr_setzero_si    _mm256_setzero_si256
>> #define jtr_loadu_si      _mm256_loadu_si256
>> #define jtr_movemask_epi8 _mm256_movemask_epi8
>> #define jtr_cmpeq_epi8    _mm256_cmpeq_epi8
> 
> I think this makes sense. But isn't this pattern already in good use
> in JtR? I see something like this in DES_bs_b.c:
> 
>  #if defined(__AVX__)
>  typedef __m256 vtype;
>  #define vst(dst, ofs, src)   _mm256_store_ps((float *)((DES_bs_vector *)&(dst) + (ofs)), (src))
>  #define vxorf(a, b)  _mm256_xor_ps((a), (b))
> 
> These seem to be the pseudo-intrinsics you want. So I guess the
> problem is that, in JtR, some files use pseudo-intrinsics while others
> don't? If so, IMHO it would be better to use a unified
> definition of all the intrinsics, just as you suggested. Otherwise
> we'll have to put that CPU detection code in every file that uses
> CPU-specific intrinsics.

Oh, right. DES_bs_b.c is a core file; I was talking mostly about
sse-intrinsics.c, which contains 99% of Jumbo's intrinsics. But I must
confess I totally forgot about core, and this indicates my idea was
neither original nor silly.

So we should probably just adopt Solar's names for the pseudos.
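A minimal sketch of such a shared definition, reusing the v_uint and jtr_* names from the quoted proposal (Solar's actual pseudo-intrinsic names would replace them). VBYTES, the scalar fallback and first_nul() are hypothetical, for illustration only. The point is that the #if ladder lives in one header, and first_nul() is written once and builds for AVX2, SSE2 or plain C.

```c
/* Hypothetical shared header contents: the only place with CPU #ifs. */
#include <string.h>

#if defined(__AVX2__)
#include <immintrin.h>
typedef __m256i v_uint;
#define VBYTES            32
#define jtr_setzero_si    _mm256_setzero_si256
#define jtr_loadu_si      _mm256_loadu_si256
#define jtr_cmpeq_epi8    _mm256_cmpeq_epi8
#define jtr_movemask_epi8 _mm256_movemask_epi8
#elif defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i v_uint;
#define VBYTES            16
#define jtr_setzero_si    _mm_setzero_si128
#define jtr_loadu_si      _mm_loadu_si128
#define jtr_cmpeq_epi8    _mm_cmpeq_epi8
#define jtr_movemask_epi8 _mm_movemask_epi8
#else
/* Hypothetical scalar fallback: a "vector" of 16 bytes handled in loops. */
typedef struct { unsigned char b[16]; } v_uint;
#define VBYTES 16
static v_uint jtr_setzero_si(void)
{ v_uint r; memset(&r, 0, sizeof r); return r; }
static v_uint jtr_loadu_si(const v_uint *p)
{ v_uint r; memcpy(&r, p, sizeof r); return r; }
static v_uint jtr_cmpeq_epi8(v_uint a, v_uint b)
{
    v_uint r; int i;
    for (i = 0; i < 16; i++)
        r.b[i] = (a.b[i] == b.b[i]) ? 0xff : 0;   /* all-ones where equal */
    return r;
}
static int jtr_movemask_epi8(v_uint a)
{
    int m = 0, i;
    for (i = 0; i < 16; i++)
        m |= (a.b[i] >> 7) << i;                  /* top bit of each byte */
    return m;
}
#endif

/* Format code: no CPU-specific names below this line.
 * Returns the index of the first NUL byte in buf, scanning VBYTES at a
 * time. Assumes buf contains a NUL and is readable up to the next
 * multiple of VBYTES past it. */
int first_nul(const unsigned char *buf)
{
    const v_uint zero = jtr_setzero_si();
    int ofs;

    for (ofs = 0; ; ofs += VBYTES) {
        v_uint v = jtr_loadu_si((const v_uint *)(buf + ofs));
        unsigned int mask =
            (unsigned int)jtr_movemask_epi8(jtr_cmpeq_epi8(v, zero));
        if (mask) {
            int i = 0;
            while (!(mask & 1u)) {   /* position of lowest set bit */
                mask >>= 1;
                i++;
            }
            return ofs + i;
        }
    }
}
```

For example, with a 64-byte buffer of 'x' and buf[37] = 0, first_nul(buf) returns 37 whichever branch was compiled. Adding another SIMD generation would mean one more #elif block in the header, with no change to first_nul().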

On a side note, I really like how md5slice.c
(http://www.openwall.com/lists/john-dev/2015/03/11/10) is written: not
a single intrinsic, it can adapt to e.g. AVX-1024 once the compiler supports
it, and it is not tied to x86. But it really depends on the compiler.
How portable is it? Would it result in efficient code for non-bitsliced
formats too (why not)? Are there things you can hardly do without intrinsics,
like transposing between scalar and vector? Maybe parts of our code
could be written that way instead?
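One way to write intrinsic-free SIMD code is with GCC/Clang vector extensions. This is an assumption: the referenced md5slice.c may instead rely on plain C and auto-vectorization. The sketch below uses hypothetical names (vu32, md5_F, vrotl, load_lanes, store_lanes). It expresses MD5's F function and a rotate with ordinary C operators, and the compiler lowers them to whatever the target offers. load_lanes() is the scalar-to-vector transpose in portable form. Whether the compiler turns it into efficient shuffles rather than per-element inserts is exactly the compiler dependence mentioned above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 8
/* 8 x 32-bit lanes (32 bytes). GCC/Clang lower this to AVX2, to pairs of
 * 128-bit ops (SSE2, NEON, AltiVec), or to scalar code. */
typedef uint32_t vu32 __attribute__((vector_size(32)));

/* MD5 round-1 function F(x,y,z) = (x & y) | (~x & z), in the
 * equivalent z ^ (x & (y ^ z)) form. */
static vu32 md5_F(vu32 x, vu32 y, vu32 z)
{
    return z ^ (x & (y ^ z));
}

/* Rotate every lane left by n, for 1 <= n <= 31 (the scalar n is
 * broadcast to all lanes). */
static vu32 vrotl(vu32 x, int n)
{
    return (x << n) | (x >> (32 - n));
}

/* Scalar -> vector transpose: keys holds LANES candidates, `words`
 * 32-bit words each, stored one candidate per row. Lane i of the
 * result is word w of candidate i. */
static vu32 load_lanes(const uint32_t *keys, size_t words, size_t w)
{
    vu32 v;
    int i;

    for (i = 0; i < LANES; i++)
        v[i] = keys[(size_t)i * words + w];
    return v;
}

/* Vector -> scalar: copy the LANES results out to plain memory. */
static void store_lanes(uint32_t *out, vu32 v)
{
    memcpy(out, &v, sizeof v);
}
```

Targeting a wider unit would mean changing only vector_size (and LANES); the operators in md5_F() and vrotl() stay the same.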

magnum

