Date: Mon, 13 Jul 2015 02:35:29 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: extend SIMD intrinsics

On Sun, Jul 12, 2015 at 07:20:18PM -0400, Alain Espinosa wrote:
> I just did some retests. I was generalizing from AVX2 to SSE2, and that was wrong. Take this code:
> 
> #define SIMD_WORD __m128i
> void test()
> {
>    SIMD_WORD test_array[16];
>    ...
>    SIMD_WORD test_var = test_array[6];
>    ...
> }
> 
> -This generates movdqa instructions whether we put test_array in global or stack scope.
> 
> -When we change SIMD_WORD to __m256i, the code generates vmovdqu whether we put test_array in global or stack scope. If we change the assignment to use _mm256_load_si256, it generates a vmovdqa instruction.
> 
> -When we change SIMD_WORD to __m256, the code generates vmovups whether we put test_array in global or stack scope.
> 
> So the unaligned accesses happen with the AVX/AVX2 intrinsics, not with SSE2.

Oh, that's a relief.
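
For reference, here is a self-contained sketch of the same test (the
two-function split and the names are mine, so treat it as an untested
sketch); building it with something like "gcc -O2 -mavx2 -S" should
show which load instructions get emitted:

#include <immintrin.h>

/* global scope: the compiler gives this 32-byte alignment */
__m256i test_array[16];

__m256i load_by_index(void)
{
    /* plain array indexing - the case reported to compile to vmovdqu */
    return test_array[6];
}

__m256i load_by_intrinsic(void)
{
    /* explicit aligned-load intrinsic - the case that gave vmovdqa */
    return _mm256_load_si256(&test_array[6]);
}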

Indeed, stack alignment greater than 16 bytes is not guaranteed by the
ABI, and on current CPUs with AVX the unaligned load instructions are
as fast as the aligned ones when the data is in fact aligned.  So the
compiler's behavior makes sense.
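
If we did want guaranteed aligned loads from stack arrays, over-aligning
the buffer and using the explicit load intrinsics would be one way to
get there.  A minimal sketch (the function and names are mine, not from
our tree):

#include <immintrin.h>
#include <stdalign.h>

__m256i sum_two(void)
{
    /* explicitly request 32-byte alignment, beyond the 16 bytes
       the ABI guarantees for the stack */
    alignas(32) __m256i buf[2];

    buf[0] = _mm256_set1_epi32(1);
    buf[1] = _mm256_set1_epi32(2);

    /* explicit aligned loads - safe because buf is 32-byte aligned */
    return _mm256_add_epi32(_mm256_load_si256(&buf[0]),
                            _mm256_load_si256(&buf[1]));
}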

OK, I guess we can continue without the load intrinsics, then.

Thanks!

Alexander
