Date: Sun, 12 Jul 2015 19:20:18 -0400
From: Alain Espinosa <alainesp@...ta.cu>
To: john-dev@...ts.openwall.com
Subject: Re: extend SIMD intrinsics



-------- Original message --------
From: Solar Designer <solar@...nwall.com> 
Date:07/12/2015 12:52 PM (GMT-05:00) 
To: john-dev@...ts.openwall.com 
Cc: 
Subject: Re: [john-dev] extend SIMD intrinsics 

...Right.  However, what if you move test_array to global scope, or declare
it "static" in the function?  I am wondering if the compiler possibly
doesn't want to rely on the stack being 16-byte aligned (as it normally
is per x86_64 ABI).

I just reran some tests. I was generalizing from AVX2 to SSE2, and that was wrong. Take this code:

#include <immintrin.h>

#define SIMD_WORD __m128i
void test()
{
    SIMD_WORD test_array[16];
    ...
    SIMD_WORD test_var = test_array[6];
    ...
}

-This generates movdqa instructions whether we put test_array in global scope or on the stack.

-When we change SIMD_WORD to __m256i the code generates vmovdqu, again whether test_array is in global scope or on the stack. If we change the assignment to use _mm256_load_si256, it generates a vmovdqa instruction.

-When we change SIMD_WORD to __m256 the code generates vmovups, again regardless of whether test_array is global or on the stack.

So the unaligned accesses happen with the AVX/AVX2 intrinsics.
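
For reference, here is a small standalone sketch of the AVX2 case (the test_avx2 name and the set1 fill loop are just illustrative, not taken from the tree): the plain element assignment is what produced vmovdqu here, and the explicit _mm256_load_si256 is what produced vmovdqa.

#include <immintrin.h>

#define SIMD_WORD __m256i

void test_avx2(void)
{
    SIMD_WORD test_array[16];

    /* Fill the array with something so the loads below are not optimized away */
    for (int i = 0; i < 16; i++)
        test_array[i] = _mm256_set1_epi32(i);

    /* Plain assignment: the compiler emits an unaligned vmovdqu here */
    SIMD_WORD test_var_u = test_array[6];

    /* Explicit aligned-load intrinsic: the compiler emits vmovdqa instead */
    SIMD_WORD test_var_a = _mm256_load_si256(&test_array[6]);

    (void)test_var_u;
    (void)test_var_a;
}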

Regards, 
Alain
