Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 8 Sep 2015 21:45:13 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Large stack alignment

On 2015-09-08 11:24, Solar Designer wrote:
> On Tue, Sep 08, 2015 at 01:37:21AM +0200, magnum wrote:
>> On 2015-09-06 19:55, Solar Designer wrote:
>>> (...) although ideally we'd have the compiler align the
>>> stack and then we wouldn't spend any extra registers on this.
>>>
>>> More importantly, none of this addresses the "redundant or insufficient"
>>> aspect.  We need to figure it out.
>>
>> Our AC is already capable of testing whether arbitrary options like
>> "-mpreferred-stack-boundary=5" is accepted by the compiler or not. Does
>> this help? I'm not quite sure why it's called "preferred"?
>
> We should check whether -mavx2 possibly already implies
> -mpreferred-stack-boundary=5.  It might.
>
> This should help (and possibly already does) prevent our own code from
> misaligning the stack, but it won't help deal with possible stack
> misalignment by libraries.  The library most relevant to us is libgomp.
> Hopefully, AVX2-capable gcc has its libgomp built with
> -mpreferred-stack-boundary=5, but I don't know whether this is the case
> or not.  Also, it'd rely on the stack being aligned when main() starts.

After exhausting my google-fu I did some empirical testing, using gcc 
4.8.2-19ubuntu1 and gcc 4.8.4 on a Macbook. Attached are two simple 
files for reproducing and working around the problem.

The first, test.c, reliably triggers the problem: Stack variables using 
__attribute__ ((aligned())) declared inside an omp parallel for loop is 
*not* aligned (if we actually build with OpenMP). This regardless of 
extra compiler options I tried. However, if buffer is declared as 
__m256i instead, the problem goes away (no special options needed, just 
-mavx2).

The second, test2.c, seems to show an alternative workaround: If I make 
the body of the loop its own function, problem vanishes with no special 
compiler options. This even if it's inlined (well I did not verify it 
really ended up inlined - it probably did not since I guess OMP probably 
benefits from having it as a function).

All this is exactly what we've seen so far IRL in Jumbo. All problems 
we've had was aligned declarations local to a parallel for.

Also, I do not see *any* problem using gcc-4.9.2 (only OSX tested as of 
yet). I'll dig into their list of changes (but I already googled this so 
much it would be strange if I missed it).

I'll do some more tests on various systems and compilers. Jim, please 
test under Cygwin and, if possible, under MinGW64. Several google hits 
indicate problem is (or was) worse under MinGW64.

magnum

View attachment "test.c" of type "text/plain" (1804 bytes)

View attachment "test2.c" of type "text/plain" (1354 bytes)

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.