Date: Tue, 8 Sep 2015 21:45:13 +0200 From: magnum <john.magnum@...hmail.com> To: john-dev@...ts.openwall.com Subject: Re: Large stack alignment On 2015-09-08 11:24, Solar Designer wrote: > On Tue, Sep 08, 2015 at 01:37:21AM +0200, magnum wrote: >> On 2015-09-06 19:55, Solar Designer wrote: >>> (...) although ideally we'd have the compiler align the >>> stack and then we wouldn't spend any extra registers on this. >>> >>> More importantly, none of this addresses the "redundant or insufficient" >>> aspect. We need to figure it out. >> >> Our AC is already capable of testing whether arbitrary options like >> "-mpreferred-stack-boundary=5" is accepted by the compiler or not. Does >> this help? I'm not quite sure why it's called "preferred"? > > We should check whether -mavx2 possibly already implies > -mpreferred-stack-boundary=5. It might. > > This should help (and possibly already does) prevent our own code from > misaligning the stack, but it won't help deal with possible stack > misalignment by libraries. The library most relevant to us is libgomp. > Hopefully, AVX2-capable gcc has its libgomp built with > -mpreferred-stack-boundary=5, but I don't know whether this is the case > or not. Also, it'd rely on the stack being aligned when main() starts. After exhausting my google-fu I did some empirical testing, using gcc 4.8.2-19ubuntu1 and gcc 4.8.4 on a Macbook. Attached are two simple files for reproducing and working around the problem. The first, test.c, reliably triggers the problem: Stack variables using __attribute__ ((aligned())) declared inside an omp parallel for loop is *not* aligned (if we actually build with OpenMP). This regardless of extra compiler options I tried. However, if buffer is declared as __m256i instead, the problem goes away (no special options needed, just -mavx2). The second, test2.c, seems to show an alternative workaround: If I make the body of the loop its own function, problem vanishes with no special compiler options. This even if it's inlined (well I did not verify it really ended up inlined - it probably did not since I guess OMP probably benefits from having it as a function). All this is exactly what we've seen so far IRL in Jumbo. All problems we've had was aligned declarations local to a parallel for. Also, I do not see *any* problem using gcc-4.9.2 (only OSX tested as of yet). I'll dig into their list of changes (but I already googled this so much it would be strange if I missed it). I'll do some more tests on various systems and compilers. Jim, please test under Cygwin and, if possible, under MinGW64. Several google hits indicate problem is (or was) worse under MinGW64. magnum View attachment "test.c" of type "text/plain" (1804 bytes) View attachment "test2.c" of type "text/plain" (1354 bytes)
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.