Date: Tue, 26 Jun 2012 05:41:34 +0400
From: Solar Designer
Subject: precompiled sse-intrinsics vs. -march=native

magnum, Jim -

It appears that we shouldn't use the precompiled sse-intrinsics files
(icc's *.S) in -march=native builds.  Specifically, when I tried
linux-x86-64-gpu on bull, where -march=native implies XOP, the build
reported that XOP intrinsics were being used, whereas in reality it
used icc-precompiled SSE2 code.  What's worse, I got a segfault for
--format=md5 (a read beyond the end of the heap after MD5_Update() was
called with a huge size from the precompiled intrinsics code; I don't
know why), and failed self-tests for raw-md5 and raw-md4 (raw-sha1
worked).

The misreporting issue has an obvious cause.  The segfault and failed
self-tests are a mystery to me: I don't see why the precompiled code
would be incompatible with -march=native on this machine.  The ABI
should stay the same.  Maybe there's a bug lurking around that will also
bite us in other cases.
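
To illustrate the misreporting only (not the segfault), here is a
minimal sketch, assuming the reported SIMD level is chosen from
compiler-predefined macros.  __XOP__, __AVX__ and __SSE2__ are real gcc
macros; the function names and returned strings are hypothetical and
are not taken from the John the Ripper source.  With -march=native on
an XOP-capable machine, gcc defines __XOP__, so the string says "XOP"
even if the object actually linked in was assembled from precompiled
SSE2 code.

```c
#include <string.h>

/* Hypothetical helper: name the SIMD level this translation unit was
 * compiled for.  The result depends only on compiler flags, not on
 * which intrinsics object file ends up being linked. */
static const char *compiled_simd_level(void)
{
#if defined(__XOP__)
    return "XOP";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "generic";
#endif
}

/* Hypothetical check: linked_level names what the linked intrinsics
 * code really implements (e.g. "SSE2" for the precompiled *.S files).
 * Returns 1 when the report would be accurate, 0 otherwise. */
static int report_matches_linked_code(const char *linked_level)
{
    return strcmp(compiled_simd_level(), linked_level) == 0;
}
```

Under these assumptions, report_matches_linked_code("SSE2") would
return 0 in a -march=native build on bull, which is the mismatch
described above.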

The attached patch removes the use of precompiled sse-intrinsics from
GPU targets.  The alternative would have been to remove -march=native
from them.  I don't know which is the better choice; neither is perfect.
Since modern systems tend to have at least AVX (and maybe also XOP),
-march=native may be preferable to the precompiled SSE2 code.  Yet
another alternative would be to have more GPU targets for the different
combinations, but that would be confusing.

I've only tested this with linux-x86-64-gpu (the problem went away with
this patch), even though the patch changes 6 targets.


View attachment "john-gpu-no-precompiled-intrinsics.diff" of type "text/plain" (3483 bytes)
