Date: Mon, 24 Oct 2011 12:28:23 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Benchmarks vs GCC version

On Mon, Oct 10, 2011 at 07:21:24PM -0400, Erik Winkler wrote:
> On Oct 10, 2011, at 6:19 PM, Solar Designer wrote:
> 
> > On Mon, Oct 10, 2011 at 04:48:08PM -0400, Erik Winkler wrote:
> >> Why is the gcc 4.2.1 code significantly faster than the 4.6.1 code?  Is it because of the SSE2 intrinsics code used with gcc 4.6.1?
> > 
> > Maybe, but this was not supposed to be the case.  I might do some more
> > testing with gcc 4.6.1 specifically.  However, in current CVS tree, the
> > use of intrinsics has been disabled for another reason anyway.  You can
> > try this trivial patch too:
> 
> Yes, that was the issue.  See new benchmarks below using gcc 4.6.1 after I applied the patch to x86-64.h.

Thanks!  So, I tested with gcc 4.5.0 vs. 4.6.1 now.  I confirm that
there's a 25% slowdown for the SSE2 intrinsics code when going from
4.5.0 to 4.6.1.  To partially cure it, add -fno-unit-at-a-time to
OPT_INLINE.  Apparently, gcc 4.6.x just tries too hard to optimize those
functions with the S-box functions inlined into them, and it fails at
that. :-(  With gcc 4.5.0, adding this option makes little difference.
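
For illustration only, a minimal sketch of that change, assuming GNU make and
that the line is placed after the existing OPT_INLINE definition in the
Makefile (the flags already in OPT_INLINE are not shown in this message and
stay as they are):

```make
# Sketch, not the committed change: append the flag to whatever
# OPT_INLINE already contains, rather than replacing its value.
# Must come after the original "OPT_INLINE = ..." line.
OPT_INLINE += -fno-unit-at-a-time
```

Using += keeps the existing inlining options in place, so only the
unit-at-a-time behavior changes for the files built with OPT_INLINE.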

Of course, switching to hand-written assembly code is another valid
cure, but for OpenMP builds we currently still use the intrinsics.  So I
think I'll have to add -fno-unit-at-a-time to OPT_INLINE or to the
proposed OMPFLAGS (for gcc) in the next release.

Alexander
