john-dev - Re: Benchmarks vs GCC version

Message-ID: <20111024082823.GA18887@openwall.com>
Date: Mon, 24 Oct 2011 12:28:23 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Benchmarks vs GCC version

On Mon, Oct 10, 2011 at 07:21:24PM -0400, Erik Winkler wrote:
> On Oct 10, 2011, at 6:19 PM, Solar Designer wrote:
> 
> > On Mon, Oct 10, 2011 at 04:48:08PM -0400, Erik Winkler wrote:
> >> Why is the gcc 4.2.1 code significantly faster than the 4.6.1 code?  Is it because of the SSE2 intrinsics code used with gcc 4.6.1?
> > 
> > Maybe, but this was not supposed to be the case.  I might do some more
> > testing with gcc 4.6.1 specifically.  However, in current CVS tree, the
> > use of intrinsics has been disabled for another reason anyway.  You can
> > try this trivial patch too:
> 
> Yes, that was the issue.  See new benchmarks below using gcc 4.6.1 after I applied the patch to x86-64.h.

Thanks!  So, I tested with gcc 4.5.0 vs. 4.6.1 now.  I confirm that
there's a 25% slowdown for the SSE2 intrinsics code when going from
4.5.0 to 4.6.1.  To partially cure it, add -fno-unit-at-a-time to
OPT_INLINE.  Apparently, gcc 4.6.x just tries too hard to optimize those
functions with the S-box functions inlined into them, and it fails at
that. :-(  With gcc 4.5.0, adding this option makes little difference.

Of course, switching to hand-written assembly code is another valid
cure, but for OpenMP builds we still use the intrinsics.  So I think
I'll have to add -fno-unit-at-a-time to OPT_INLINE or to the proposed
OMPFLAGS (for gcc) in the next release.
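
For illustration, the Makefile change could look roughly like the sketch
below.  The pre-existing flag values shown (-finline-limit=4000 and
-fopenmp) are assumptions made for the example, not copied from the
tree; only the appended -fno-unit-at-a-time is the point.

```make
# Sketch only: the existing contents of these variables are assumed.

# Option 1: apply the flag to the files built with OPT_INLINE
# (the ones with the S-box functions inlined).
OPT_INLINE = -finline-limit=4000 -fno-unit-at-a-time

# Option 2 (alternative): apply it only to OpenMP builds with gcc,
# which are the builds that use the SSE2 intrinsics code.
#OMPFLAGS = -fopenmp -fno-unit-at-a-time
```

Either way, it is worth re-running the benchmark with each gcc version
after the change, since with gcc 4.5.0 the option made little
difference and with 4.6.1 it is only a partial cure.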

Alexander
