Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
Date: Mon, 24 Oct 2011 12:53:37 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Benchmarks vs GCC version

On Mon, Oct 24, 2011 at 12:28:23PM +0400, Solar Designer wrote:
> [...] I tested with gcc 4.5.0 vs. 4.6.1 now.  I confirm that
> there's a 25% slowdown for the SSE2 intrinsics code when going from
> 4.5.0 to 4.6.1.  To partially cure it, add -fno-unit-at-a-time to
> OPT_INLINE.  Apparently, gcc 4.6.x just tries too hard to optimize those
> functions with the S-box functions inlined into them, and it fails at
> that. :-(  With gcc 4.5.0, adding this option makes little difference.

I made an error in my testing.  What really made the difference for gcc
4.6.1 was disabling MAYBE_INLINE inside DES_bs_b.c.  Unfortunately, even
with that change the code is still roughly 10% slower than gcc 4.5.0's.
(Forced inlining was there for a reason.)
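To illustrate what "disabling MAYBE_INLINE" means in practice, here is a minimal sketch.  It assumes MAYBE_INLINE expands to GCC's always_inline attribute when enabled and to nothing when disabled.  The real definition used by DES_bs_b.c may differ.  The DISABLE_FORCED_INLINE switch and the toy_sbox/toy_round functions are hypothetical names invented for illustration.  toy_sbox is a toy bitsliced Boolean function standing in for an S-box, not one of the DES S-boxes.

```c
#include <stdint.h>

/*
 * Hypothetical MAYBE_INLINE toggle, assumed for illustration only; the real
 * macro in John the Ripper may be defined differently.  Compile with
 * -DDISABLE_FORCED_INLINE to mimic the workaround of disabling it.
 */
#if defined(__GNUC__) && !defined(DISABLE_FORCED_INLINE)
#define MAYBE_INLINE __attribute__((always_inline)) inline
#else
#define MAYBE_INLINE
#endif

/* Bitslice layout: one bit per candidate key, 64 keys per 64-bit word. */
typedef uint64_t vtype;

/*
 * Toy bitsliced Boolean function: a per-bit multiplexer ("a ? b : c").
 * It stands in for an S-box; it is NOT one of the DES S-boxes.
 */
static MAYBE_INLINE vtype toy_sbox(vtype a, vtype b, vtype c)
{
    return (a & b) ^ (~a & c);
}

/*
 * Caller into which toy_sbox is force-inlined when MAYBE_INLINE expands to
 * always_inline; with it disabled, gcc's own heuristics decide.
 */
void toy_round(vtype *out, const vtype *a, const vtype *b,
               const vtype *c, int n)
{
    int i;

    for (i = 0; i < n; i++)
        out[i] = toy_sbox(a[i], b[i], c[i]);
}
```

Building the same file with and without -DDISABLE_FORCED_INLINE and comparing the generated assembly (gcc -S) is one way to see how much the inlining decision changes the code of the caller.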

> Of course, switching to hand-written assembly code is another valid
> cure, but for OpenMP builds we currently/still use the intrinsics.  So I
> think I'll have to add -fno-unit-at-a-time to OPT_INLINE or to proposed
> OMPFLAGS (for gcc) in the next release.

Surprisingly, it turns out that with -fopenmp, gcc 4.6.1 produces good
code as-is, with no changes needed.  In fact, with gcc 4.6.1, I am
getting better speed with -fopenmp and OMP_NUM_THREADS=1 (just for
testing) than I do without -fopenmp.
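The sketch below shows the kind of build difference involved.  It assumes a loop parallelized with "#pragma omp parallel for".  toy_hash, hash_all and max_threads are hypothetical names and are not John the Ripper code.  With -fopenmp, _OPENMP is defined and gcc outlines the parallel loop body into a separate function.  That outlined function is still used when OMP_NUM_THREADS=1 restricts the run to one thread.  Without -fopenmp, the pragma is dropped and the loop is compiled as ordinary code.  Whether this outlining explains the speed difference reported here is an unverified guess.

```c
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical stand-in for per-candidate work (a 64-bit mixing step). */
static uint64_t toy_hash(uint64_t x)
{
    x ^= x >> 33;
    x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33;
    return x;
}

/*
 * With -fopenmp the loop is split across threads (just one if
 * OMP_NUM_THREADS=1).  Without -fopenmp, _OPENMP is undefined, the pragma
 * is removed by the preprocessor and the loop runs serially.
 */
void hash_all(uint64_t *out, const uint64_t *in, int n)
{
    int i;

#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (i = 0; i < n; i++)
        out[i] = toy_hash(in[i]);
}

/* Threads the OpenMP runtime would use; 1 when built without OpenMP. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}
```

Running the -fopenmp build with OMP_NUM_THREADS=1 gives a single-threaded comparison against the non-OpenMP build, as in the test described above.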

I think we need to find the specific -f* option whose toggling would cure
performance for non-OpenMP builds, then report this regression to the gcc
developers.  It could be some option implied by -fopenmp, or the effect
may come from changes to the generated code in the OpenMP build itself.

Alexander
