Date: Mon, 24 Oct 2011 12:28:23 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Benchmarks vs GCC version

On Mon, Oct 10, 2011 at 07:21:24PM -0400, Erik Winkler wrote:
> On Oct 10, 2011, at 6:19 PM, Solar Designer wrote:
>
> > On Mon, Oct 10, 2011 at 04:48:08PM -0400, Erik Winkler wrote:
> >> Why is the gcc 4.2.1 code significantly faster than the 4.6.1 code? Is it because of the SSE2 intrinsics code used with gcc 4.6.1?
> >
> > Maybe, but this was not supposed to be the case.  I might do some more
> > testing with gcc 4.6.1 specifically.  However, in current CVS tree, the
> > use of intrinsics has been disabled for another reason anyway.  You can
> > try this trivial patch too:
>
> Yes, that was the issue. See new benchmarks below using gcc 4.6.1 after I applied the patch to x86-64.h. Thanks!

So, I tested with gcc 4.5.0 vs. 4.6.1 now.  I confirm that there's a 25%
slowdown for the SSE2 intrinsics code when going from 4.5.0 to 4.6.1.
To partially cure it, add -fno-unit-at-a-time to OPT_INLINE.
Apparently, gcc 4.6.x just tries too hard to optimize those functions
with the S-box functions inlined into them, and it fails at that. :-(
With gcc 4.5.0, adding this option makes little difference.

Of course, switching to hand-written assembly code is another valid
cure, but for OpenMP builds we currently/still use the intrinsics.
So I think I'll have to add -fno-unit-at-a-time to OPT_INLINE or to
proposed OMPFLAGS (for gcc) in the next release.

Alexander
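
For reference, the Makefile change described above might look roughly like
the fragment below.  This is only a sketch, not the actual patch: it assumes
a GNU make Makefile in which OPT_INLINE is already defined earlier, and it
appends to that variable rather than guessing at its current contents.  The
commented-out OMPFLAGS line is the alternative placement; the -fopenmp value
shown there is an assumption.

```make
# Sketch only: place this after the existing OPT_INLINE definition.
# Appends the gcc option that partially cures the 4.6.x slowdown;
# with gcc 4.5.0 it makes little difference.
OPT_INLINE += -fno-unit-at-a-time

# Alternative (assumed form of the proposed OMPFLAGS, gcc only):
#OMPFLAGS = -fopenmp -fno-unit-at-a-time
```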