Date: Wed, 8 Aug 2012 10:27:06 +0400
From: Solar Designer <solar@...nwall.com>
To: musl@...ts.openwall.com
Subject: Re: crypt* files in crypt directory

On Wed, Aug 08, 2012 at 01:28:44AM -0400, Rich Felker wrote:
> On Wed, Aug 08, 2012 at 08:42:35AM +0400, Solar Designer wrote:
> > I see that you did this - and I think you took it too far. The code
> > became twice as slow on Pentium 3 when compiling with gcc 3.4.5 (approx.
> > 140 c/s down to 77 c/s). Adding -finline-functions
> > -fold-unroll-all-loops regains only a fraction of the speed (112 c/s);
> > less aggressive loop unrolling results in lower speeds.
>
> Can you compare with a more modern gcc?

I could and I might do that later, but to me the slowdown with gcc 3 is
enough reason not to make those changes in that specific way.

> > The impact on x86-64 is less. With Ubuntu 12.04's gcc 4.6.3 on FX-8120
> > I get 490 c/s for the original code, 450 c/s for your code without
> > inlining/unrolling, and somehow only 430 c/s with -finline-functions
> > -funroll-loops.
>
> Actually this is a lot closer to what I expected. I think you'll find
> similar results on 32-bit with gcc 4.6.3 too. The modern expectation
> is that manually unrolling loops will give worse performance than
> letting the compiler decide what to do. Certainly there are exceptions
> to the expected result, but on average, it's the right decision.

Per the numbers above, here the compiler's unrolling is slower not only
than manual unrolling but also than the non-unrolled code.

> Even if it's twice as slow, that should only be the cost of
> incrementing the (logarithmic) iteration count by one.

Yes, and I think this is significant.

> The size difference between the versions is roughly 50%

It doesn't have to be. There are 6 instances of BF_ENCRYPT in
BF_crypt(). I am only asking you to revert to their larger form the two
that are inside BF_body. The remaining 4 may remain as calls to a
function.
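On the point about the iteration count being logarithmic: the two digits in a setting such as $2a$08 are the base-2 logarithm of the number of expensive key-setup iterations, so a build that is twice as slow costs the same as raising that number by one. A minimal C sketch of this arithmetic follows; bf_iterations() and cps_at_cost() are hypothetical helpers, and the model assumes time per hash is proportional to the iteration count (the fixed per-hash overhead is ignored).

```c
#include <stdint.h>

/* Iterations of the expensive key setup for a cost such as the "08" in
 * $2a$08: the cost is log2 of the count, so cost 8 gives 1 << 8 = 256. */
static uint64_t bf_iterations(unsigned cost)
{
    return (uint64_t)1 << cost;   /* bcrypt costs range from 4 to 31 */
}

/* Hypothetical helper: rate (hashes/s) expected at cost `to`, given a rate
 * measured at cost `from`. Assumes time scales with bf_iterations() and
 * ignores the fixed per-hash overhead. */
static double cps_at_cost(double cps, unsigned from, unsigned to)
{
    double rate = cps;
    for (unsigned c = from; c < to; c++)
        rate /= 2.0;   /* each +1 in cost doubles the work */
    for (unsigned c = to; c < from; c++)
        rate *= 2.0;   /* each -1 in cost halves it */
    return rate;
}
```

For example, cps_at_cost(140.0, c, c + 1) returns 70.0 for any cost c, so the 77 c/s build at cost c (the Pentium 3 figures quoted above, whose cost setting is not stated in this message) still outpaces the original code run at cost c + 1.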
Alternatively, all 6 may be function calls, but then the function's
BF_ENCRYPT should be a fully manually unrolled one. I am not sure which
of these options will be faster overall for typical settings (we'd need
to benchmark these at $2a$08).

> (7k vs 11.5k with -Os
> and roughly 9k vs 13.5k with -O3). Yes one can argue that the
> difference doesn't matter for one particular component they especially
> care about,

Exactly.

> but everyone cares about something different, and in the
> end the whole library ends up 50% larger if you follow that to its
> logical end.

Makes sense.

> I'd much rather stick with letting the compiler do the
> bloating-up for performance purposes if the user wants it, so that
> the choice is left to them.

Maybe you could support -DFAST_CRYPT or the like. It could enable
forced inlining and manual unrolls in crypt_blowfish.c.

Alexander
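The -DFAST_CRYPT suggestion could be wired up roughly as below. This is a minimal sketch, not crypt_blowfish.c: bf_ctx, BF_F, BF_ROUND2, BF_INLINE, BF_ENCRYPT_FN, the bf_* functions and FAST_CRYPT itself are hypothetical names, and the tables are filled with placeholder values rather than Blowfish's real constants and key schedule. It writes one Blowfish-style 16-round Feistel encryption twice, as a compact loop and fully manually unrolled, and lets the build switch pick which one the rest of the code calls, adding GCC's always_inline attribute to the unrolled form.

```c
#include <stdint.h>

/* Hypothetical context: 18 subkeys and four 256-entry S-boxes. */
typedef struct {
    uint32_t P[18];
    uint32_t S[4][256];
} bf_ctx;

/* Blowfish-style F(x) = ((S0[a] + S1[b]) ^ S2[c]) + S3[d],
 * where a..d are the bytes of x, most significant first. */
#define BF_F(ctx, x) \
    ((((ctx)->S[0][(x) >> 24] + (ctx)->S[1][((x) >> 16) & 0xff]) ^ \
      (ctx)->S[2][((x) >> 8) & 0xff]) + (ctx)->S[3][(x) & 0xff])

/* Two rounds with the halves trading roles, so no swap is needed. */
#define BF_ROUND2(ctx, L, R, i) \
    do { \
        (L) ^= (ctx)->P[(i)];     (R) ^= BF_F(ctx, L); \
        (R) ^= (ctx)->P[(i) + 1]; (L) ^= BF_F(ctx, R); \
    } while (0)

/* Hypothetical FAST_CRYPT switch: force inlining on GCC-style compilers. */
#if defined(FAST_CRYPT) && defined(__GNUC__)
#define BF_INLINE static inline __attribute__((always_inline))
#else
#define BF_INLINE static
#endif

/* Compact form: one round per iteration; unrolling is left to the compiler. */
static void bf_encrypt_loop(const bf_ctx *c, uint32_t *xl, uint32_t *xr)
{
    uint32_t L = *xl, R = *xr, t;
    for (int i = 0; i < 16; i++) {
        L ^= c->P[i];
        R ^= BF_F(c, L);
        t = L; L = R; R = t;
    }
    t = L; L = R; R = t;          /* undo the swap after the last round */
    R ^= c->P[16];
    L ^= c->P[17];
    *xl = L;
    *xr = R;
}

/* Manually unrolled form: all 16 rounds spelled out, no loop, no swaps. */
BF_INLINE void bf_encrypt_unrolled(const bf_ctx *c, uint32_t *xl, uint32_t *xr)
{
    uint32_t L = *xl, R = *xr;
    BF_ROUND2(c, L, R, 0);  BF_ROUND2(c, L, R, 2);
    BF_ROUND2(c, L, R, 4);  BF_ROUND2(c, L, R, 6);
    BF_ROUND2(c, L, R, 8);  BF_ROUND2(c, L, R, 10);
    BF_ROUND2(c, L, R, 12); BF_ROUND2(c, L, R, 14);
    *xl = R ^ c->P[17];           /* final swap folded into the stores */
    *xr = L ^ c->P[16];
}

#ifdef FAST_CRYPT
#define BF_ENCRYPT_FN bf_encrypt_unrolled
#else
#define BF_ENCRYPT_FN bf_encrypt_loop
#endif

/* The rest of the code would call only this entry point. */
void bf_encrypt_block(const bf_ctx *c, uint32_t *xl, uint32_t *xr)
{
    BF_ENCRYPT_FN(c, xl, xr);
}

/* Placeholder tables from a 32-bit LCG, NOT Blowfish's pi-derived
 * constants or key schedule; only there so F() has data to index. */
static void bf_init_demo(bf_ctx *c, uint32_t seed)
{
    uint32_t x = seed;
    for (int i = 0; i < 18; i++)
        c->P[i] = x = x * 1664525u + 1013904223u;
    for (int s = 0; s < 4; s++)
        for (int i = 0; i < 256; i++)
            c->S[s][i] = x = x * 1664525u + 1013904223u;
}

/* Returns 1 if both forms agree on a chain of 1000 blocks, 0 otherwise. */
int bf_variants_agree(void)
{
    bf_ctx c;
    uint32_t l = 0, r = 0;
    bf_init_demo(&c, 0x2a08u);
    for (int n = 0; n < 1000; n++) {
        uint32_t l1 = l, r1 = r, l2 = l, r2 = r;
        bf_encrypt_loop(&c, &l1, &r1);
        bf_encrypt_unrolled(&c, &l2, &r2);
        if (l1 != l2 || r1 != r2)
            return 0;
        l = l1;                   /* feed each output back in as the next input */
        r = r1;
    }
    return 1;
}
```

Built without -DFAST_CRYPT, bf_encrypt_block() runs the loop and any unrolling is up to the compiler flags the user picks; built with it, the unrolled body is used and, under a GCC-compatible compiler, force-inlined. bf_variants_agree() only confirms that the two forms compute the same thing; which one is faster on a given compiler and CPU is the benchmarking question raised in the message.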