Date: Wed, 8 Aug 2012 01:28:44 -0400
From: Rich Felker <>
Subject: Re: crypt* files in crypt directory

On Wed, Aug 08, 2012 at 08:42:35AM +0400, Solar Designer wrote:
> On Tue, Aug 07, 2012 at 10:24:21PM -0400, Rich Felker wrote:
> > First, the compatibility code for the sign extension bug. How
> > important is it to keep this?
> Not very important, but nice to keep musl's code revision closer to
> upstream.
> [...]
> > I'm uncertain whether there's any portion of musl's user base that
> > this would be useful to.
> Maybe not.

After further reading, the cost is near zero. The compat hack is done
at the same time useful data is being computed. I see no reason to
disable/remove this feature unless the goal is to force people to stop
using old hashes that are likely-vulnerable.

> > Second, what can be done to reduce size?
> I felt the size was acceptable already.  However, if you must, the
> instances of BF_ENCRYPT that are outside of BF_body may be made slower
> with little impact on overall speed.  For example, they may be made a
> function rather than a macro, and the function would only be inlined in
> builds optimized for speed rather than size.
> > I think the first step is
> > replacing the giant macros (BF_ROUND, BF_ENCRYPT, etc.) with
> > functions so that the code doesn't get generated in duplicate unless
> > aggressive inlining is enabled by CFLAGS.
> I see that you did this - and I think you took it too far.  The code
> became twice slower on Pentium 3 when compiling with gcc 3.4.5 (approx.
> 140 c/s down to 77 c/s).  Adding -finline-functions
> -fold-unroll-all-loops regains only a fraction of the speed (112 c/s);
> less aggressive loop unrolling results in lower speeds.

Can you compare with a more modern gcc? 3.x is known to be horrible at
optimizing. It can't even peephole-optimize bswaps.

> The impact on x86-64 is less.  With Ubuntu 12.04's gcc 4.6.3 on FX-8120
> I get 490 c/s for the original code, 450 c/s for your code without
> inlining/unrolling, and somehow only 430 c/s with -finline-functions
> -funroll-loops.

Actually this is a lot closer to what I expected. I think you'll find
similar results on 32-bit with gcc 4.6.3 too. The modern expectation
is that manually unrolling loops will give worse performance than
letting the compiler decide what to do. Certainly there are exceptions
to the expected result, but on average, it's the right decision.
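
To make the macro-vs-function tradeoff concrete, here is a minimal
sketch. The round below is a made-up toy, not the real Blowfish round,
and all the names (TOY_ROUND, toy_round, toy_encrypt_*) are
hypothetical. The macro version pastes the round body at every use
site; the function version has one copy that the compiler may inline
or not depending on CFLAGS.

```c
#include <assert.h>   /* only needed for the self-check at the bottom */
#include <stdint.h>

/* Hypothetical toy round, NOT the real Blowfish round function:
 * mix a rotated copy of L plus a key word into R. */
#define TOY_ROUND(L, R, K) \
	do { (R) ^= (((L) << 3) | ((L) >> 29)) + (K); } while (0)

/* Macro style: the round body is expanded four times right here,
 * regardless of optimization level. */
static uint32_t toy_encrypt_macro(uint32_t l, uint32_t r, const uint32_t k[4])
{
	TOY_ROUND(l, r, k[0]);
	TOY_ROUND(r, l, k[1]);
	TOY_ROUND(l, r, k[2]);
	TOY_ROUND(r, l, k[3]);
	return l ^ r;
}

/* Function style: one copy of the round body. Whether it gets inlined
 * (and whether the loop below gets unrolled) is left to the compiler. */
static uint32_t toy_round(uint32_t l, uint32_t r, uint32_t k)
{
	return r ^ (((l << 3) | (l >> 29)) + k);
}

static uint32_t toy_encrypt_func(uint32_t l, uint32_t r, const uint32_t k[4])
{
	int i;
	for (i = 0; i < 4; i += 2) {
		r = toy_round(l, r, k[i]);
		l = toy_round(r, l, k[i + 1]);
	}
	return l ^ r;
}

/* Self-check: both styles must compute the same value. */
static void toy_self_check(void)
{
	static const uint32_t k[4] = { 1, 2, 3, 4 };
	assert(toy_encrypt_macro(0x01234567u, 0x89abcdefu, k) ==
	       toy_encrypt_func(0x01234567u, 0x89abcdefu, k));
}
```

Comparing the object size of the two versions at -Os and at -O3 is
one way to see how much of the duplication the compiler chooses to
reintroduce on its own.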

> I think you should revert the changes for the instance of BF_ENCRYPT
> that is inside of BF_body.
> I also think that this code should be optimized for speed even when the
> rest of musl is optimized for size.  In this case, better speed may mean
> better security, because it lets the sysadmin configure a higher
> iteration count for new passwords.

Even if it's twice as slow, that should only cost the equivalent of
incrementing the (logarithmic) iteration count by one. The size
difference between the versions is roughly 50% (7k vs 11.5k with -Os
and roughly 9k vs 13.5k with -O3). Yes, one can argue that the
difference doesn't matter for the one particular component they
especially care about, but everyone cares about something different,
and in the end the whole library ends up 50% larger if you follow that
to its logical end. I'd much rather stick with letting the compiler do
the bloating-up for performance purposes if the user wants it, so that
the choice is left to them.
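
The arithmetic behind the "by one" remark, as a rough sketch: assuming
the work is 2^cost iterations (the count is logarithmic), a build that
is twice as slow per iteration at cost N-1 takes the same time as the
fast build at cost N. The helper names and the abstract time units
below are hypothetical, purely for illustration.

```c
#include <assert.h>

/* Iterations implied by a logarithmic cost setting: 2^cost.
 * The bound keeps the shift well-defined (bcrypt costs stay below 32). */
static unsigned long long bf_iterations(unsigned cost)
{
	assert(cost < 32);
	return 1ULL << cost;
}

/* Hypothetical model: total time in arbitrary units, where slowdown is
 * the relative per-iteration cost (1 = baseline, 2 = twice as slow). */
static unsigned long long hash_time(unsigned cost, unsigned slowdown)
{
	return bf_iterations(cost) * slowdown;
}
```

Under this model hash_time(8, 1) and hash_time(7, 2) come out equal,
i.e. a 2x slowdown is worth exactly one step of the cost setting.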

