Date: Wed, 8 Aug 2012 17:48:55 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: crypt* files in crypt directory

On Wed, Aug 08, 2012 at 09:03:00AM +0200, Daniel Cegiełka wrote:
> > Maybe you could support -DFAST_CRYPT or the like.  It could enable
> > forced inlining and manual unrolls in crypt_blowfish.c.
> >
> > Alexander
> 
> This can be a very sensible solution.

Unless there's a really compelling reason to do so, I'd like to avoid
having multiple alternative versions of the same code in a codebase.
Each variant multiplies the combinations you have to test to be sure
the code works and hasn't regressed.

As it stands, the code I posted with the manual unrolling removed
performs _better_ than the manually unrolled code with gcc 4 on x86_64
when optimized for speed, and it's 33% smaller when optimized for
size.

As for being slower on gcc 3, there's already much more
performance-critical code that's significantly slower on musl+gcc3
than on glibc due to gcc3 badness, for example all of the
endianness-swapping functions (byteswap.h, and htonl etc. in
arpa/inet.h).
Really the only place where crypt performance is critical is in JtR,
and there you're using your own optimized code internal to JtR, right?
Even if crypt is half-speed on gcc3 without the manual unrolling, that
still only costs a single doubling (one step in the base-2 work
factor) in the iterations you can use while keeping the same
responsiveness/load, i.e. not nearly enough to make or break
somebody's ability to crack your hashes. (In general, as long as you
don't iterate this principle, an attacker who can afford N time (or N
cores) can also afford 2*N time (or 2*N cores).)

Aside from my own feelings on the matter, I'm trying to consider the
impressions it makes on our user base. I've already taken some heat
for replacing the heap sort qsort code in musl with smoothsort,
despite it being a lot faster at a task where performance generally
matters, and despite that size increase being smaller than what this
crypt unrolling would add.
When someone frustrated with bloat sees hand-unrolled loops, their
first reaction is "eew, this code is bloated". My intent in
modernizing the old DES crypt code (and fixing its stack usage) was
to save enough space that we could get the new algorithms (blowfish,
md5, sha) integrated without much (or even any, if possible) size
increase versus the old bad DES code. I think this makes a difference
for "selling" the idea of supporting all these algorithms to the
anti-bloat faction of musl's following.

Rich
