musl - Re: crypt_blowfish integration, optimization

Message-ID: <20120809222103.GA29365@openwall.com>
Date: Fri, 10 Aug 2012 02:21:03 +0400
From: Solar Designer <solar@...nwall.com>
To: musl@...ts.openwall.com
Subject: Re: crypt_blowfish integration, optimization

On Thu, Aug 09, 2012 at 05:46:54PM -0400, Rich Felker wrote:
> I've taken this version and made some minimum changes based on my
> version, mainly for integration with musl where I'm testing it. I also
> think we've reached the final word on loop unrolling:
> 
> Just For Fun, I tried replacing your unrolled BF_ROUND loop with a for
> loop and compiling with -O3 on gcc 4.6.3. After noticing the
> performance numbers were coming out near-identical, and that the .o
> sizes were mysteriously identical, I decided, Just For Fun, to
> disassemble both versions with objdump and diff them. They are
> identical. That is, modern gcc generates byte-for-byte identical code
> with -O3 for the manually unrolled loop and the for loop.

What about -O2?

-O3 is probably not what will be used for most musl builds, is it?

Hmm, for me "gcc -Q -O2 --help=optimizers" and ditto for -O3 both show
"disabled" for -funroll-loops.  Why was the loop unrolled for you?
Did you also have -funroll-loops specified explicitly?  If so, does this
happen for normal musl builds?  I guess not?

As discussed, the problem with avoiding such hand-unrolls is that the
compiler doesn't know just which loops are most important to unroll.
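
For readers following along, the two forms being compared can be sketched
as below. This is only an illustration and is not the crypt_blowfish
source: toy_f, toy_p, TOY_ROUND and the three functions are hypothetical
stand-ins, and the constants are arbitrary. The sketch shows only that the
rolled loop and the hand-unrolled sequence compute the same thing; whether
a given compiler emits the same machine code for both is the open question
above and depends on the gcc version and flags.

```c
#include <stdint.h>

/* Arbitrary round constants; a stand-in for a real cipher's subkeys. */
static const uint32_t toy_p[16] = {
    0x243F6A88u, 0x85A308D3u, 0x13198A2Eu, 0x03707344u,
    0xA4093822u, 0x299F31D0u, 0x082EFA98u, 0xEC4E6C89u,
    0x452821E6u, 0x38D01377u, 0xBE5466CFu, 0x34E90C6Cu,
    0xC0AC29B7u, 0xC97C50DDu, 0x3F84D5B5u, 0xB5470917u
};

/* Hypothetical mixing function; NOT the Blowfish F function. */
static uint32_t toy_f(uint32_t x)
{
    return (x * 2654435761u) ^ (x >> 13);
}

/* One Feistel-style round: mix L into R using round constant N. */
#define TOY_ROUND(L, R, N) ((R) ^= toy_f(L) ^ toy_p[N])

/* Rolled form: the compiler decides whether to unroll. */
static void toy_rounds_loop(uint32_t *l, uint32_t *r)
{
    uint32_t L = *l, R = *r;
    int i;
    for (i = 0; i < 16; i += 2) {
        TOY_ROUND(L, R, i);
        TOY_ROUND(R, L, i + 1);
    }
    *l = L;
    *r = R;
}

/* Hand-unrolled form: the same 16 rounds written out explicitly. */
static void toy_rounds_unrolled(uint32_t *l, uint32_t *r)
{
    uint32_t L = *l, R = *r;
    TOY_ROUND(L, R, 0);  TOY_ROUND(R, L, 1);
    TOY_ROUND(L, R, 2);  TOY_ROUND(R, L, 3);
    TOY_ROUND(L, R, 4);  TOY_ROUND(R, L, 5);
    TOY_ROUND(L, R, 6);  TOY_ROUND(R, L, 7);
    TOY_ROUND(L, R, 8);  TOY_ROUND(R, L, 9);
    TOY_ROUND(L, R, 10); TOY_ROUND(R, L, 11);
    TOY_ROUND(L, R, 12); TOY_ROUND(R, L, 13);
    TOY_ROUND(L, R, 14); TOY_ROUND(R, L, 15);
    *l = L;
    *r = R;
}

/* Returns 1 if both forms produce the same output for the given input. */
int toy_variants_agree(uint32_t l, uint32_t r)
{
    uint32_t l1 = l, r1 = r, l2 = l, r2 = r;
    toy_rounds_loop(&l1, &r1);
    toy_rounds_unrolled(&l2, &r2);
    return l1 == l2 && r1 == r2;
}
```

Building each form into its own object file at the optimization level of
interest and diffing the "objdump -d" output, as described in the quoted
message, would show whether the compiler unrolled the loop in that
configuration.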

BTW, what speeds are you getting on your Atom?  How does this compare to
the original crypt_blowfish-1.2 with asm code (both on 32-bit)?

Alexander
