john-users - Re: macosx-x86-64 and macosx-x86-sse2 targets



Message-ID: <20100809051930.GA15201@openwall.com>
Date: Mon, 9 Aug 2010 09:19:30 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: macosx-x86-64 and macosx-x86-sse2 targets

Hi Eric,

On Sat, Aug 07, 2010 at 07:54:44PM -0700, Eric Christopher wrote:
> After the benchmark runs of gcc vs llvm-gcc done by the "Phoronix" benchmark guys I looked into the performance difference they spotted with md5 and blowfish via john.  It turns out that it comes down to the scheduling decisions made by both gcc and llvm-gcc.  gcc has a scheduler that schedules for functional units whereas llvm schedules more for register pressure - unfortunately the code in both blowfish and md5 is already hand scheduled so all we do is muck it up (though we do have a lot less stack spills) and impact ILP in the core loops.

Thank you for figuring this out, coming up with a workaround, testing,
and sharing it with the John user community!  I appreciate this.

> A workaround for this is to add -fno-schedule-insns to the two targets in the compile options.  I've tested this with both the gcc available on OS X and llvm-gcc and it doesn't impact performance with gcc and gives a pretty large performance increase with llvm-gcc.

I am uncomfortable with including this workaround in the default
Makefile.  I find it likely that gcc's instruction scheduling benefits
some of the hash types (especially those supported by the jumbo patch,
many of which have largely unoptimized source code).
Additionally, some CPUs may benefit from gcc reordering the operations
relative to their order in the C source (when gcc is optimizing for a
specific, appropriate CPU family).

The "hand scheduling" in the C source was never meant to propagate to
the compiled code as-is.  It was just a way of thinking while editing
the source code - to see whether sufficient parallelism was available or
not (and make adjustments to make more parallelism available if needed
and possible).  I always expected the compiler to reschedule as needed
for the target CPU family.
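
As a hypothetical illustration of this way of thinking (not code from
John the Ripper), a single accumulator forms one long dependency chain,
while splitting it into two chains written interleaved in the source
makes the available parallelism easy to see.  The split is legal here
only because unsigned addition is associative and commutative; the
function names are made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* One dependency chain: each addition waits for the previous one. */
uint32_t sum_one_chain(const uint32_t *v, size_t n)
{
	uint32_t a = 0;

	for (size_t i = 0; i < n; i++)
		a += v[i];
	return a;
}

/*
 * Two independent chains, written interleaved so the parallelism is
 * visible when reading the source.  The compiler is still free to
 * reorder these statements for the target CPU.
 */
uint32_t sum_two_chains(const uint32_t *v, size_t n)
{
	uint32_t a = 0, b = 0;
	size_t i;

	for (i = 0; i + 1 < n; i += 2) {
		a += v[i];	/* chain A */
		b += v[i + 1];	/* chain B, independent of chain A */
	}
	if (i < n)	/* odd element count: one element left over */
		a += v[i];
	return a + b;
}
```

Both functions return the same value modulo 2^32; only the shape of the
dependency graph differs.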

In some other source files, no "hand scheduling" like this is done.  For
example, the bitslice DES S-box expression files (sboxes*.c, nonstd.c)
were generated such that the result of one operation is very often used
by the immediately following one, even though there's plenty of
parallelism available.  The compiler is expected to allocate registers
and reschedule.
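
A small hypothetical example of the same style (a bitsliced full adder,
not one of the actual S-box files): bit j of every 64-bit word belongs
to independent instance j, and the operations are listed so that a
result is often consumed by the very next line, leaving register
allocation and reordering to the compiler.

```c
#include <stdint.h>

/*
 * Bitsliced full adder: one call evaluates 64 independent instances.
 * Statements are in "generated" order; t3 does not depend on t1 or t2,
 * so a compiler may schedule it earlier for the target CPU.
 */
void full_adder_bs(uint64_t a, uint64_t b, uint64_t c,
    uint64_t *sum, uint64_t *carry)
{
	uint64_t t1 = a ^ b;
	uint64_t t2 = t1 & c;	/* uses t1 immediately */
	uint64_t t3 = a & b;	/* independent of t1 and t2 */
	uint64_t t4 = t2 | t3;	/* carry = majority(a, b, c) */
	uint64_t t5 = t1 ^ c;	/* sum = a ^ b ^ c */

	*carry = t4;
	*sum = t5;
}
```

For example, inputs 0xF0, 0xCC and 0xAA cover all eight input
combinations in their low 8 bits and give sum 0x96 and carry 0xE8.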

Ideally, the issue should be fixed in llvm-gcc.  Meanwhile, it would be
great to be able to specify compiler options at the beginning of a
source file, inside an #if on particular compiler version(s).  Doing
the same in a Makefile or at an even higher level introduces "unneeded"
complexity, so I am uncomfortable with that.  But I would use such an
inside-source-file compiler options feature (does it exist in gcc or
llvm-gcc at all?)  I have wished this were possible on many occasions.

Thanks again,

Alexander
