Date: Mon, 9 Aug 2010 09:19:30 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: macosx-x86-64 and macosx-x86-sse2 targets

Hi Eric,

On Sat, Aug 07, 2010 at 07:54:44PM -0700, Eric Christopher wrote:
> After the benchmark runs of gcc vs llvm-gcc done by the "Phoronix"
> benchmark guys I looked into the performance difference they spotted
> with md5 and blowfish via john. It turns out that it comes down to the
> scheduling decisions made by both gcc and llvm-gcc. gcc has a scheduler
> that schedules for functional units whereas llvm schedules more for
> register pressure - unfortunately the code in both blowfish and md5 is
> already hand scheduled so all we do is muck it up (though we do have a
> lot less stack spills) and impact ILP in the core loops.

Thank you for figuring this out, coming up with a workaround, testing,
and sharing it with the John user community! I appreciate this.

> A workaround for this is to add -fno-schedule-insns to the two targets
> in the compile options. I've tested this with both the gcc available on
> OS X and llvm-gcc and it doesn't impact performance with gcc and gives
> a pretty large performance increase with llvm-gcc.

I am uncomfortable about including this workaround in the default
Makefile. I find it likely that gcc's instruction scheduling benefits
some of the hash types (especially those supported with the jumbo patch,
the source code for many of which is largely unoptimized). Additionally,
some CPUs may benefit from gcc rescheduling the operations compared to
their ordering in the C source (when gcc is optimizing for a particular
and proper CPU family).

The "hand scheduling" in the C source was never meant to propagate to
the compiled code as-is. It was just a way of thinking while editing the
source code - to see whether sufficient parallelism was available or not
(and make adjustments to make more parallelism available if needed and
possible). I always expected the compiler to reschedule as needed for
the target CPU family.

In some other source files, no "hand scheduling" like this is done. For
example, the bitslice DES S-box expression files (sboxes*.c, nonstd.c)
were generated such that the result of one operation is very often used
by the immediately following one, even though there's plenty of
parallelism available. The compiler is expected to allocate registers
and reschedule.

Ideally, the issue should be fixed in llvm-gcc. Meanwhile, it'd be great
to be able to specify compiler options at the beginning of a source
file, inside an #if on particular compiler version(s). Trying to do the
same in a Makefile or at an even higher level introduces "unneeded"
complexity, so I am uncomfortable about doing that. But I could use this
inside-source-file compiler options feature (non-existent in gcc and
llvm-gcc?) I wished this were possible on many occasions.

Thanks again,

Alexander
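
For illustration only, here is a rough sketch of what in-source compiler
options inside an #if could look like. It assumes GCC 4.4 or later, which
provides "#pragma GCC optimize" for the functions defined after it; a
compiler that fails the version check (llvm-gcc is expected to be one, so
this would not by itself address the llvm-gcc case) simply skips the
pragma. SCHED_WORKAROUND_ACTIVE and sum_interleaved are hypothetical names
made up for this sketch and are not taken from the John the Ripper source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: "#pragma GCC optimize" exists only in real GCC >= 4.4.
 * Clang also defines __GNUC__, so it is excluded explicitly.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
/* Equivalent to -fno-schedule-insns for the functions defined below. */
#pragma GCC optimize ("no-schedule-insns")
#define SCHED_WORKAROUND_ACTIVE 1
#else
#define SCHED_WORKAROUND_ACTIVE 0
#endif

/*
 * Hypothetical stand-in for a "hand scheduled" loop: two independent
 * accumulators are interleaved in the source so that parallelism is
 * visible to the person editing the code.
 */
uint32_t sum_interleaved(const uint32_t *v, size_t n)
{
    uint32_t a = 0, b = 0;
    size_t i;

    assert(v != NULL || n == 0);
    for (i = 0; i + 1 < n; i += 2) {
        a += v[i];      /* chain 1 */
        b += v[i + 1];  /* chain 2, independent of chain 1 */
    }
    if (i < n)          /* odd element count: one element left over */
        a += v[i];
    return a + b;
}
```

Keeping the check next to the affected code, rather than in the Makefile,
limits the option to the one file where the source ordering matters.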