Date: Wed, 19 Sep 2012 20:34:38 +0200
From: magnum <>
Subject: Re: 1.7.9-jumbo-7

It's an Intel CPU (quad-core i7), so the slowdown comes down to the compiler: llvm just can't compete with gcc (let alone icc) when compiling the intrinsics. For some reason I can't use an -x86-64i build either (llvm doesn't seem able to compile the assembler). I'll look into that some day.

I can switch to gcc whenever I want and everything will be normal, but I wanted to squeeze out most of the "native OSX" bugs now that I have a chance.


On 19 Sep, 2012, at 19:31 , jfoug <> wrote:

> Btw, any idea on why the code which uses the circular temp buffer was slower
> on this system?  The code I wrote does use just a touch more CPU to roll the
> temp vars through a circular buffer, but I would think the memory savings
> would more than make up for that, especially (IIRC), since the last few
> loops do not write back to memory since it will never be accessed.  Possibly
> this system simply has a very tiny L1 cache or something, where the memory
> stall reduction does not offset the CPU overhead.
> Jim.
>> From: magnum []
>> What you write is true in general, but the bug in question was not about
>> that: The SHA_BUF_SIZ I'm talking about only exists in JtR's own
>> sse-intrinsics.c code. It's set to 80 for Simon's original 80x4 buffer SSE2
>> SHA-1, and 16 for your later 16x4 code that uses buffers similar to MD4
>> and MD5 (except for endianness). Your code is faster on every platform I
>> have tried except OSX w/ llvm. So I modified x86-64.h to use 80 for these
>> builds - and that triggered the bug!
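For readers who haven't seen the two buffer layouts discussed above, here is a rough scalar sketch of the difference. It is not the sse-intrinsics.c code: the function names (rol32, schedule_full, schedule_circular, schedules_match) are made up for illustration, and it handles one message instead of four SSE2 lanes. It only shows that the SHA-1 message schedule can be kept in 16 words by storing W[t] at index t & 15, and that this yields the same values as the full 80-word array.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* "80" layout: expand the 16 input words into a full 80-word schedule. */
static void schedule_full(const uint32_t block[16], uint32_t out[80])
{
    unsigned t;

    for (t = 0; t < 16; t++)
        out[t] = block[t];
    for (t = 16; t < 80; t++)
        out[t] = rol32(out[t - 3] ^ out[t - 8] ^ out[t - 14] ^ out[t - 16], 1);
}

/*
 * "16" layout: only 16 words are kept, and W[t] lives at w[t & 15].
 * W[t-16] sits in the slot that W[t] is about to overwrite.
 * 'out' exists only so the two variants can be compared; real code
 * would feed each word straight into its round instead of storing it.
 */
static void schedule_circular(const uint32_t block[16], uint32_t out[80])
{
    uint32_t w[16];
    unsigned t;

    for (t = 0; t < 16; t++)
        out[t] = w[t] = block[t];
    for (t = 16; t < 80; t++) {
        uint32_t x = rol32(w[(t - 3) & 15] ^ w[(t - 8) & 15] ^
                           w[(t - 14) & 15] ^ w[t & 15], 1);
        w[t & 15] = x;   /* the extra index masking is the added CPU work */
        out[t] = x;
    }
}

/* Returns 1 if both layouts produce the same 80 schedule words. */
static int schedules_match(void)
{
    uint32_t block[16], a[80], b[80];
    unsigned t;

    for (t = 0; t < 16; t++)
        block[t] = 0x01234567u * (t + 1);   /* arbitrary test input */
    schedule_full(block, a);
    schedule_circular(block, b);
    for (t = 0; t < 80; t++)
        if (a[t] != b[t])
            return 0;
    return 1;
}
```

Assuming four 32-bit lanes per SSE2 register, the 80x4 layout needs 80 * 4 * 4 = 1280 bytes of schedule per group of four candidates, against 16 * 4 * 4 = 256 bytes for 16x4. Also note that the words for rounds 77 to 79 are never read again by the recurrence (it looks back at most 16 and at least 3 words), which fits Jim's remark that the last few iterations can skip the store.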
