Date: Fri, 15 Jun 2012 23:57:34 +0200
From: Simon Marechal <>
Subject: Re: Re: [patch] optional new raw sha1 implemetation

On 06/15/2012 11:36 PM, Tavis Ormandy wrote:
> Oops, good point. I'm not sure how to tell if it's available or not (I
> think it was accidentally omitted in some gcc releases), but gcc seems
> to tolerate me writing my own, so I did that.
> I'll look into how to do it properly.

I just pushed a "fix" that checks if we are using ICC. It should also
fix the x86-64.S problem.

>> > The current SSE code cracks 19.8M c/s. Taviso's is faster at 21.3M c/s,
>> > and doesn't use the register scheduling trick that is in
>> > sse-intrinsics.c. This _might_ mean it could be faster.
> Nice, that's great news! Solar also mentioned I should read this, I'll
> do that and see if there are any ideas to steal :-)

The idea is to work on N*4 32-bit values at the same time instead of
just 4, and let the compiler interleave the independent computations and
allocate registers so that memory latency, and the stalls caused by
using a register value just after it is assigned, are hidden. The first
time I saw it was in BarsWF. GCC doesn't seem to be good at it, however,
and that is the reason there is an x86-64i target with a precompiled .S
file from ICC.
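
A minimal sketch of that interleaving, assuming SSE2 and a toy round
(rotate left by 5, then add a constant) rather than real SHA-1. PARA,
toy_round_scalar and toy_rounds_interleaved are names made up for this
illustration; they are not taken from sse-intrinsics.c.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

#define PARA  3            /* hypothetical: independent SSE vectors in flight */
#define LANES (4 * PARA)   /* 32-bit values processed per call (N*4) */

/* Scalar reference for one toy round: x = rotl(x, 5) + k */
static uint32_t toy_round_scalar(uint32_t x, uint32_t k)
{
    return ((x << 5) | (x >> 27)) + k;
}

/* Apply `rounds` toy rounds to LANES values. Each step is issued for all
 * PARA vectors before the next step starts, so consecutive instructions
 * do not depend on each other and the compiler is free to overlap them. */
static void toy_rounds_interleaved(uint32_t state[LANES], uint32_t k, int rounds)
{
    __m128i v[PARA], t[PARA];
    const __m128i vk = _mm_set1_epi32((int)k);
    int i, r;

    for (i = 0; i < PARA; i++)
        v[i] = _mm_loadu_si128((const __m128i *)(state + 4 * i));

    for (r = 0; r < rounds; r++) {
        for (i = 0; i < PARA; i++) t[i] = _mm_slli_epi32(v[i], 5);
        for (i = 0; i < PARA; i++) v[i] = _mm_srli_epi32(v[i], 27);
        for (i = 0; i < PARA; i++) v[i] = _mm_or_si128(v[i], t[i]);
        for (i = 0; i < PARA; i++) v[i] = _mm_add_epi32(v[i], vk);
    }

    for (i = 0; i < PARA; i++)
        _mm_storeu_si128((__m128i *)(state + 4 * i), v[i]);
}
```

With PARA set to 1 this degenerates to the plain 4-lane version, where
every instruction waits on the previous one. Whether a larger PARA
actually helps depends on how well the compiler unrolls the inner loops
and keeps v[] and t[] in registers.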

Other "easy" gains could be achieved by allocating larger buffers,
increasing MAX_KEYS_PER_CRYPT and running the hashing function several
times per crypt_all call.
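
A rough sketch of what that could look like, with simplified buffers
and a stand-in hash. BATCH, CALLS_PER_CRYPT and hash_batch are
hypothetical names, the buffer layout is an assumption, and the real
crypt_all in a John format has a different signature.

```c
#include <stdint.h>

#define BATCH              12   /* keys per hashing call (assumed, e.g. N*4) */
#define CALLS_PER_CRYPT    8    /* hashing calls per crypt_all (assumed) */
#define MAX_KEYS_PER_CRYPT (BATCH * CALLS_PER_CRYPT)

/* Larger buffers: room for every key of one crypt_all call. */
static uint32_t saved_key[MAX_KEYS_PER_CRYPT];
static uint32_t crypt_out[MAX_KEYS_PER_CRYPT];

/* Stand-in for the real SIMD hashing function: handles BATCH keys. */
static void hash_batch(const uint32_t *in, uint32_t *out)
{
    int i;

    for (i = 0; i < BATCH; i++)
        out[i] = in[i] * 2654435761u + 1;
}

/* Run the hashing function once per batch. A partial last batch is
 * rounded up, which is safe because the buffers hold a whole number of
 * batches. count must not exceed MAX_KEYS_PER_CRYPT. */
static void crypt_all(int count)
{
    int base;

    for (base = 0; base < count; base += BATCH)
        hash_batch(saved_key + base, crypt_out + base);
}
```

The point is only that the per-call overhead around crypt_all is paid
once for BATCH * CALLS_PER_CRYPT candidates instead of once per batch.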
