Date: Thu, 15 Mar 2012 17:24:53 +0400
From: Solar Designer <>
Subject: Re: AMD Bulldozer and XOP

On Thu, Mar 15, 2012 at 11:46:09AM +0200, Milen Rangelov wrote:
> [...] I did some more
> improvements (like e.g using SSE3 shuffle to speed up the byte order
> reversals in SHA1 and optimizing a bit the early checks).

Sounds cool.  You've got to start contributing to JtR more directly. ;-)

> Well, I tried using 128-bit xmm instructions, not the ymm version (which
> would require a lot of rework).

Oh, you use explicit asm, not intrinsics?  Does XOP even offer
bit-select instructions for XMM registers?  I thought it only added
VPCMOV, which operates on YMM registers.  Or do you mix XMM/YMM?
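
For reference, the per-bit operation meant by "bit-select" here takes each
result bit from one of two inputs according to a mask.  Below is a minimal
portable C sketch of it on 32-bit words.  The names bitselect32 and
bitselect32_xor are made up for illustration; this is not JtR code and it
uses no XOP instructions or intrinsics.

```c
#include <stdint.h>

/* Hypothetical helper: for each bit position, take the bit from a
 * where sel has a 1, and the bit from b where sel has a 0. */
static inline uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t sel)
{
    return (a & sel) | (b & ~sel);
}

/* Same result written with XOR and AND only, which avoids the separate
 * NOT (or and-not) operation. */
static inline uint32_t bitselect32_xor(uint32_t a, uint32_t b, uint32_t sel)
{
    return b ^ ((a ^ b) & sel);
}
```

Without a native bit-select instruction, each such select costs three
basic logic operations (two with and-not in the first form), which is why
a single-instruction select matters for S-box gate counts.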

> Speed with both the Kwan's sboxes/SSE2 and
> Roman's with XOP is about 5M c/s (6 threads). I believe the problem is
> somewhere else (my key/block setup to blame perhaps).

Maybe you unrolled the loop too much and it doesn't fit in L1 cache?

With inlined S-boxes, we're only able to unroll 2 rounds (so we get 8
loop iterations per DES block encryption).
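
To illustrate the arithmetic (16 rounds at 2 rounds per iteration gives 8
iterations), here is a rough sketch of a 16-round Feistel loop written both
ways.  Everything in it is a placeholder: toy_f is NOT the DES f-function,
the code is not bitsliced, and none of the names come from JtR.  It only
shows the loop shape and that unrolling by two removes the half-swap.

```c
#include <stdint.h>

#define ROUNDS 16

/* Placeholder round function (assumption for illustration, not DES). */
static uint32_t toy_f(uint32_t half, uint32_t subkey)
{
    uint32_t x = half ^ subkey;
    return ((x << 3) | (x >> 29)) + 0x9E3779B9u;
}

/* One round per iteration with an explicit swap.
 * Returns the number of loop iterations executed. */
static int feistel_rolled(uint32_t *l, uint32_t *r, const uint32_t k[ROUNDS])
{
    int iters = 0;
    for (int i = 0; i < ROUNDS; i++) {
        uint32_t t = *l ^ toy_f(*r, k[i]);
        *l = *r;
        *r = t;
        iters++;
    }
    return iters;
}

/* Two rounds per iteration: the roles of the halves alternate inside the
 * loop body, so no swap is needed.
 * Returns the number of loop iterations executed. */
static int feistel_unrolled2(uint32_t *l, uint32_t *r, const uint32_t k[ROUNDS])
{
    uint32_t L = *l, R = *r;
    int iters = 0;
    for (int i = 0; i < ROUNDS; i += 2) {
        L ^= toy_f(R, k[i]);
        R ^= toy_f(L, k[i + 1]);
        iters++;
    }
    *l = L;
    *r = R;
    return iters;
}
```

Both functions produce the same output halves for the same inputs.  With
real inlined S-boxes, the loop body holds two full rounds of S-box code, so
unrolling further multiplies code size accordingly.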

