john-dev - Re: AMD Bulldozer and XOP



Message-ID: <20120315132453.GD7203@openwall.com>
Date: Thu, 15 Mar 2012 17:24:53 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: AMD Bulldozer and XOP

On Thu, Mar 15, 2012 at 11:46:09AM +0200, Milen Rangelov wrote:
> [...] I did some more
> improvements (like e.g using SSE3 shuffle to speed up the byte order
> reversals in SHA1 and optimizing a bit the early checks).

Sounds cool.  You got to start contributing to JtR more directly. ;-)

> Well, I tried using 128-bit xmm instructions, not the ymm version (which
> would require a lot of rework).

Oh, you use explicit asm, not intrinsics?  Does XOP even offer
bit-select instructions for XMM registers?  I thought it only added
VPCMOV, which operates on YMM registers.  Or do you mix XMM/YMM?

> Speed with both the Kwan's sboxes/SSE2 and
> Roman's with XOP is about 5M c/s (6 threads). I believe the problem is
> somewhere else (my key/block setup to blame perhaps).

Maybe you unrolled the loop too much and it doesn't fit in the L1 instruction cache?

With inlined S-boxes, we're only able to unroll 2 of DES's 16 rounds (so we get 8
loop iterations per DES block encryption).

Alexander
