john-dev - Re: AMD Bulldozer and XOP



Message-ID: <20120315142442.GA7795@openwall.com>
Date: Thu, 15 Mar 2012 18:24:42 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: AMD Bulldozer and XOP

On Thu, Mar 15, 2012 at 04:08:31PM +0200, Milen Rangelov wrote:
> Hehe I believe jtr would be better off without my buggy code inside :)

It's got plenty of buggy code already; adding some more wouldn't hurt. ;-)
Seriously, though, I think many contributions are useful as PoCs - then
other contributors to the project may re-code things in a cleaner way.

> > Oh, you use explicit asm, not intrinsics?  Does XOP even offer
> > bit-select instructions for XMM registers?  I thought it only added
> > VPCMOV, which operates on YMM registers.  Or do you mix XMM/YMM?

I'm sorry, that comment of mine was wrong and misleading.  For a moment
I confused VEX-encoding vs. not with XMM vs. YMM registers.  Indeed,
VPCMOV exists for XMM registers as well, and in fact this is what JtR
normally uses in -xop builds (except in 32-bit x86 builds, where it
tries to use 256-bit AVX and XOP to compensate for the low register
count, which in practice turned out to be beneficial on Sandy Bridge,
but not on Bulldozer - so I am going to stop doing that for -xop).
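
To make the bit-select operation concrete, here is a minimal sketch of
what VPCMOV computes per bit: (a & mask) | (b & ~mask).  This is not
JtR's actual code.  The function names (bitselect64, bitselect128,
bitselect_selfcheck) are made up for illustration, and I am assuming a
GCC- or Clang-style compiler where <x86intrin.h> provides
_mm_cmov_si128 when building with -mxop.  Without XOP, the same
function falls back to plain C, so the sketch builds anywhere.

```c
#include <assert.h>
#include <stdint.h>
#ifdef __XOP__
#include <x86intrin.h>  /* assumed to provide _mm_cmov_si128 under -mxop */
#endif

/* Per-bit select: take the bit from a where mask is 1, from b where mask is 0. */
static inline uint64_t bitselect64(uint64_t a, uint64_t b, uint64_t mask)
{
    return (a & mask) | (b & ~mask);
}

/* 128-bit select over two 64-bit lanes (hypothetical helper). */
static inline void bitselect128(uint64_t out[2], const uint64_t a[2],
                                const uint64_t b[2], const uint64_t mask[2])
{
#ifdef __XOP__
    /* XOP path: a single VPCMOV on XMM registers. */
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vm = _mm_loadu_si128((const __m128i *)mask);
    _mm_storeu_si128((__m128i *)out, _mm_cmov_si128(va, vb, vm));
#else
    /* Portable fallback: the same select done with AND/ANDN/OR per lane. */
    out[0] = bitselect64(a[0], b[0], mask[0]);
    out[1] = bitselect64(a[1], b[1], mask[1]);
#endif
}

/* Quick sanity check of the select semantics (hypothetical helper). */
static inline void bitselect_selfcheck(void)
{
    assert(bitselect64(~0ULL, 0, 0xF0) == 0xF0);
}
```

Without a bit-select instruction, each select takes up to three bitwise
ops (AND, ANDN, OR), which is why having it as a single instruction
matters for bitslice code.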

> No, I am not a big fan of having hand-written assembly. I used the VPCMOV
> intrinsic (_mm_cmov_si128) and it operates on xmm registers. I was not
> aware there is 256-bit version though, hm need to check that.. Actually I
> was not aware about that instruction at all, I just knew they have bitwise
> rotation. I was very pleasantly surprised when I looked at the XOP
> intrinsics list in MSDN and found that one out. It was like "hey wtf they
> have the vec_sel thing, great!". I spent too much time playing with GPUs
> recently, looks like there were some interesting news on the CPU front that
> I've missed :)

Yeah, I was surprised to hear you got into the CPU stuff at all - I
thought you were a GPU guy. ;-)

Like I said above, the 256-bit bitwise ops are currently slow.  On Sandy
Bridge, they deliver about the same per-bit speed that the 128-bit ones
do, and on Bulldozer they appear to be slower.  Things should change
with future CPUs, though - especially with those supporting AVX2 (which
officially gives us 256-bit integer vectors).

Alexander
