Date: Thu, 15 Mar 2012 23:24:07 +0200
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: AMD Bulldozer and XOP

Hello :)


> It got plenty of buggy code already, adding some more wouldn't hurt. ;-)
> Seriously, though, I think many contributions are useful as PoCs - then
> other contributors to the project may re-code things in a cleaner way.


I can help with some kernels. In fact, JtR is a very inspiring project. I
like to look at how people have solved similar problems, often in different ways.



> I'm sorry, that comment of mine was wrong and misleading.  For a moment
> I confused VEX-encoding vs. not with XMM vs. YMM registers.  Indeed,
> VPCMOV exists for XMM registers as well, and in fact this is what JtR
> normally uses in -xop builds (except in 32-bit x86 builds, where it
> tries to use 256-bit AVX and XOP to compensate for the low register
> count, which in practice turned out to be beneficial on Sandy Bridge,
> but not on Bulldozer - so I am going to stop doing that for -xop).
>
>
So 256-bit XOP is slower than the 128-bit one? This reminds me of SSE2 on some
old Pentium 4 CPUs :)
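
For anyone following the thread who has not used VPCMOV: it is a bitwise
select, picking each result bit from one of two sources according to a mask.
Below is a minimal scalar sketch in portable C of that operation, not the
XOP intrinsic itself and not JtR's actual code. The function names
bitselect and bitselect_xor are made up for this illustration; without
XOP, compilers typically have to emit the several-instruction form that
VPCMOV replaces with one.

```c
#include <stdint.h>

/* Hypothetical helper: for every bit position, take the bit from a
 * where mask is 1 and from b where mask is 0.  This is the operation
 * VPCMOV performs across a whole vector register. */
static inline uint32_t bitselect(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}

/* Equivalent form commonly used when no select instruction exists:
 * start from b and flip only the masked bits where a and b differ. */
static inline uint32_t bitselect_xor(uint32_t mask, uint32_t a, uint32_t b)
{
    return b ^ ((a ^ b) & mask);
}
```

With mask 0xFFFF0000, the upper half of the result comes from a and the
lower half from b. Both forms give the same result for all inputs.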



> Yeah, I was surprised to hear you got into the CPU stuff at all - I
> thought you were a GPU guy. ;-)
>
>
Not exactly. I like GPGPU stuff though: it's still easy and predictable, yet
very challenging at times. It's also much more straightforward - what you
see is what you get. When coding CPU stuff with performance in mind, there
are just so many factors, and the x86 architecture itself is a monster :)


> Like I said above, the 256-bit bitwise ops are currently slow.  On Sandy
> Bridge, they deliver about the same per-bit speed that the 128-bit ones
> do, and on Bulldozer they appear to be slower.  Things should change
> with future CPUs, though - especially with those supporting AVX2 (which
> officially gives us 256-bit integer vectors).
>

That's bad. I expected some decent speedup by using 256-bit vectors. Looks
like I should wait until Haswell (or whatever the first AVX2 chip is) comes
out...

Regards
