Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
Date: Thu, 15 Mar 2012 16:08:31 +0200
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: AMD Bulldozer and XOP

> Sounds cool.  You got to start contributing to JtR more directly. ;-)
>
>
Hehe, I believe JtR would be better off without my buggy code inside :)

> Oh, you use explicit asm, not intrinsics?  Does XOP even offer
> bit-select instructions for XMM registers?  I thought it only added
> VPCMOV, which operates on YMM registers.  Or do you mix XMM/YMM?
>
>
No, I am not a big fan of hand-written assembly. I used the VPCMOV
intrinsic (_mm_cmov_si128), and it operates on XMM registers. I was not
aware there is a 256-bit version, though; hm, need to check that. Actually, I
was not aware of that instruction at all; I only knew they had bitwise
rotation. I was very pleasantly surprised when I looked at the XOP
intrinsics list in MSDN and found it. It was like "hey wtf, they
have the vec_sel thing, great!". I have spent too much time playing with GPUs
recently; it looks like there was some interesting news on the CPU front that
I've missed :)



> Maybe you unrolled the loop too much and it doesn't fit in L1 cache?
>
> With inlined S-boxes, we're only able to unroll 2 rounds (so we get 8
> loop iterations per DES block encryption).
>

Not quite sure. I need to profile it more and see where the problem is. I
did not have enough time to play with this yesterday, and I was very sleepy.
Overall, I suspect the issue might be somewhere else, not in the bitslice DES
code.

Regards
