Date: Thu, 15 Mar 2012 16:08:31 +0200
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: AMD Bulldozer and XOP

> Sounds cool.  You got to start contributing to JtR more directly. ;-)

Hehe, I believe jtr would be better off without my buggy code inside :)

> Oh, you use explicit asm, not intrinsics?  Does XOP even offer
> bit-select instructions for XMM registers?  I thought it only added
> VPCMOV, which operates on YMM registers.  Or do you mix XMM/YMM?

No, I am not a big fan of hand-written assembly. I used the VPCMOV
intrinsic (_mm_cmov_si128) and it operates on xmm registers. I was not
aware there is a 256-bit version though, hm, need to check that.

Actually I was not aware of that instruction at all, I just knew they
have bitwise rotation. I was very pleasantly surprised when I looked at
the XOP intrinsics list in MSDN and found that one. It was like "hey wtf
they have the vec_sel thing, great!". I spent too much time playing with
GPUs recently, looks like there was some interesting news on the CPU
front that I've missed :)

> Maybe you unrolled the loop too much and it doesn't fit in L1 cache?
>
> With inlined S-boxes, we're only able to unroll 2 rounds (so we get 8
> loop iterations per DES block encryption).

Not quite sure. I need to profile that more and see where the problem
is. I did not have enough time to play with this yesterday and I was
very sleepy. Overall, I suspect the issue might be somewhere else, not
in the bitslice DES code.

Regards
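
For readers unfamiliar with the "vec_sel thing": a bit-select picks each result bit from one of two inputs according to a mask, which is what makes it handy for bitslice S-box code. Below is a minimal sketch in portable C of what the operation computes. The helper names (bitselect64, bitselect64_xor, bitselect128) are made up for illustration and are not from JtR or from the code discussed in this thread. The XOP path is only compiled when the compiler defines __XOP__, and it assumes _mm_cmov_si128(a, b, sel) returns (a & sel) | (b & ~sel), which is how I understand the intrinsic.

```c
#include <stdint.h>

/* Hypothetical helper: for every bit position, take the bit from `a`
 * where `sel` is 1 and from `b` where `sel` is 0. */
static inline uint64_t bitselect64(uint64_t a, uint64_t b, uint64_t sel)
{
    return (a & sel) | (b & ~sel);
}

/* Same result written with XOR: where sel is 1, b ^ (a ^ b) gives a;
 * where sel is 0, b is left unchanged. */
static inline uint64_t bitselect64_xor(uint64_t a, uint64_t b, uint64_t sel)
{
    return b ^ ((a ^ b) & sel);
}

#ifdef __XOP__
#include <x86intrin.h>

/* Assumption: on an XOP build this maps to a single VPCMOV on xmm
 * registers, with the same argument order as bitselect64 above. */
static inline __m128i bitselect128(__m128i a, __m128i b, __m128i sel)
{
    return _mm_cmov_si128(a, b, sel);
}
#endif
```

Without XOP, the same selection takes two or three ordinary logical instructions per select, so any gain from the single-instruction form depends on how many selects the S-box expressions contain.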