Date: Thu, 15 Mar 2012 23:55:59 +0200
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: AMD Bulldozer and XOP

> I think you mean SSE and Pentium 3.  Yes, that was disappointing.  In
> fact, the cause might be similar: officially, those wider registers and
> operations on them are "floating point" (true for both the original SSE
> and now for 256-bit AVX and XOP), so there might be some overhead on
> updating some CPU-internal floating-point state (flags reflecting the
> current values in the vector elements if interpreted as floating-point?)
> That's just a guess, though.

Hm, but they support bitwise operations like shifts, and/or/xor and stuff. I was wondering if it makes sense to use that in, say, an MD5 routine: load two xmms into a ymm to do the bitwise operations, then unload them for the additions, then load them again for the next bitwise operations, and so on. Perhaps that's a very stupid idea and I really doubt it would work, but who knows.

I've done something like that with my SHA512 kernel, though the case was very different there. GPUs have no native 64-bit operations even though the OpenCL standard defines 64-bit long types; all 64-bit arithmetic is emulated in software. However, the AMD compiler sometimes generated horrible code, and it turned out to be beneficial to cast a ulong to a uint vector and do the operation on uints. Of course the xmm/ymm case is much different and I don't know if it's applicable here at all.
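
To make the ulong-as-uints idea concrete, here is a minimal sketch in plain C rather than OpenCL. It is not the actual SHA512 kernel code; the type `u64_halves` and all function names are hypothetical and chosen only for illustration. It shows a 64-bit value held as two 32-bit halves, with xor done per half, addition done with an explicit carry, and a rotate by fewer than 32 bits built from 32-bit shifts.

```c
#include <stdint.h>

/* Hypothetical type: a 64-bit value kept as two 32-bit halves,
   similar in spirit to viewing an OpenCL ulong as a uint2. */
typedef struct { uint32_t lo, hi; } u64_halves;

static u64_halves split64(uint64_t x) {
    u64_halves h = { (uint32_t)x, (uint32_t)(x >> 32) };
    return h;
}

static uint64_t join64(u64_halves h) {
    return ((uint64_t)h.hi << 32) | h.lo;
}

/* Bitwise operations act on each half independently. */
static u64_halves xor64(u64_halves a, u64_halves b) {
    u64_halves r = { a.lo ^ b.lo, a.hi ^ b.hi };
    return r;
}

/* Addition needs the carry out of the low half. */
static u64_halves add64(u64_halves a, u64_halves b) {
    u64_halves r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo); /* carry is 1 if the low half wrapped */
    return r;
}

/* Rotate right by n bits, valid only for 0 < n < 32 in this sketch. */
static u64_halves rotr64_small(u64_halves a, unsigned n) {
    u64_halves r;
    r.lo = (a.lo >> n) | (a.hi << (32 - n));
    r.hi = (a.hi >> n) | (a.lo << (32 - n));
    return r;
}

/* Compare the half-based results against native 64-bit arithmetic.
   Returns 1 when all three operations agree, 0 otherwise. */
static int halves_match_native(uint64_t a, uint64_t b, unsigned n) {
    u64_halves ha = split64(a), hb = split64(b);
    uint64_t rot = (a >> n) | (a << (64 - n));
    return join64(xor64(ha, hb)) == (a ^ b)
        && join64(add64(ha, hb)) == a + b
        && join64(rotr64_small(ha, n)) == rot;
}
```

Rotates by 32 or more would first swap the halves and then rotate by n - 32; that case is left out here. Whether this split is a win depends entirely on what the compiler generates for the native 64-bit version, so it would have to be measured.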