Date: Thu, 22 Mar 2012 11:10:15 +0200
From: Milen Rangelov <gat3way@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: CUDA & OpenCL status

Hello Alexander,

> Milen - any additional info on that "fused SHL+ADD instruction" on
> Nvidia and its use for MD5 and the like? I don't immediately see how
> such an instruction would be usable there because we actually need
> rotate+ADD.

The problem is that NVidia does not have a rotate instruction. AMD has BITALIGN_INT, which does it. On NVidia, rotate() actually does (a<<s)|(a>>(32-s)). With Fermi you get the fused SHL+ADD (which I guess is really just a 32-bit MAD, since a left shift by s is a multiply by 2^s), but rotate() still does not take advantage of it. What I do is something like:

#define ROTATE(a, s) (((a) << (s)) + ((a) >> (32 - (s))))

which makes the compiler emit the needed instructions. Because the two shifted halves never have a set bit in the same position, the + gives the same result as |, and the SHL can then be fused with the ADD. Of course that's not as good as having a single rotate instruction, but doing 2 instructions is still better than doing 3. This works only on sm_2x architectures, so it makes no difference on, say, a 9800 GT.

Just a side note: you don't need to explicitly use amd_bitalign from the cl_amd_media_ops extension, as rotate() has been mapped to BITALIGN_INT since SDK 2.3 or even earlier. Since recently (Catalyst 12.2), bitselect() is mapped to BFI_INT too, so there is no need to do binary patching to get BFI_INT working.

Regards.