Date: Mon, 24 Aug 2015 13:05:04 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

On Sun, Aug 23, 2015 at 11:02:24AM +0300, Solar Designer wrote:
> I think this also serves to illustrate why working with 32-bit values or
> vector elements at OpenCL source level is a safer bet... although then
> we'd need to find and use the right intrinsics for funnel shift in
> OpenCL. AMD has it as amd_bitalign(), but I don't know if NVIDIA has an
> equivalent now (maybe the same funnel shift intrinsics names as they use
> in CUDA?)

The CUDA intrinsics don't appear to exist in OpenCL, not even after
#include'ing the corresponding CUDA header file (it got parsed as OpenCL
fine, but didn't result in the intrinsics becoming available).

However, inline PTX asm is available in OpenCL, and this is how I made
use of the funnel shifter in the patch for md5crypt-opencl that I've
just posted. (The funnel shifter was already in use by rotate(), but in
that patch I also used it to implement md5crypt's unaligned writes.)

Normally, tiny pieces of inline asm hurt the compiler's instruction
scheduling and thus are rarely a good idea, but in NVIDIA's case there's
hopefully sufficient rescheduling in the PTX to native ISA translation.

Alexander
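
For illustration only, here is a minimal portable C model of what a 32-bit
funnel shift right computes, and of how rotate() and unaligned word access
can be built on it. This is not the inline PTX from the md5crypt-opencl
patch. The names funnel_shift_right(), rotate_left() and load_unaligned()
are hypothetical, and the wrap-around handling of the shift count (masking
to 5 bits) is an assumption meant to mirror how amd_bitalign() and PTX's
shf.r.wrap.b32 are understood to behave.

```c
#include <stdint.h>

/*
 * Model of a 32-bit funnel shift right: treat hi:lo as one 64-bit value,
 * shift it right by (shift mod 32), and keep the low 32 bits.
 * Assumption: the shift count wraps at 32, as in a "wrap" mode shifter.
 */
static uint32_t funnel_shift_right(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (shift & 31));
}

/* Rotate left by n bits: funnel shift of x:x right by (32 - n) mod 32. */
static uint32_t rotate_left(uint32_t x, uint32_t n)
{
    return funnel_shift_right(x, x, (32 - n) & 31);
}

/*
 * Hypothetical helper: fetch the 32-bit little-endian word that starts
 * byte_offset (0..3) bytes into the pair of adjacent aligned words w0, w1.
 * Unaligned writes can be assembled from the same primitive by shifting
 * the data into place and merging it with the neighbouring words.
 */
static uint32_t load_unaligned(uint32_t w0, uint32_t w1, uint32_t byte_offset)
{
    return funnel_shift_right(w0, w1, 8 * byte_offset);
}
```

With w0 = 0x44332211 and w1 = 0x88776655, load_unaligned(w0, w1, 1) picks
bytes 1 through 4 of the pair. On a GPU the body of funnel_shift_right()
would be replaced by the vendor intrinsic or a one-instruction inline asm
statement rather than 64-bit arithmetic.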