john-dev - Re: PHC: Argon2 on GPU

Message-ID: <20150824100504.GA23429@openwall.com>
Date: Mon, 24 Aug 2015 13:05:04 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

On Sun, Aug 23, 2015 at 11:02:24AM +0300, Solar Designer wrote:
> I think this also serves to illustrate why working with 32-bit values or
> vector elements at OpenCL source level is a safer bet... although then
> we'd need to find and use the right intrinsics for funnel shift in
> OpenCL.  AMD has it as amd_bitalign(), but I don't know if NVIDIA has an
> equivalent now (maybe the same funnel shift intrinsics names as they use
> in CUDA?)

The CUDA intrinsics don't appear to exist in OpenCL, not even after
#include'ing the corresponding CUDA header file (the header parsed fine
as OpenCL, but the intrinsics did not become available).

However, inline PTX asm is available in OpenCL, and this is how I made
use of the funnel shifter in the patch for md5crypt-opencl that I've
just posted.  (The funnel shifter was already in use by rotate(), but in
that patch I also used it to implement md5crypt's unaligned writes.)
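
To make the operation concrete, here is a minimal sketch in portable C of
what a 32-bit right funnel shift computes. This is a reference model, not
the code from the md5crypt-opencl patch. The helper names
(funnel_shift_r, rotl32, unaligned_word) are made up for illustration,
and the inline PTX line in the comment is an assumption about how such a
wrapper might look in an OpenCL kernel on NVIDIA.

```c
#include <stdint.h>

/*
 * Reference model of a 32-bit right funnel shift: concatenate hi:lo into
 * a 64-bit value, shift right by (shift mod 32), keep the low 32 bits.
 * This matches the documented behavior of PTX "shf.r.wrap.b32" and of
 * AMD's amd_bitalign(hi, lo, shift).
 *
 * Assumption (not taken from the patch): on NVIDIA, an OpenCL kernel
 * could request the hardware instruction with something like
 *   asm("shf.r.wrap.b32 %0, %1, %2, %3;"
 *       : "=r"(r) : "r"(lo), "r"(hi), "r"(shift));
 */
static uint32_t funnel_shift_r(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint64_t wide = ((uint64_t)hi << 32) | lo;
    return (uint32_t)(wide >> (shift & 31));
}

/* A rotate is a funnel shift with both inputs equal. */
static uint32_t rotl32(uint32_t x, uint32_t n)
{
    return funnel_shift_r(x, x, 32 - (n & 31));
}

/*
 * Hypothetical helper: given two consecutive little-endian 32-bit words
 * w0, w1, return the 32-bit word that starts byte_offset (0..3) bytes
 * into w0.  Realigning data like this is the building block for
 * unaligned reads and writes on word-addressed buffers.
 */
static uint32_t unaligned_word(uint32_t w0, uint32_t w1, uint32_t byte_offset)
{
    return funnel_shift_r(w0, w1, (byte_offset & 3) * 8);
}
```

For example, unaligned_word(0x44332211, 0x88776655, 1) picks up bytes
0x22, 0x33, 0x44, 0x55 and returns 0x55443322 with a single shift, where
plain shifts would need two shifts and an OR.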

Normally, tiny pieces of inline asm hurt the compiler's instruction
scheduling and thus are rarely a good idea, but in NVIDIA's case there's
hopefully sufficient rescheduling in the PTX to native ISA translation.

Alexander
