Message-ID: <20150823080224.GA16570@openwall.com>
Date: Sun, 23 Aug 2015 11:02:24 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU
On Sun, Aug 23, 2015 at 10:21:53AM +0300, Solar Designer wrote:
> For starters, we should try OpenCL's rotate() and see if it translates
> to decent PTX code these days. Right now, opencl_blake2.h: rotr64()
> does not yet use rotate(), while opencl_blake2-round-no-msg.h does. We
> should switch both to use the same approach, at least to make reviewing
> the generated PTX code easier.
The body of rotr64() should be:
	return rotate(w, (ulong)(64 - c));
(rotate() rotates left, hence the 64 - c. We could also turn this into a
macro, so as not to rely on inlining.)
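To double-check the 64 - c trick outside of OpenCL, here is a small host-side sketch in plain C. rotate_model() is only my stand-in for OpenCL's rotate() on ulong (rotate left, count taken modulo 64), and rotr64_ref() and rotr64_selfcheck() are hypothetical helper names used for the comparison, not anything from the JtR tree.

```c
#include <assert.h>
#include <stdint.h>

/* Host-side model of OpenCL's rotate() for ulong: rotate left, with the
 * count taken modulo 64. The masking also avoids an undefined 64-bit
 * shift in C when n is 0. */
static inline uint64_t rotate_model(uint64_t w, uint64_t n)
{
    n &= 63;
    return (w << n) | (w >> ((64 - n) & 63));
}

/* Rotate right by c, expressed as rotate left by 64 - c. */
static inline uint64_t rotr64(uint64_t w, unsigned c)
{
    return rotate_model(w, (uint64_t)(64 - c));
}

/* Straightforward rotate right for comparison, valid for 0 < c < 64. */
static inline uint64_t rotr64_ref(uint64_t w, unsigned c)
{
    return (w >> c) | (w << (64 - c));
}

/* Compare both forms for every non-zero count; returns 1 on success. */
static inline int rotr64_selfcheck(void)
{
    const uint64_t w = 0x0123456789abcdefULL;
    unsigned c;

    for (c = 1; c < 64; c++)
        assert(rotr64(w, c) == rotr64_ref(w, c));
    return 1;
}
```

With c = 0 the count becomes 64, which wraps to 0, so the value comes back unchanged.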
Unfortunately, when we're dealing with 64-bit types, the generated PTX
code includes extra mov's:
	{
	.reg .b32 %dummy;
	mov.b64 {%r15,%dummy}, %rd82;
	}
	{
	.reg .b32 %dummy;
	mov.b64 {%dummy,%r16}, %rd82;
	}
	shf.r.wrap.b32 %r17, %r16, %r15, 24;
	shf.r.wrap.b32 %r18, %r15, %r16, 24;
These mov's merely extract the 32-bit halves needed by the shf
instructions. In the final ISA code, the mov's should be gone and the
proper registers substituted right into the shf instructions. However,
I am not sure this is what is actually happening (it depends on how
good the translator from PTX to native ISA is).
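For anyone reviewing that PTX, a plain C model of the two shf instructions may help. It is a sketch: shf_r_wrap() follows my reading of the PTX ISA description of shf.r.wrap.b32 (b is the upper word, a the lower, the shift count wraps modulo 32, and the low 32 bits are kept). The function names are made up for illustration, and the final repacking of %r17 and %r18 into a 64-bit value is an assumption, since that instruction is not in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Model of PTX "shf.r.wrap.b32 d, a, b, c": concatenate b (bits 63:32)
 * and a (bits 31:0), shift right by c modulo 32, keep the low 32 bits. */
static inline uint32_t shf_r_wrap(uint32_t a, uint32_t b, uint32_t c)
{
    uint64_t v = ((uint64_t)b << 32) | a;
    return (uint32_t)(v >> (c & 31));
}

/* Mirrors the listing: split %rd82, two funnel shifts by 24, recombine. */
static inline uint64_t rotr64_24_via_shf(uint64_t rd82)
{
    uint32_t r15 = (uint32_t)rd82;         /* mov.b64 {%r15,%dummy}: low  */
    uint32_t r16 = (uint32_t)(rd82 >> 32); /* mov.b64 {%dummy,%r16}: high */
    uint32_t r17 = shf_r_wrap(r16, r15, 24);
    uint32_t r18 = shf_r_wrap(r15, r16, 24);

    /* Assumption: %r17 ends up as the high half of the result and %r18
     * as the low half; the repacking is not shown in the quoted PTX. */
    return ((uint64_t)r17 << 32) | r18;
}

/* Check against the usual 64-bit formula; returns 1 on success. */
static inline int shf_selfcheck(uint64_t x)
{
    assert(rotr64_24_via_shf(x) == ((x >> 24) | (x << 40)));
    return 1;
}
```

So the two shf's together do the whole 64-bit rotate by 24, one per output half, and the mov's contribute nothing beyond naming the halves.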
I think this also serves to illustrate why working with 32-bit values or
vector elements at OpenCL source level is a safer bet... although then
we'd need to find and use the right intrinsics for funnel shift in
OpenCL. AMD has it as amd_bitalign(), but I don't know if NVIDIA has an
equivalent now (maybe under the same funnel shift intrinsic names as
they use in CUDA?)
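To make the 32-bit approach concrete, here is a rough host-side sketch in plain C of a 64-bit rotate right done entirely on two 32-bit halves, for any count from 0 to 63. bitalign_model() is my stand-in for a funnel shift right, written after the semantics of amd_bitalign() as I understand them from the cl_amd_media_ops description. The u32x2 type and all function names here are hypothetical, not existing JtR or vendor identifiers.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit word kept as two 32-bit halves (think uint2 in OpenCL). */
typedef struct { uint32_t lo, hi; } u32x2;

static inline u32x2 u32x2_split(uint64_t x)
{
    u32x2 r;
    r.lo = (uint32_t)x;
    r.hi = (uint32_t)(x >> 32);
    return r;
}

static inline uint64_t u32x2_join(u32x2 w)
{
    return ((uint64_t)w.hi << 32) | w.lo;
}

/* Model of a funnel shift right: ((hi:lo) >> (n & 31)), low 32 bits.
 * Assumed to match amd_bitalign(hi, lo, n). */
static inline uint32_t bitalign_model(uint32_t hi, uint32_t lo, uint32_t n)
{
    return (uint32_t)((((uint64_t)hi << 32) | lo) >> (n & 31));
}

/* Rotate right by c (0..63) using only 32-bit operations on the halves. */
static inline u32x2 rotr64_halves(u32x2 w, unsigned c)
{
    u32x2 r;

    if (c & 32) { /* rotating by 32 just swaps the halves */
        uint32_t t = w.lo;
        w.lo = w.hi;
        w.hi = t;
    }
    r.lo = bitalign_model(w.hi, w.lo, c); /* count is masked to 0..31 */
    r.hi = bitalign_model(w.lo, w.hi, c);
    return r;
}

/* Compare with the 64-bit formula for all counts; returns 1 on success. */
static inline int halves_selfcheck(uint64_t x)
{
    unsigned c;

    for (c = 1; c < 64; c++)
        assert(u32x2_join(rotr64_halves(u32x2_split(x), c)) ==
               ((x >> c) | (x << (64 - c))));
    return 1;
}
```

With constant counts the swap is resolved at compile time, so each rotate should cost two funnel shifts at most, and a rotate by exactly 32 costs nothing but register renaming.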
Alexander