john-dev - Re: PHC: Argon2 on GPU

Message-ID: <20150902120246.GA22964@openwall.com>
Date: Wed, 2 Sep 2015 15:02:46 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

Regarding my testing with md5crypt-opencl:

On Sat, Aug 29, 2015 at 09:48:48AM +0300, Solar Designer wrote:
> Per this recent discussion, not inlining of functions isn't supported in
> AMD OpenCL currently:
> 
> https://community.amd.com/thread/170309
> 
> So I am puzzled why I appeared to have any performance difference from
> including or omitting the "inline" keyword on md5_digest().  I'll need
> to re-test this, preferably reviewing the generated code.

I just did.  The generated GCN ISA code is exactly the same regardless
of whether I use the "inline" keyword on md5_digest() or not.  The
function is inlined at every call site either way.  And the code size is
around 16000 bytes (some instructions are 4-byte, some are 8-byte).

I also tried unrolling md5crypt's loop to a ridiculous extent, like
literally fully unrolling it - "#pragma unroll 500" before the
"for (i = 0; i < 500; i++)" loop with two uses of md5_digest() in it.
The resulting GCN ISA code size is:

[solar@...er run]$ fgrep -i codelen _temp_0_Tahiti_cryptmd5.isa
codeLenInByte        = 2809236 bytes;

but the performance is only about 40% worse than for the 16k version.
So in some cases GCN tolerates exceeding the L1 instruction cache
surprisingly well.
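
For scale, here is a rough back-of-the-envelope calculation from the two
code sizes quoted above.  It is only a sketch: the 16000-byte figure is
approximate, the variable names are made up for illustration, and the
per-digest number is an upper bound because it attributes all of the
unrolled code to the loop body.

```python
# Sizes as quoted in the message; BASELINE_BYTES is approximate ("around 16000").
BASELINE_BYTES = 16000
UNROLLED_BYTES = 2809236      # codeLenInByte of the fully unrolled build
LOOP_ITERATIONS = 500         # "for (i = 0; i < 500; i++)"
DIGESTS_PER_ITERATION = 2     # two uses of md5_digest() per iteration

# How much larger the fully unrolled code is than the baseline.
growth = UNROLLED_BYTES / BASELINE_BYTES

# Average code bytes per unrolled iteration, and an upper bound on the
# bytes per inlined md5_digest() (code outside the loop is not subtracted).
bytes_per_iteration = UNROLLED_BYTES / LOOP_ITERATIONS
bytes_per_digest_upper = bytes_per_iteration / DIGESTS_PER_ITERATION

print(f"code size growth:          ~{growth:.0f}x")
print(f"bytes per iteration:       ~{bytes_per_iteration:.0f}")
print(f"bytes per md5_digest() use: <= ~{bytes_per_digest_upper:.0f}")
```

That is, the unrolled kernel is on the order of 175 times larger than
the 16k version, at roughly 5.6 KB of code per loop iteration.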

Alexander
