Date: Wed, 2 Sep 2015 15:02:46 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

Regarding my testing with md5crypt-opencl:

On Sat, Aug 29, 2015 at 09:48:48AM +0300, Solar Designer wrote:
> Per this recent discussion, not inlining of functions isn't supported in
> AMD OpenCL currently:
> 
> https://community.amd.com/thread/170309
> 
> So I am puzzled why I appeared to have any performance difference from
> including or omitting the "inline" keyword on md5_digest().  I'll need
> to re-test this, preferably reviewing the generated code.

I just did.  The generated GCN ISA code is exactly the same regardless
of whether I use the "inline" keyword on md5_digest() or not.  The
function is inlined at every call site either way, and the code size is
around 16000 bytes (some instructions are 4-byte, some are 8-byte).

I also tried unrolling md5crypt's loop to a ridiculous extent, literally
fully unrolling it - "#pragma unroll 500" before the
"for (i = 0; i < 500; i++)" loop with two uses of md5_digest() in it.
The resulting GCN ISA code size is:

[solar@...er run]$ fgrep -i codelen _temp_0_Tahiti_cryptmd5.isa
codeLenInByte        = 2809236 bytes;

but the performance is only about 40% worse than for the 16k version.
So in some cases GCN tolerates exceeding L1i cache surprisingly well.
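
For scale, the arithmetic behind these two code sizes can be sketched as
below.  This is only a back-of-the-envelope illustration: the constants
are the figures quoted above (the 16000 is itself approximate), while the
helper name parse_code_len and its regular expression are assumptions
made for this sketch and are not part of any AMD tool.  The per-digest
figure is an upper bound, since the loop body contains more than just
the two md5_digest() uses.

```python
import re

# Figures quoted in the message above.
BASELINE_BYTES = 16000        # "around 16000 bytes" (approximate)
ISA_LINE = "codeLenInByte        = 2809236 bytes;"
ITERATIONS = 500              # trip count of the fully unrolled loop
DIGESTS_PER_ITERATION = 2     # "two uses of md5_digest()" per iteration


def parse_code_len(line):
    """Extract the byte count from a codeLenInByte line (hypothetical helper)."""
    match = re.search(r"codeLenInByte\s*=\s*(\d+)\s*bytes", line)
    if match is None:
        raise ValueError("no codeLenInByte field in: %r" % line)
    return int(match.group(1))


unrolled_bytes = parse_code_len(ISA_LINE)

# How much larger the fully unrolled kernel is than the 16k version.
growth = unrolled_bytes / BASELINE_BYTES

# Average code size per unrolled iteration, and an upper bound on the
# size of each inlined md5_digest() copy.
bytes_per_iteration = unrolled_bytes / ITERATIONS
bytes_per_digest = bytes_per_iteration / DIGESTS_PER_ITERATION

print("code size growth: %.0fx" % growth)
print("bytes per unrolled iteration: %.0f" % bytes_per_iteration)
print("bytes per md5_digest() use (upper bound): %.0f" % bytes_per_digest)
```

So the fully unrolled kernel is well over a hundred times larger than the
16k version, which is what makes the modest slowdown notable.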

Alexander
