Date: Wed, 2 Sep 2015 15:02:46 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

Regarding my testing with md5crypt-opencl:

On Sat, Aug 29, 2015 at 09:48:48AM +0300, Solar Designer wrote:
> Per this recent discussion, not inlining of functions isn't supported in
> AMD OpenCL currently:
>
> https://community.amd.com/thread/170309
>
> So I am puzzled why I appeared to have any performance difference from
> including or omitting the "inline" keyword on md5_digest().  I'll need
> to re-test this, preferably reviewing the generated code.

I just did.  The generated GCN ISA code is exactly the same regardless of
whether I use the "inline" keyword on md5_digest() or not.  The function
is inlined for its every use either way.  And the code size is around
16000 bytes (some instructions are 4-byte, some are 8-byte).

I also tried unrolling md5crypt's loop to a ridiculous extent, like
literally fully unrolling it - "#pragma unroll 500" before the
"for (i = 0; i < 500; i++)" loop with two uses of md5_digest() in it.
The resulting GCN ISA code size is:

[solar@...er run]$ fgrep -i codelen _temp_0_Tahiti_cryptmd5.isa
codeLenInByte        = 2809236 bytes;

but the performance is only about 40% worse than for the 16k version.
So in some cases GCN tolerates exceeding L1i cache surprisingly well.

Alexander
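
To put the two code sizes above side by side, here is a minimal sketch in C that only redoes the arithmetic on the figures quoted in the message. The helper names (size_growth, bytes_per_iteration, bytes_per_use) are hypothetical and not part of JtR or any OpenCL toolchain. The 16000-byte input is the approximate "around 16000 bytes" figure, so the ratio is approximate too. The per-iteration and per-use numbers assume that essentially all of the unrolled kernel's code sits in the 500 loop bodies, which the message does not state, so treat them as rough upper bounds.

```c
#include <assert.h>

/* Ratio of the fully unrolled kernel's code size to the rolled one.
 * Both sizes are in bytes, as reported by codeLenInByte in the .isa dump. */
double size_growth(double unrolled_bytes, double rolled_bytes)
{
    assert(rolled_bytes > 0.0);
    return unrolled_bytes / rolled_bytes;
}

/* Average code bytes per unrolled loop iteration.
 * ASSUMPTION: code outside the loop is negligible, so this is an upper bound. */
double bytes_per_iteration(double unrolled_bytes, int iterations)
{
    assert(iterations > 0);
    return unrolled_bytes / iterations;
}

/* Average code bytes per inlined use of the hashing function, given how many
 * times it is used per iteration (two uses of md5_digest() in the message).
 * Same assumption as above, so again only a rough upper bound. */
double bytes_per_use(double unrolled_bytes, int iterations, int uses_per_iter)
{
    assert(uses_per_iter > 0);
    return bytes_per_iteration(unrolled_bytes, iterations) / uses_per_iter;
}
```

Calling size_growth(2809236.0, 16000.0) gives roughly 175, so the reported slowdown of about 40% comes with a code size increase of more than two orders of magnitude. bytes_per_use(2809236.0, 500, 2) comes out near 2.8 KB per inlined md5_digest() use under the stated assumption.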