Date: Fri, 21 Aug 2015 06:44:49 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

Agnieszka,

Per slide 37 of:

http://humus.name/Articles/Persson_LowlevelShaderOptimization.pdf

integer division isn't that slow on AMD GCN. This says 40 cycles.
Although these slides do call it "horrible", it isn't as bad as I thought
it could be.

On Wed, Aug 19, 2015 at 07:12:55PM +0200, Agnieszka Bielec wrote:
> it's slower with wrap instead of %
> I just changed x % y to number 5 and I gained speed only on my 960m
> from 1861 to 1878 (argon2i). I will check again % after another
> optimizations

It is moderately surprising that changing this to a constant like 5
doesn't speed things up through the improved locality of reference to
global memory resulting from accessing just that one block each time.
Is this because the global memory caches are being constantly thrashed
anyway, even when loading the same blocks over and over, due to the
sheer number of Argon2 instances being computed concurrently?

Perhaps.

Alexander
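Assuming "wrap" in the quoted message means replacing `%` with a single conditional subtraction, the two variants being compared can be sketched in C as below. The function names `index_mod` and `index_wrap` are hypothetical. The wrap form is only correct when `x < 2 * y`; in the Argon2 reference code the reference block's position is computed as `(start_position + relative_position) % lane_length`, and this sketch assumes that sum never reaches twice the lane length. The constant-5 experiment corresponds to replacing either body with `return 5;`, so each instance keeps reading the same block.

```c
#include <stdint.h>

/* Reduce x into [0, y) with the % operator.
 * Correct for any x, but compiles to an integer
 * division/remainder sequence on GPUs. */
static inline uint32_t index_mod(uint32_t x, uint32_t y)
{
    return x % y;
}

/* Hypothetical "wrap" variant: one compare and one conditional
 * subtraction. Only correct when x < 2 * y, which is assumed to
 * hold for the caller's start + relative offset. */
static inline uint32_t index_wrap(uint32_t x, uint32_t y)
{
    return (x >= y) ? x - y : x;
}
```

The wrap form trades the division for a compare and a select, which can still end up no faster if the compiler already handles `%` reasonably or if memory traffic, rather than arithmetic, dominates the kernel's run time.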