Date: Fri, 21 Aug 2015 06:44:49 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

Agnieszka,

Per slide 37 of:

http://humus.name/Articles/Persson_LowlevelShaderOptimization.pdf

integer division isn't that slow on AMD GCN: the slide gives 40 cycles.
Although the slides do call it "horrible", it isn't as bad as I
thought it could be.

On Wed, Aug 19, 2015 at 07:12:55PM +0200, Agnieszka Bielec wrote:
> it's slower with wrap instead of %
> I just changed x % y to number 5 and I gained speed only on my 960m
> from 1861 to 1878 (argon2i). I will check % again after other
> optimizations

It is moderately surprising that changing this to a constant like 5
doesn't speed things up through the improved locality of reference to
global memory resulting from accessing just that one block each time.
Is this because the global memory caches are being constantly thrashed
anyway, even when loading the same blocks over and over, due to the
sheer number of Argon2 instances being computed concurrently?  Perhaps.
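
To make the comparison concrete, here is a minimal C sketch of the three
indexing variants being discussed: the general modulo, a "wrap" done by
conditional subtraction, and the fixed constant 5.  The function names
and the exact form of "wrap" are assumptions made for illustration, they
are not taken from the actual kernel code.

```c
#include <stdint.h>

/* Variant 1: general modulo.  With a non-constant y this needs an
 * integer division, which is the operation whose cost is in question. */
static uint32_t ref_index_mod(uint32_t x, uint32_t y)
{
    return x % y;
}

/* Variant 2 (hypothetical form of "wrap"): one conditional subtraction.
 * This matches x % y only when the caller guarantees x < 2 * y. */
static uint32_t ref_index_wrap(uint32_t x, uint32_t y)
{
    return (x >= y) ? x - y : x;
}

/* Variant 3: the experiment from the quoted message, where the index is
 * replaced by the constant 5, so the same block is read every time. */
static uint32_t ref_index_const(uint32_t x, uint32_t y)
{
    (void)x;
    (void)y;
    return 5;
}
```

Variant 3 removes both the division and the data-dependent addressing,
so if it barely changes the speed, neither of those is likely to be the
main bottleneck in the current code.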

Alexander
