Date: Sun, 16 Aug 2015 17:09:16 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

On Sun, Aug 16, 2015 at 03:03:56PM +0200, Agnieszka Bielec wrote:
> 2015-08-16 14:48 GMT+02:00 Solar Designer <solar@...nwall.com>:
> > On Sun, Aug 16, 2015 at 02:01:38PM +0200, Agnieszka Bielec wrote:
> >>         prev_block[i] = ((__global ulong2*)memory)[bi+i];
> >> }
> >>
> >> see anyone some logic here or is this just a bug on AMD?
> >
> > Why do you call this a bug?  It isn't necessarily a bug when performance
> > of code changes when you change the source code.
> >
> > Anyway, it looks like in the second code version you rely on address
> > scaling by 16, and this is probably not available in the architecture
> > (usually available is scaling by up to 8), so requires extra
> > instructions (explicit left shifts).
> 
> where do you see address scaling?

bi+i is used to index an array of 16-byte elements, so it needs to be
multiplied by 16 each time (unless the compiler manages to optimize
this, perhaps much like you had done manually in the first version).
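
To illustrate the scaling, here is a minimal host-side C sketch.  It is not
the kernel code from this thread: vec16 is a hypothetical stand-in for
OpenCL's ulong2, and the function names are made up for the example.  It
assumes only that the element size is 16 bytes.  The first copy loop indexes
by element, so each access implies (bi + i) * 16; the second scales the base
once and then steps a pointer, which is one way such scaling can be hoisted
out of the loop by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for OpenCL's ulong2: two 64-bit words, 16 bytes. */
typedef struct { uint64_t x, y; } vec16;

static_assert(sizeof(vec16) == 16, "example assumes 16-byte elements");

/* Byte offset of element (bi + i): a multiply by 16, i.e. a left shift by 4.
   Where the addressing mode cannot scale by 16, this shift costs an extra
   instruction unless the compiler folds it away. */
static size_t byte_offset(size_t bi, size_t i)
{
    return (bi + i) << 4;
}

/* Indexed form: every access implies the scaling shown in byte_offset(). */
static void copy_indexed(vec16 *dst, const vec16 *memory, size_t bi, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = memory[bi + i];
}

/* Pointer form: scale the base once, then advance by one element per step. */
static void copy_pointer(vec16 *dst, const vec16 *memory, size_t bi, size_t n)
{
    const vec16 *src = memory + bi;
    for (size_t i = 0; i < n; i++)
        dst[i] = *src++;
}
```

Both loops copy the same elements; whether a given compiler emits different
code for them is something to check in the generated ISA rather than assume.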

> bi is uint and even before /16 is
> BLOCK_SIZE which is much bigger than 16 and divisible by 16 so

How is this relevant?

> preprocessor will change this to *[single value]

I don't see any preprocessor macros here, on the line I quoted above.

What branch is this committed on, so that I can take a look in context?

Alexander
