Date: Sun, 16 Aug 2015 16:30:23 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-16 16:09 GMT+02:00 Solar Designer <solar@...nwall.com>:
> On Sun, Aug 16, 2015 at 03:03:56PM +0200, Agnieszka Bielec wrote:
>> 2015-08-16 14:48 GMT+02:00 Solar Designer <solar@...nwall.com>:
>> > On Sun, Aug 16, 2015 at 02:01:38PM +0200, Agnieszka Bielec wrote:
>> >>         prev_block[i] = ((__global ulong2*)memory)[bi+i];
>> >>     }
>> >>
>> >> does anyone see some logic here, or is this just a bug on AMD?
>> >
>> > Why do you call this a bug?  It isn't necessarily a bug when performance
>> > of code changes when you change the source code.
>> >
>> > Anyway, it looks like in the second code version you rely on address
>> > scaling by 16, and this is probably not available in the architecture
>> > (usually available is scaling by up to 8), so requires extra
>> > instructions (explicit left shifts).
>>
>> where do you see address scaling?
>
> bi+i is used to index an array of 16-byte elements, so it needs to be
> multiplied by 16 each time (unless the compiler manages to optimize
> this, perhaps much like you had done manually in the first version).

ok

>> bi is uint and even before /16 is
>> BLOCK_SIZE which is much bigger than 16 and divisible by 16 so
>
> How is this relevant?
>
>> preprocessor will change this to *[single value]
>
> I don't see any preprocessor macros here, on the line I quoted above.

not relevant anymore

> What branch is this committed on, so that I can take a look in context?

bleeding-jumbo
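
To make the scaling point above concrete, here is a minimal sketch in plain C. It is not the kernel from the branch. The type u64x2 is a hypothetical stand-in for OpenCL's ulong2, and the function names are made up for illustration. It only shows that indexing 16-byte elements by bi+i implies a multiply by 16 (a left shift by 4) on every access, and that the same load can be done from a byte offset that was scaled once up front. Whether a given compiler or GPU folds that scaling into the addressing mode is not something this sketch can show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for OpenCL's ulong2: two 64-bit lanes, 16 bytes. */
typedef struct { uint64_t x, y; } u64x2;

/* Byte offset implied by element index (bi + i) into an array of
   16-byte elements: the index has to be scaled by 16, i.e. shifted
   left by 4. */
static size_t element_to_byte_offset(uint32_t bi, uint32_t i)
{
    return ((size_t)bi + (size_t)i) << 4;
}

/* Load by element index, as in ((__global ulong2*)memory)[bi+i]:
   the scaling by sizeof(u64x2) happens on every call. */
static u64x2 load_by_element(const unsigned char *memory,
                             uint32_t bi, uint32_t i)
{
    u64x2 v;
    memcpy(&v, memory + ((size_t)bi + i) * sizeof(u64x2), sizeof v);
    return v;
}

/* Load from a byte offset that the caller has already scaled, so no
   multiply or shift is needed at the point of access. */
static u64x2 load_by_byte_offset(const unsigned char *memory,
                                 size_t byte_off)
{
    u64x2 v;
    memcpy(&v, memory + byte_off, sizeof v);
    return v;
}
```

With bi = 3 and i = 2, the element index is 5 and the byte offset is 5 * 16 = 80. Both load functions read the same 16 bytes; the difference is only where the scaling is performed.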