Date: Sun, 16 Aug 2015 15:48:20 +0300
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

On Sun, Aug 16, 2015 at 02:01:38PM +0200, Agnieszka Bielec wrote:
> now I was digging in argon2d (I discovered that this bug occurs after
> commit 9e96f452350c0f2cae32b38e4a4cd1f83d51a367)
> and before this commit the code was:
> 
> bi = prev_block_offset = ((prev_slice * lanes + pos.lane + 1) *
> segment_length - 1) * BLOCK_SIZE;
> for (i = 0; i < 64; i++)
> {
>        prev_block[i] = *(__global ulong2 *) (&memory[bi]);
>        bi += 16;
> }
> 
> the slowdown on AMD occurred when I changed this code to:
> 
> bi = prev_block_offset = ((prev_slice * lanes + pos.lane + 1) *
> segment_length - 1) * BLOCK_SIZE / 16;
> for (i = 0; i < 64; i++)
> {
>         prev_block[i] = ((__global ulong2*)memory)[bi+i];
> }
> 
> does anyone see some logic here, or is this just a bug on AMD?

Why do you call this a bug?  It isn't necessarily a bug when performance
of code changes when you change the source code.

Anyway, it looks like the second version of the code relies on address
scaling by 16 (it indexes an array of 16-byte ulong2 elements).  That
scale factor is probably not available in the architecture (usually
scaling by up to 8 is available), so it requires extra instructions
(explicit left shifts).
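
To make the address arithmetic concrete, here is a rough sketch in plain
C rather than OpenCL.  The names (vec16, load_byte_offset,
load_element_index, loads_match) are made up for illustration and are
not from the Argon2 kernel.  It only shows that the two loops read the
same bytes and where the multiply by 16 ends up; whether a given GPU
compiler actually emits an extra shift per access is an assumption that
would need checking in the generated ISA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16u        /* size of one ulong2-like element */
#define VECS_PER_BLOCK 64u   /* elements copied per block, as in the quoted loops */

/* Hypothetical 16-byte stand-in for OpenCL's ulong2. */
typedef struct { uint64_t lo, hi; } vec16;

/* First version: keep a byte offset and add 16 after each load.
 * The address is simply base + bi, with no scaling needed. */
static void load_byte_offset(const unsigned char *memory, size_t byte_off,
                             vec16 *out)
{
    size_t bi = byte_off;
    for (size_t i = 0; i < VECS_PER_BLOCK; i++) {
        memcpy(&out[i], memory + bi, VEC_BYTES);
        bi += VEC_BYTES;
    }
}

/* Second version: keep an element index.  The address becomes
 * base + (bi + i) * 16, written here as an explicit left shift by 4
 * to show the scaling that the indexing implies. */
static void load_element_index(const unsigned char *memory, size_t byte_off,
                               vec16 *out)
{
    assert(byte_off % VEC_BYTES == 0); /* the division below must be exact */
    size_t bi = byte_off / VEC_BYTES;
    for (size_t i = 0; i < VECS_PER_BLOCK; i++)
        memcpy(&out[i], memory + ((bi + i) << 4), VEC_BYTES);
}

/* Returns 1 if both versions load identical data from a test buffer. */
static int loads_match(void)
{
    static unsigned char memory[4 * 1024];
    vec16 a[VECS_PER_BLOCK], b[VECS_PER_BLOCK];

    for (size_t k = 0; k < sizeof memory; k++)
        memory[k] = (unsigned char)(k * 31u + 7u);

    load_byte_offset(memory, 2048, a);
    load_element_index(memory, 2048, b);
    return memcmp(a, b, sizeof a) == 0;
}
```

Both functions produce the same result, so any speed difference would
come from how the address is formed, not from what is loaded.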

Alexander
