john-dev - Re: PHC: Argon2 on GPU

Message-ID: <CAKGDhHUq6d33nGTFT65MdrzMMYodoZyHRHeZmmvTbTRRFJfmHQ@mail.gmail.com>
Date: Sun, 16 Aug 2015 22:27:27 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-16 16:09 GMT+02:00 Solar Designer <solar@...nwall.com>:
> On Sun, Aug 16, 2015 at 03:03:56PM +0200, Agnieszka Bielec wrote:
>> 2015-08-16 14:48 GMT+02:00 Solar Designer <solar@...nwall.com>:
>> > On Sun, Aug 16, 2015 at 02:01:38PM +0200, Agnieszka Bielec wrote:
>> >>         prev_block[i] = ((__global ulong2*)memory)[bi+i];
>> >> }
>> >>
>> >> Does anyone see some logic here, or is this just a bug on AMD?
>> >
>> > Why do you call this a bug?  It isn't necessarily a bug when performance
>> > of code changes when you change the source code.
>> >
>> > Anyway, it looks like in the second code version you rely on address
>> > scaling by 16, and this is probably not available in the architecture
>> > (usually available is scaling by up to 8), so requires extra
>> > instructions (explicit left shifts).
>>
>> where do you see address scaling?
>
> bi+i is used to index an array of 16-byte elements, so it needs to be
> multiplied by 16 each time (unless the compiler manages to optimize
> this, perhaps much like you had done manually in the first version).
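
A minimal sketch of what that scaling means, written in plain C rather
than OpenCL so it can be compiled on the host. The names ulong2_t,
scaled_offset and indexed_offset are made up for this illustration and
are not taken from the kernel; whether a given GPU folds the shift into
its addressing mode is not something this sketch can show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for OpenCL's ulong2 (hypothetical name): two 64-bit words. */
typedef struct { uint64_t x, y; } ulong2_t;
static_assert(sizeof(ulong2_t) == 16, "element must be 16 bytes");

/* Byte offset of element (bi + i), written as an explicit shift by 4,
   which is the multiplication by 16 discussed above. */
static size_t scaled_offset(size_t bi, size_t i)
{
    return (bi + i) << 4;
}

/* Byte offset the compiler actually derives for &memory[bi + i]. */
static size_t indexed_offset(const ulong2_t *memory, size_t bi, size_t i)
{
    return (size_t)((const char *)&memory[bi + i] - (const char *)memory);
}
```

For bi = 8 and i = 3, both functions give 176 bytes (11 elements of 16
bytes each), so the index form and the explicit shift address the same
location.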

If something is not supported, why do I see the opposite of this AMD
slowdown on my laptop (although only slightly)? When I modify the newest
code as in my previous e-mail, I get the same speed in tests whether or
not GWS is set.

none@...e ~/Desktop/r/run $ ./john --test --format=argon2d-opencl
Benchmarking: argon2d-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: GeForce GTX 960M
using different password for benchmarking
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 1500, cost 3 (l) of 1
Many salts:     4114 c/s real, 4114 c/s virtual
Only one salt:  4114 c/s real, 4151 c/s virtual

none@...e ~/Desktop/r/run $ GWS=512 ./john --test --format=argon2d-opencl
Benchmarking: argon2d-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: GeForce GTX 960M
using different password for benchmarking
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 1500, cost 3 (l) of 1
Many salts:     4055 c/s real, 4096 c/s virtual
Only one salt:  4096 c/s real, 4055 c/s virtual

After this modification:

none@...e ~/Desktop/r/run $ ./john --test --format=argon2d-opencl
Benchmarking: argon2d-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: GeForce GTX 960M
using different password for benchmarking
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 1500, cost 3 (l) of 1
Many salts:     4055 c/s real, 4055 c/s virtual
Only one salt:  4055 c/s real, 4015 c/s virtual

none@...e ~/Desktop/r/run $ GWS=512 ./john --test --format=argon2d-opencl
Benchmarking: argon2d-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: GeForce GTX 960M
using different password for benchmarking
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 1500, cost 3 (l) of 1
Many salts:     4055 c/s real, 4096 c/s virtual
Only one salt:  4055 c/s real, 4055 c/s virtual
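
To put the c/s figures above in proportion, here is a small sketch for
computing the relative difference between two runs. The helper name
percent_diff is an assumption for illustration and is not part of john.

```c
#include <stdio.h>

/* Relative difference of b with respect to a, in percent.
   Negative means b is slower than a. */
static double percent_diff(double a, double b)
{
    return (b - a) / a * 100.0;
}

/* Print the difference between two c/s figures from a benchmark run. */
static void report(const char *label, double before, double after)
{
    printf("%s: %.2f%%\n", label, percent_diff(before, after));
}
```

For example, percent_diff(4114, 4055) is about -1.43, so the "Many
salts" results before and after the modification differ by roughly 1.4%.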