Date: Wed, 19 Aug 2015 19:39:24 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-19 19:12 GMT+02:00 Agnieszka Bielec <bielecagnieszka8@...il.com>:
> 2015-08-19 6:10 GMT+02:00 Solar Designer <solar@...nwall.com>:
>> (just to illustrate the problem of slow integer division on GPUs).
>>
>> Before you spend a lot of time on this, I suggest that you replace this
>> modulo operation with something simpler (and wrong), yet in some ways
>> similar, e.g.:
>>
>> static inline uint32_t wrap(uint64_t x, uint32_t n)
>> {
>>         uint64_t a = (x + n) & (n - 1);
>>         uint64_t b = x & n;
>>         uint64_t c = (x << 1) & n;
>>         return ((a << 1) + b + c) >> 2;
>> }
>>
>> (and its OpenCL equivalent, with proper data types).  Of course, this
>> revision of Argon2 won't match Argon2's normal test vectors, but you
>> should be able to see roughly what performance you could get if you
>> later optimize the division.
>
> It's slower with wrap() instead of %.
> I just replaced x % y with the constant 5 and gained speed only on my
> 960M, from 1861 to 1878 (argon2i). I will check % again after other
> optimizations.
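
One property of the quoted wrap() worth checking before benchmarking with it is that it never returns an index outside [0, n), so swapping it in for x % n should not cause out-of-bounds accesses. The reason is that a <= n - 1, b <= n and c <= n, so (2a + b + c) >> 2 <= n - 1. Below is a minimal C sketch of that check. wrap_stays_in_range() is a hypothetical helper that is not part of the kernel, and n = 1536 is borrowed from the m cost in the benchmarks purely as an example value; the actual divisor in the kernel may differ.

```c
#include <assert.h>
#include <stdint.h>

/* The stand-in from the quoted message: bit operations instead of x % n.
 * It does not compute x % n in general; it only has a similar shape. */
static inline uint32_t wrap(uint64_t x, uint32_t n)
{
	uint64_t a = (x + n) & (n - 1);
	uint64_t b = x & n;
	uint64_t c = (x << 1) & n;
	return ((a << 1) + b + c) >> 2;
}

/* Hypothetical helper (an assumption, not from the original code):
 * returns 1 if wrap(x, n) < n for every x in [0, x_count), else 0. */
static int wrap_stays_in_range(uint32_t n, uint64_t x_count)
{
	uint64_t x;

	assert(n > 0); /* x % 0 is undefined anyway */
	for (x = 0; x < x_count; x++)
		if (wrap(x, n) >= n)
			return 0;
	return 1;
}
```

For example, wrap_stays_in_range(1536, 100000) returns 1, while wrap(5, 8) is 4 rather than 5, which shows why the test vectors cannot match.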

I also checked argon2d just in case, and I get a bigger speedup there.

normal code
none@...e ~/Desktop/r/run $ GWS=512 ./john --test --format=argon2d-opencl
Benchmarking: argon2d-opencl [Blake2 OpenCL]...
memory per hash : 1.50 MB
Device 0: GeForce GTX 960M
using different password for benchmarking
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 1536, cost 3 (l) of 1
Many salts:     3976 c/s real, 3938 c/s virtual
Only one salt:  3976 c/s real, 4015 c/s virtual


with x % y replaced by 5
none@...e ~/Desktop/r/run $ GWS=512 ./john --test --format=argon2d-opencl --skip-self-test
Benchmarking: argon2d-opencl [Blake2 OpenCL]...
memory per hash : 1.50 MB
Device 0: GeForce GTX 960M
using different password for benchmarking
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 1536, cost 3 (l) of 1
Many salts:     4055 c/s real, 4055 c/s virtual
Only one salt:  4114 c/s real, 4151 c/s virtual


with wrap()
none@...e ~/Desktop/r/run $ GWS=512 ./john --test --format=argon2d-opencl --skip-self-test
Benchmarking: argon2d-opencl [Blake2 OpenCL]...
memory per hash : 1.50 MB
Device 0: GeForce GTX 960M
using different password for benchmarking
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 1536, cost 3 (l) of 1
Many salts:     3976 c/s real, 3976 c/s virtual
Only one salt:  4015 c/s real, 4015 c/s virtual

Maybe the differences are just the usual coincidence.
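
To put the three runs side by side, here is a small C sketch that turns the "real" c/s figures above into relative changes against the normal code. The function names are made up for this illustration; only the numbers come from the benchmark output.

```c
#include <stdio.h>

/* Relative change of variant versus baseline, in percent. */
static double rel_change_percent(double baseline, double variant)
{
	return (variant - baseline) / baseline * 100.0;
}

/* Prints the changes for the "real" c/s values reported above.
 * The normal code gave 3976 c/s real in both rows. */
static void print_changes(void)
{
	const double base = 3976.0;

	printf("constant 5, many salts: %+.2f%%\n",
	    rel_change_percent(base, 4055.0));
	printf("constant 5, one salt:   %+.2f%%\n",
	    rel_change_percent(base, 4114.0));
	printf("wrap(), many salts:     %+.2f%%\n",
	    rel_change_percent(base, 3976.0));
	printf("wrap(), one salt:       %+.2f%%\n",
	    rel_change_percent(base, 4015.0));
}
```

That works out to roughly +2.0% and +3.5% for the constant, and 0% and +1.0% for wrap(). These are single runs, so repeating each one a few times would be needed to tell a real difference from noise.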
