john-dev - Re: PHC: Argon2 on GPU

Message-ID: <CAKGDhHWP3cUzVKNnRQcs7M+znHfvNTKhdEJwQrA1gn4auKd_bA@mail.gmail.com>
Date: Fri, 14 Aug 2015 20:40:28 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-14 20:11 GMT+02:00 Solar Designer <solar@...nwall.com>:
> On Fri, Aug 14, 2015 at 08:01:31PM +0200, Agnieszka Bielec wrote:
>> 2015-08-14 19:06 GMT+02:00 Solar Designer <solar@...nwall.com>:
>> > On Fri, Aug 14, 2015 at 07:02:39PM +0200, Agnieszka Bielec wrote:
>> >> ah, In this link is argon2d, it's faster than argon2i because t_cost
>> >> for argon2d is equal to 1, 3 for argon2i
>> >
>> > Sure, but IIRC on other benchmarks you posted there was only a small
>> > difference in performance for 2i at t=3 and 2d at t=1.  Also, this
>> > doesn't explain the ~10x worse performance we're seeing for 2i now.
>>
>> where do you see ~10x better performance than now with the same costs?
>
> Not the same, but I meant this:
>
> http://www.openwall.com/lists/john-dev/2015/08/14/42
>
> [a@...er run]$ ./john --test --format=argon2i-opencl --v=4
> Benchmarking: argon2i-opencl [Blake2 OpenCL]...
> memory per hash : 1.46 MB
> Device 0: Tahiti [AMD Radeon HD 7900 Series]
> Calculating best global worksize (GWS); max. 1s single kernel invocation.
> gws:       256         387 c/s         387 rounds/s 659.830ms per crypt_all()!
> gws:       512         720 c/s         720 rounds/s 710.817ms per crypt_all()+
> gws:      1024        1305 c/s        1305 rounds/s 784.470ms per crypt_all()+
> Local worksize (LWS) 64, global worksize (GWS) 1024
> using different password for benchmarking
> DONE
> Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
> Many salts:     389 c/s real, 102400 c/s virtual
> Only one salt:  386 c/s real, 51200 c/s virtual
>
> vs. this:
>
> http://www.openwall.com/lists/john-dev/2015/08/12/11
>
> [a@...er run]$ ./john --test --format=argon2d-opencl --v=4
> Benchmarking: argon2d-opencl [Blake2 OpenCL]...
> memory per hash : 1.46 MB
> Device 0: Tahiti [AMD Radeon HD 7900 Series]
> Calculating best global worksize (GWS); max. 1s single kernel invocation.
> gws:       256         964 c/s         964 rounds/s 265.514ms per crypt_all()!
> gws:       512        1878 c/s        1878 rounds/s 272.497ms per crypt_all()+
> gws:      1024        3447 c/s        3447 rounds/s 297.022ms per crypt_all()+
> Local worksize (LWS) 64, global worksize (GWS) 1024
> using different password for benchmarking
> DONE
> Speed for cost 1 (t) of 1, cost 2 (m) of 1500, cost 3 (l) of 1
> Many salts:     2925 c/s real, 307200 c/s virtual
> Only one salt:  2898 c/s real, 307200 c/s virtual
>
> It's 2i at t=3 vs. 2d at t=1.  I'd expect the former to be at most 3x
> slower (because of higher t), and in practice less than that due to 2i's
> predictable and coalescing-friendly access pattern.

I'm not sure you fully understood this post:
http://www.openwall.com/lists/john-dev/2015/08/14/42
On super, john auto-tunes GWS, finds that 1024 is the best, and prints:
Local worksize (LWS) 64, global worksize (GWS) 1024
but actually GWS=256 is set, and the reported speeds are for GWS=256.
If you specify GWS=1024 explicitly, the speed is better and really is
for GWS=1024.
For argon2d there is a similar problem, but it's not so bad: GWS isn't
equal to 256, it's somewhere between 512 and 1024.
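
As a rough cross-check of the above (a sketch only, assuming each
auto-tune line times one crypt_all() call that processes exactly GWS
candidates, so speed is about GWS / time), the numbers from the quoted
logs can be recomputed. The helper name implied_cps is made up for this
illustration and is not part of john:

```python
# Figures copied from the quoted auto-tune output: GWS -> ms per crypt_all().
ARGON2I_TUNE = {256: 659.830, 512: 710.817, 1024: 784.470}
ARGON2D_TUNE = {256: 265.514, 512: 272.497, 1024: 297.022}

# "Many salts" speeds reported after auto-tune finished (c/s real).
ARGON2I_REPORTED = 389
ARGON2D_REPORTED = 2925


def implied_cps(gws, ms_per_call):
    """Candidates per second if one crypt_all() call handles gws candidates."""
    return gws / (ms_per_call / 1000.0)


for name, tune, reported in (
    ("argon2i", ARGON2I_TUNE, ARGON2I_REPORTED),
    ("argon2d", ARGON2D_TUNE, ARGON2D_REPORTED),
):
    for gws, ms in tune.items():
        print(f"{name} GWS={gws:5d}: {implied_cps(gws, ms):8.1f} c/s implied")
    print(f"{name} final benchmark: {reported} c/s reported")
```

Under that assumption, the 389 c/s reported for argon2i matches the
GWS=256 line rather than the GWS=1024 line, and the 2925 c/s reported
for argon2d falls between the GWS=512 and GWS=1024 lines.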

This shows that there is a bug in auto-tune or in my configuration.
(But if there is a bug in my configuration, there is also one in
auto-tune: even if I configured something wrong, john shouldn't report
GWS=1024 when GWS=256 is actually used.)
I don't have this problem on my laptop, though. (Another possibility is
simply that the first call of crypt_all() is slower.)