Date: Fri, 14 Aug 2015 16:37:06 +0200
From: Agnieszka Bielec <>
Subject: Re: PHC: Argon2 on GPU

2015-08-14 15:31 GMT+02:00 Solar Designer <>:
> On Thu, Aug 13, 2015 at 12:28:57AM +0200, magnum wrote:
>> On 2015-08-12 23:51, Solar Designer wrote:
>> >magnum, do you have an explanation why the best benchmark result during
>> >auto-tuning is usually substantially different from the final benchmark
>> >in most of Agnieszka's formats?  I'm fine with eventually dismissing it
>> >as "hard to achieve" and "cosmetic anyway", but I'd like to understand
>> >the cause first.  Thanks!
>> Generally a mismatch could be caused by using different [cost] test
>> vectors in auto-tune than the ones benchmarked, or auto-tune using just
>> one repeated plaintext in a format where length matters for speed (eg.
>> RAR), or something along those lines.
>> Another reason would be incorrect setup of autotune for split kernels.
>> For example, if auto-tune thinks we're going to call a split kernel 500
>> times but the real run does it 1000 times, we'll see inflated figures
>> from autotune.
>> A third reason (seen in early WPA-PSK) is when crypt_all() does
>> significant post-processing on CPU where auto-tune doesn't.
> At least the first reason you listed may likely result in suboptimal
> auto-tuning.  Perhaps it wouldn't with simple iterated schemes like
> PBKDF2, but with memory-hard schemes like Argon2 the cost settings do
> affect optimal LWS and GWS substantially.
> So we shouldn't dismiss this without understanding of what exactly is
> going on in a given case.

In cracking mode on my laptop, argon2d starts out at the same speed as
the one shown while computing GWS. After some time the speed gets close
to the one shown by --test, but it's still not exactly the same.

0g 0:00:00:05 13.67% 2/3 (ETA: 16:00:32) 0g/s 3922p/s 3922c/s 3922C/s GPU:56°C util:99% leugim..nolfet

after 1 min
0g 0:00:03:25  3/3 0g/s 4067p/s 4067c/s 4067C/s GPU:77°C util:99% 213160..241144

after 5 min
0g 0:00:07:40  3/3 0g/s 4083p/s 4083c/s 4083C/s GPU:78°C util:45%


Local worksize (LWS) 64, global worksize (GWS) 512
using different password for benchmarking
Speed for cost 1 (t) of 1, cost 2 (m) of 1500, cost 3 (l) of 1
Many salts:     4114 c/s real, 4077 c/s virtual
Only one salt:  4114 c/s real, 4114 c/s virtual

I don't see big differences with argon2i on my laptop.

on super:

[ run]$ ./john --test --format=argon2i-opencl --v=4
Benchmarking: argon2i-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=138
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256         385 c/s         385 rounds/s 663.846ms per crypt_all()!
gws:       512         719 c/s         719 rounds/s 711.475ms per crypt_all()+
gws:      1024        1298 c/s        1298 rounds/s 788.748ms per crypt_all()+
Local worksize (LWS) 64, global worksize (GWS) 1024
using different password for benchmarking
Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
Many salts:     390 c/s real, 102400 c/s virtual
Only one salt:  390 c/s real, 102400 c/s virtual

The cracking run shows:

Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:21 6.61% 2/3 (ETA: 17:03:04) 0g/s 385.3p/s 385.3c/s 385.3C/s fireballs..bens
GPU 0 overheat (33816176°C, fan 0%), aborting job.
0g 0:00:00:21 6.61% 2/3 (ETA: 17:03:04) 0g/s 384.0p/s 384.0c/s 384.0C/s fireballs..bens

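So on super the speeds reported by the final --test benchmark (390 c/s)
match the real cracking speed (about 385 c/s); it is the auto-tune
figure (1298 c/s) that is off.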
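
For reference, here is a small sketch of how the auto-tune c/s figures
in the argon2i-opencl output on super appear to relate to the timings
printed next to them. It assumes each crypt_all() call processes exactly
gws candidates, so c/s = gws / seconds per crypt_all(). That is my
reading of the log, not something taken from the john source, and the
function name is made up for this example. I'm only computing the size
of the gap here, not claiming a cause for it.

```python
def candidates_per_second(gws, ms_per_crypt_all):
    """Throughput, assuming one crypt_all() call handles `gws` candidates."""
    return gws / (ms_per_crypt_all / 1000.0)


# (gws, ms per crypt_all()) pairs copied from the auto-tune lines on super
autotune = [(256, 663.846), (512, 711.475), (1024, 788.748)]

for gws, ms in autotune:
    # Truncating reproduces the c/s column of the log: 385, 719, 1298
    print(gws, int(candidates_per_second(gws, ms)))

best = max(candidates_per_second(gws, ms) for gws, ms in autotune)
final_benchmark = 390.0  # "Many salts" real c/s from the same --test run

# How much higher the best auto-tune figure is than the final benchmark
ratio = best / final_benchmark
print("auto-tune / final benchmark:", round(ratio, 2))
```

The three computed values agree with the c/s column in the log once
truncated, so the auto-tune numbers are at least internally consistent;
the gap is between auto-tune and the final benchmark.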
