Date: Fri, 14 Aug 2015 16:37:06 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-14 15:31 GMT+02:00 Solar Designer <solar@...nwall.com>:
> On Thu, Aug 13, 2015 at 12:28:57AM +0200, magnum wrote:
>> On 2015-08-12 23:51, Solar Designer wrote:
>> >magnum, do you have an explanation why the best benchmark result during
>> >auto-tuning is usually substantially different from the final benchmark
>> >in most of Agnieszka's formats? I'm fine with eventually dismissing it
>> >as "hard to achieve" and "cosmetic anyway", but I'd like to understand
>> >the cause first. Thanks!
>>
>> Generally a mismatch could be caused by using different [cost] test
>> vectors in auto-tune than the ones benchmarked, or auto-tune using just
>> one repeated plaintext in a format where length matters for speed (eg.
>> RAR), or something along those lines.
>>
>> Another reason would be incorrect setup of autotune for split kernels.
>> For example, if auto-tune thinks we're going to call a split kernel 500
>> times but the real run does it 1000 times, we'll see inflated figures
>> from autotune.
>>
>> A third reason (seen in early WPA-PSK) is when crypt_all() does
>> significant post-processing on CPU where auto-tune doesn't.
>
> At least the first reason you listed may likely result in suboptimal
> auto-tuning. Perhaps it wouldn't with simple iterated schemes like
> PBKDF2, but with memory-hard schemes like Argon2 the cost settings do
> affect optimal LWS and GWS substantially.
>
> So we shouldn't dismiss this without understanding of what exactly is
> going on in a given case.

In cracking mode on my laptop, argon2d shows that at the beginning the
speed is the same as the one shown while computing GWS; after some time
I get a speed closer to the one shown by --test, but it's not exactly
the same.

beginning:
0g 0:00:00:05 13.67% 2/3 (ETA: 16:00:32) 0g/s 3922p/s 3922c/s 3922C/s GPU:56°C util:99% leugim..nolfet

after 1 min:
0g 0:00:03:25 3/3 0g/s 4067p/s 4067c/s 4067C/s GPU:77°C util:99% 213160..241144

after 5 min:
0g 0:00:07:40 3/3 0g/s 4083p/s 4083c/s 4083C/s GPU:78°C util:45% critas01..crachera

--test:
Local worksize (LWS) 64, global worksize (GWS) 512
using different password for benchmarking
DONE
Speed for cost 1 (t) of 1, cost 2 (m) of 1500, cost 3 (l) of 1
Many salts:     4114 c/s real, 4077 c/s virtual
Only one salt:  4114 c/s real, 4114 c/s virtual

I don't see big differences with argon2i on my laptop.

On super:

[a@...er run]$ ./john --test --format=argon2i-opencl --v=4
Benchmarking: argon2i-opencl [Blake2 OpenCL]...
memory per hash : 1.46 MB
Device 0: Tahiti [AMD Radeon HD 7900 Series]
Options used: -I ./kernels -cl-mad-enable -D__GPU__ -DDEVICE_INFO=138 -DDEV_VER_MAJOR=1800 -DDEV_VER_MINOR=5 -D_OPENCL_COMPILER -DBINARY_SIZE=256 -DSALT_SIZE=64 -DPLAINTEXT_LENGTH=32
Calculating best global worksize (GWS); max. 1s single kernel invocation.
gws:       256       385 c/s       385 rounds/s 663.846ms per crypt_all()!
gws:       512       719 c/s       719 rounds/s 711.475ms per crypt_all()+
gws:      1024      1298 c/s      1298 rounds/s 788.748ms per crypt_all()+
Local worksize (LWS) 64, global worksize (GWS) 1024
using different password for benchmarking
DONE
Speed for cost 1 (t) of 3, cost 2 (m) of 1500, cost 3 (l) of 1
Many salts:     390 c/s real, 102400 c/s virtual
Only one salt:  390 c/s real, 102400 c/s virtual

The cracking run shows:

Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:21 6.61% 2/3 (ETA: 17:03:04) 0g/s 385.3p/s 385.3c/s 385.3C/s fireballs..bens
GPU 0 overheat (33816176°C, fan 0%), aborting job.
0g 0:00:00:21 6.61% 2/3 (ETA: 17:03:04) 0g/s 384.0p/s 384.0c/s 384.0C/s fireballs..bens

So the speeds reported by the main --test are good.
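
To put numbers on "closer but not exactly the same", here is a minimal
sketch that pulls the c/s field out of the laptop status lines quoted
above and reports how far each one is from the 4114 c/s --test figure.
It assumes the status-line layout seen in this message; the helper names
(status_speed, deviation_percent) are made up for illustration and are
not part of john.

```python
import re

# Matches the lowercase "c/s" (crypts per second) field, e.g. "3922c/s".
# The match is case-sensitive, so the separate "C/s" field is not picked up.
CPS_RE = re.compile(r"(\d+(?:\.\d+)?)c/s")


def status_speed(line):
    """Return the c/s figure from one status line, or None if there is none."""
    match = CPS_RE.search(line)
    return float(match.group(1)) if match else None


def deviation_percent(measured, benchmark):
    """Signed difference of measured from benchmark, in percent of benchmark."""
    return 100.0 * (measured - benchmark) / benchmark


# "Only one salt ... real" figure from the laptop --test run above.
BENCHMARK_CPS = 4114.0

# Laptop argon2d status lines copied from this message.
STATUS_LINES = [
    "0g 0:00:00:05 13.67% 2/3 (ETA: 16:00:32) 0g/s 3922p/s 3922c/s 3922C/s",
    "0g 0:00:03:25 3/3 0g/s 4067p/s 4067c/s 4067C/s",
    "0g 0:00:07:40 3/3 0g/s 4083p/s 4083c/s 4083C/s",
]

if __name__ == "__main__":
    for line in STATUS_LINES:
        speed = status_speed(line)
        if speed is None:
            continue  # not a status line with a c/s field
        diff = deviation_percent(speed, BENCHMARK_CPS)
        print(f"{speed:8.1f} c/s  {diff:+.2f}% vs --test")
```

The same helper can be pointed at the argon2i lines from super, using
390 c/s as the benchmark figure.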