Date: Thu, 13 Aug 2015 00:28:57 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

On 2015-08-12 23:51, Solar Designer wrote:
> On Wed, Aug 12, 2015 at 11:45:35PM +0200, magnum wrote:
>> On 2015-08-12 18:32, Agnieszka Bielec wrote:
>>> gws: 1024 3447 c/s 3447 rounds/s 297.022ms per
>>> crypt_all()+
>>> Local worksize (LWS) 64, global worksize (GWS) 1024
>>> using different password for benchmarking
>>> DONE
>>> Speed for cost 1 (t) of 1, cost 2 (m) of 1500, cost 3 (l) of 1
>>> Many salts: 2925 c/s real, 307200 c/s virtual
>>> Only one salt: 2898 c/s real, 307200 c/s virtual
>>
>> The benchmark figures (last two lines) are the correct ones. If you set
>> up auto-tune correctly, that speed should be similar to the benchmark.
>> For some formats/situations this is hard to achieve and it's just
>> cosmetic anyway.
>
> magnum, do you have an explanation why the best benchmark result during
> auto-tuning is usually substantially different from the final benchmark
> in most of Agnieszka's formats? I'm fine with eventually dismissing it
> as "hard to achieve" and "cosmetic anyway", but I'd like to understand
> the cause first. Thanks!

Generally a mismatch could be caused by using different [cost] test
vectors in auto-tune than the ones benchmarked, or auto-tune using just
one repeated plaintext in a format where length matters for speed (e.g.
RAR), or something along those lines.

Another reason would be incorrect setup of auto-tune for split kernels.
For example, if auto-tune thinks we're going to call a split kernel 500
times but the real run does it 1000 times, we'll see inflated figures
from auto-tune.

A third reason (seen in early WPA-PSK) is when crypt_all() does
significant post-processing on CPU where auto-tune doesn't.

magnum
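
To put rough numbers on the split-kernel and CPU post-processing cases above, here is a minimal sketch. It is not John the Ripper code: the function name, the per-call kernel time and the post-processing time are all hypothetical, chosen only to make the arithmetic visible. It models one crypt_all() as a number of split-kernel calls plus optional CPU work, and compares what auto-tune would report with what the real run achieves.

```python
def candidates_per_second(gws, kernel_call_ms, loop_calls, cpu_post_ms=0.0):
    """Estimate c/s for one crypt_all() made of `loop_calls` split-kernel
    invocations plus optional CPU post-processing (all times in ms)."""
    total_ms = kernel_call_ms * loop_calls + cpu_post_ms
    return gws / (total_ms / 1000.0)


GWS = 1024             # global work size, as in the quoted output
KERNEL_CALL_MS = 0.25  # hypothetical duration of one split-kernel call

# Auto-tune assumes 500 split-kernel calls per crypt_all() ...
autotune_estimate = candidates_per_second(GWS, KERNEL_CALL_MS, loop_calls=500)
# ... but the real run performs 1000 of them.
real_speed = candidates_per_second(GWS, KERNEL_CALL_MS, loop_calls=1000)

# Factor by which the auto-tune figure overstates the real speed.
inflation = autotune_estimate / real_speed

# Third case: the real crypt_all() also spends (hypothetically) 250 ms on
# CPU post-processing that auto-tune never measures.
real_with_cpu = candidates_per_second(
    GWS, KERNEL_CALL_MS, loop_calls=1000, cpu_post_ms=250.0
)

print(autotune_estimate, real_speed, inflation, real_with_cpu)
```

Under these assumed timings, halving the assumed iteration count doubles the figure auto-tune reports, and unmeasured CPU post-processing widens the gap further.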