Date: Fri, 14 Aug 2015 16:31:33 +0300
From: Solar Designer <>
Subject: Re: PHC: Argon2 on GPU

On Thu, Aug 13, 2015 at 12:28:57AM +0200, magnum wrote:
> On 2015-08-12 23:51, Solar Designer wrote:
> >magnum, do you have an explanation why the best benchmark result during
> >auto-tuning is usually substantially different from the final benchmark
> >in most of Agnieszka's formats?  I'm fine with eventually dismissing it
> >as "hard to achieve" and "cosmetic anyway", but I'd like to understand
> >the cause first.  Thanks!
> Generally a mismatch could be caused by using different [cost] test 
> vectors in auto-tune than the ones benchmarked, or auto-tune using just 
> one repeated plaintext in a format where length matters for speed (eg. 
> RAR), or something along those lines.
> Another reason would be incorrect setup of autotune for split kernels. 
> For example, if auto-tune thinks we're going to call a split kernel 500 
> times but the real run does it 1000 times, we'll see inflated figures 
> from autotune.
> A third reason (seen in early WPA-PSK) is when crypt_all() does 
> significant post-processing on CPU where auto-tune doesn't.

At least the first reason you listed is likely to result in suboptimal
auto-tuning.  Perhaps it wouldn't with simple iterated schemes like
PBKDF2, but with memory-hard schemes like Argon2 the cost settings do
affect the optimal LWS and GWS substantially.
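
To illustrate why the cost settings matter here, below is a rough
sketch (not code from any of the formats).  It assumes each candidate
needs m_cost KiB of device memory, as Argon2's memory cost implies, and
that GWS is bounded by how many such blocks fit on the device.  The
function name and the 2 GiB device size are made up for illustration.

```python
def max_gws(device_mem_bytes, m_cost_kib):
    """Upper bound on GWS if each work-item needs m_cost_kib KiB.

    Assumption: memory is the only limit; real tuning also depends on
    occupancy, LWS and per-device allocation limits.
    """
    per_hash_bytes = m_cost_kib * 1024
    return device_mem_bytes // per_hash_bytes


if __name__ == "__main__":
    device_mem = 2 * 1024**3  # hypothetical 2 GiB device
    for m_cost_kib in (64, 1024, 65536):
        print(m_cost_kib, "KiB ->", max_gws(device_mem, m_cost_kib))
```

Under these assumptions, a GWS tuned against a low-cost test vector
would not even fit in memory for a high-cost hash, let alone be optimal.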

So we shouldn't dismiss this without understanding what exactly is
going on in a given case.
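
For the split-kernel case from the quoted text, a toy model shows how
the inflated figure would arise.  This is only a sketch: the timings
and key count are invented, and it assumes run time is a fixed overhead
plus a constant cost per split-kernel call.

```python
def estimated_speed(keys, fixed_s, per_call_s, calls):
    """Candidates per second for one crypt_all()-like pass.

    Assumption: time = fixed overhead + per-call time * number of
    split-kernel calls.  All values are hypothetical.
    """
    return keys / (fixed_s + per_call_s * calls)


keys = 4096          # hypothetical GWS
fixed_s = 0.01       # hypothetical setup/transfer time, seconds
per_call_s = 0.001   # hypothetical time per split-kernel call, seconds

autotune = estimated_speed(keys, fixed_s, per_call_s, 500)   # assumed
real_run = estimated_speed(keys, fixed_s, per_call_s, 1000)  # actual

if __name__ == "__main__":
    print("auto-tune estimate: %.0f c/s" % autotune)
    print("real benchmark:     %.0f c/s" % real_run)
```

With half the real number of calls assumed, the auto-tune figure in
this model comes out close to twice the final benchmark.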

