Date: Thu, 13 Aug 2015 11:23:05 +0200
From: magnum <>
Subject: Autotune speed figures (was: Re: PHC: Argon2 on GPU)

On 2015-08-13 09:52, Agnieszka Bielec wrote:
> 2015-08-13 0:28 GMT+02:00 magnum <>:
>> On 2015-08-12 23:51, Solar Designer wrote:
>>> magnum, do you have an explanation why the best benchmark result during
>>> auto-tuning is usually substantially different from the final benchmark
>>> in most of Agnieszka's formats?  I'm fine with eventually dismissing it
>>> as "hard to achieve" and "cosmetic anyway", but I'd like to understand
>>> the cause first.  Thanks!
>> Generally a mismatch could be caused by using different [cost] test vectors
>> in auto-tune than the ones benchmarked, or auto-tune using just one repeated
>> plaintext in a format where length matters for speed (e.g. RAR), or something
>> along those lines.
>> Another reason would be incorrect setup of autotune for split kernels. For
>> example, if auto-tune thinks we're going to call a split kernel 500 times
>> but the real run does it 1000 times, we'll see inflated figures from
>> autotune.
>> A third reason (seen in early WPA-PSK) is when crypt_all() does significant
>> post-processing on CPU where auto-tune doesn't.
> None of these. I printf'ed the plaintexts that are set during computation
> of gws and modified benchc.c to set the same values, and the result is the
> same.

Then you might want to dig into it. The autotune code should be easy to 
follow. Try to establish exactly what it comes up with and how it ends 
up with the figures it prints for your format.
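
To make the split-kernel case quoted above concrete, here is a rough
sketch of the arithmetic. It is not taken from the autotune code. The
function name, the work size and the per-call time are all made-up
values for illustration, and it assumes speed is simply candidates
divided by total kernel time.

```python
def estimated_speed(gws, kernel_call_seconds, assumed_calls):
    """Candidates per second, assuming one crypt_all() needs
    `assumed_calls` invocations of the split kernel."""
    return gws / (kernel_call_seconds * assumed_calls)


# Hypothetical numbers, chosen only to show the effect.
gws = 65536                  # global work size (candidates per crypt_all)
kernel_call_seconds = 0.002  # measured duration of one split-kernel call

# Auto-tune believes the split kernel runs 500 times...
autotune_speed = estimated_speed(gws, kernel_call_seconds, 500)
# ...but the real run calls it 1000 times.
real_speed = estimated_speed(gws, kernel_call_seconds, 1000)

# Ratio between the figure auto-tune prints and the real benchmark.
inflation = autotune_speed / real_speed
print(f"autotune: {autotune_speed:.0f} c/s, real: {real_speed:.0f} c/s, "
      f"ratio: {inflation:.2f}")
```

If the ratio you observe for your format is close to a constant factor
like this, a wrong iteration count in the autotune setup would be my
first suspect.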

