Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
Date: Thu, 13 Aug 2015 09:52:34 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Argon2 on GPU

2015-08-13 0:28 GMT+02:00 magnum <john.magnum@...hmail.com>:
> On 2015-08-12 23:51, Solar Designer wrote:
>>
>> On Wed, Aug 12, 2015 at 11:45:35PM +0200, magnum wrote:
>>>
>>> On 2015-08-12 18:32, Agnieszka Bielec wrote:
>>>>
>>>> gws:      1024        3447 c/s        3447 rounds/s 297.022ms per crypt_all()+
>>>> Local worksize (LWS) 64, global worksize (GWS) 1024
>>>> using different password for benchmarking
>>>> DONE
>>>> Speed for cost 1 (t) of 1, cost 2 (m) of 1500, cost 3 (l) of 1
>>>> Many salts:     2925 c/s real, 307200 c/s virtual
>>>> Only one salt:  2898 c/s real, 307200 c/s virtual
>>>
>>>
>>> The benchmark figures (last two lines) are the correct ones. If you set
>>> up auto-tune correctly, that speed should be similar to the benchmark.
>>> For some formats/situations this is hard to achieve and it's just
>>> cosmetic anyway.
>>
>>
>> magnum, do you have an explanation why the best benchmark result during
>> auto-tuning is usually substantially different from the final benchmark
>> in most of Agnieszka's formats?  I'm fine with eventually dismissing it
>> as "hard to achieve" and "cosmetic anyway", but I'd like to understand
>> the cause first.  Thanks!
>
>
> Generally a mismatch could be caused by using different [cost] test vectors
> in auto-tune than the ones benchmarked, or auto-tune using just one repeated
> plaintext in a format where length matters for speed (e.g. RAR), or something
> along those lines.
>
> Another reason would be incorrect setup of autotune for split kernels. For
> example, if auto-tune thinks we're going to call a split kernel 500 times
> but the real run does it 1000 times, we'll see inflated figures from
> autotune.
>
> A third reason (seen in early WPA-PSK) is when crypt_all() does significant
> post-processing on CPU where auto-tune doesn't.

None of these seems to be the cause. I printed (with printf) the
plaintexts that are set while GWS is being computed, modified bench.c
to set the same values, and the result is the same.
