john-dev - Re: PHC: Parallel in OpenCL


Message-ID: <847cc72c0aa0c402c2399d010e00d772@smtp.hushmail.com>
Date: Thu, 04 Jun 2015 16:39:08 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Parallel in OpenCL

On 2015-06-04 16:14, Agnieszka Bielec wrote:
> 2015-06-04 13:38 GMT+02:00 magnum <john.magnum@...hmail.com>:
>> looks to me each call is 3*5*128 rounds of SHA512?
> yes, 3*5*128 for each call parallel_kernel_loop()
>
>> Note these lines (after my patch):
>>
>>         opencl_init_auto_setup(SEED, 3*5*128*1, split_events,
>>              warn, 4, self, create_clobj, release_clobj, BINARY_SIZE*3, 0);
>>
>>         autotune_run(self, 3*5*128*1, 0, 1000);
>>
>> If you change the loop kernel to do only 128 rounds per call, you should
>> change it accordingly for opencl_init_auto_setup() but not for
>> autotune_run(). The latter is total, the former is how much you do per call.
>> If you change to a test vector with another cost, change the *1 accordingly
>> for both.
>
> is this necessary? I don't see any difference in performance and if I
> want to change *1, code will be complicated like in pomelo

The *1 is the cost of test vector #0, so you'd only need to change it if
you change the test vectors. Even then it would mostly just affect
showing correct output with --verb=5.

The rest is necessary because auto-tune needs to know what it is tuning.
If you are in fact calling your loop kernel 15 times in crypt_all() but
only once in crypt_all_benchmark() (this is not the case yet though),
you need to inform auto-tune about it. The result would be a 15 times
faster auto-tune. That's how e.g. wpapsk-opencl can auto-tune so quickly
with fair accuracy despite being a slow format.
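
To make the arithmetic concrete, here is a rough standalone sketch of the
per-call vs. total bookkeeping. It is not code from the tree: the constant
and function names are made up for illustration, and it assumes the total
work is an exact multiple of what the loop kernel does per call.

```c
#include <assert.h>

/* Hypothetical names, for illustration only (not from the JtR source). */
enum {
    ROUNDS_AT_COST_1 = 3 * 5 * 128, /* SHA-512 rounds per hash at cost 1 */
    TEST_VECTOR_COST = 1            /* the "*1": cost of test vector #0 */
};

/* Total rounds for one hash: the "total" figure auto-tune is told about. */
unsigned total_rounds(unsigned cost)
{
    return ROUNDS_AT_COST_1 * cost;
}

/*
 * How many times the loop kernel must run to finish one hash, given how
 * many rounds it does per call. This is also the speed-up you would get
 * if the benchmark path ran the loop kernel only once.
 */
unsigned kernel_calls(unsigned total, unsigned rounds_per_call)
{
    /* Assumption: the work splits evenly across calls. */
    assert(rounds_per_call > 0 && total % rounds_per_call == 0);
    return total / rounds_per_call;
}
```

With a loop kernel doing 128 rounds per call,
kernel_calls(total_rounds(TEST_VECTOR_COST), 128) comes out as 15, which
is where the "15 times" above comes from. With the current 3*5*128 rounds
per call it is 1, so there is nothing to gain yet.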

magnum
