Date: Thu, 4 Jun 2015 16:14:38 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Parallel in OpenCL

2015-06-04 13:38 GMT+02:00 magnum <john.magnum@...hmail.com>:
> On 2015-06-04 12:07, magnum wrote:
>>
>> On 2015-06-04 00:45, Lukas Odzioba wrote:
>>> She also implemented a split kernel, and that by itself also degraded
>>> performance (from 28k to 27k c/s).

This isn't true; see http://www.openwall.com/lists/john-dev/2015/06/04/1

>>
>> Isn't the loop kernel using the "zeros" sha function (only)? And the
>> other kernels use the full version? Then I can't see how code size would
>> be larger for a given kernel.

Yes, but for AMD GCN I only have a demo.

> looks to me each call is 3*5*128 rounds of SHA512?
Yes, 3*5*128 rounds for each call to parallel_kernel_loop().

> Note these lines (after my patch):
>
>        opencl_init_auto_setup(SEED, 3*5*128*1, split_events,
>             warn, 4, self, create_clobj, release_clobj, BINARY_SIZE*3, 0);
>
>        autotune_run(self, 3*5*128*1, 0, 1000);
>
> If you change the loop kernel to do only 128 rounds per call, you should
> change it accordingly for opencl_init_auto_setup() but not for
> autotune_run(). The latter is total, the former is how much you do per call.
> If you change to a test vector with another cost, change the *1 accordingly
> for both.

Is this necessary? I don't see any difference in performance, and if I
want to change the *1, the code will become complicated, as in pomelo.
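
To make the per-call versus total distinction from the quoted advice
concrete, here is a minimal sketch in C. The names ROUNDS_PER_COST,
total_rounds() and loop_calls() are hypothetical and do not come from
the JtR source; the sketch only reproduces the arithmetic, assuming the
"*1" factor is the cost of the test vector.

```c
#include <assert.h>

/* Hypothetical names for illustration only; not taken from the JtR source. */
#define ROUNDS_PER_COST (3 * 5 * 128) /* 1920 SHA-512 rounds per unit of cost */

/* Total rounds for a given cost. In the quoted patch this is the value
 * described as "total" (the one passed to autotune_run()). */
static unsigned int total_rounds(unsigned int cost)
{
    return ROUNDS_PER_COST * cost;
}

/* Number of loop-kernel calls needed to cover `total` rounds when each
 * call performs `per_call` rounds (rounded up). */
static unsigned int loop_calls(unsigned int total, unsigned int per_call)
{
    assert(per_call > 0);
    return (total + per_call - 1) / per_call;
}
```

With cost 1, a loop kernel that does all 1920 rounds per call needs one
call, while a kernel reduced to 128 rounds per call would need 15 calls
to reach the same total. That is why only the per-call figure would
change in that case.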
