Date: Thu, 4 Jun 2015 16:14:38 +0200
From: Agnieszka Bielec <>
Subject: Re: PHC: Parallel in OpenCL

2015-06-04 13:38 GMT+02:00 magnum <>:
> On 2015-06-04 12:07, magnum wrote:
>> On 2015-06-04 00:45, Lukas Odzioba wrote:
>>> She also implemented a split kernel, and that by itself also degraded
>>> performance (from 28k to 27k c/s).

This isn't true.

>> Isn't the loop kernel using the "zeros" sha function (only)? And the
>> other kernels use the full version? Then I can't see how code size would
>> be larger for a given kernel.

Yes, but for AMD GCN I have only a demo.

> looks to me each call is 3*5*128 rounds of SHA512?
Yes, 3*5*128 for each call to parallel_kernel_loop().

> Note these lines (after my patch):
>        opencl_init_auto_setup(SEED, 3*5*128*1, split_events,
>             warn, 4, self, create_clobj, release_clobj, BINARY_SIZE*3, 0);
>        autotune_run(self, 3*5*128*1, 0, 1000);
> If you change the loop kernel to do only 128 rounds per call, you should
> change it accordingly for opencl_init_auto_setup() but not for
> autotune_run(). The latter is total, the former is how much you do per call.
> If you change to a test vector with another cost, change the *1 accordingly
> for both.

Is this necessary? I don't see any difference in performance, and if I
want to change the *1, the code will become complicated, like in pomelo.
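
For reference, the per-call versus total distinction described above can be
sketched in a few lines of C. This is only an illustration: the helper names
total_rounds() and loop_kernel_calls() are hypothetical and are not part of
the John the Ripper code. Only the 3*5*128 figure and the cost multiplier
(the "*1") come from the thread.

```c
/* Hypothetical helpers, not part of John the Ripper. */

/* SHA-512 rounds per unit of cost, as quoted in the thread. */
enum { ROUNDS_PER_COST_UNIT = 3 * 5 * 128 };

/* Total rounds for a test vector of the given cost (the "*1" factor).
 * Per the quoted advice, this is the total that autotune_run() expects. */
static unsigned int total_rounds(unsigned int cost)
{
    return ROUNDS_PER_COST_UNIT * cost;
}

/* Number of loop-kernel invocations needed to cover `total` rounds when
 * each invocation performs `per_call` rounds (rounded up). Per the quoted
 * advice, `per_call` is what opencl_init_auto_setup() expects. */
static unsigned int loop_kernel_calls(unsigned int total, unsigned int per_call)
{
    return (total + per_call - 1) / per_call;
}
```

With cost 1 and a loop kernel that does all 3*5*128 rounds per call, a single
invocation covers the work; if the kernel did only 128 rounds per call, the
same total would need 15 invocations.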
