john-dev - Re: PHC: Parallel in OpenCL

Message-ID: <CAKGDhHXxhEn3CfvBshkw9E_+egT3j3b=KBhPDL6xCwPCH4BXRg@mail.gmail.com>
Date: Thu, 4 Jun 2015 16:14:38 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Parallel in OpenCL

2015-06-04 13:38 GMT+02:00 magnum <john.magnum@...hmail.com>:
> On 2015-06-04 12:07, magnum wrote:
>>
>> On 2015-06-04 00:45, Lukas Odzioba wrote:
>>> She also implemented a split kernel, and it also degraded
>>> performance (from 28k to 27k c/s).

This isn't true; see http://www.openwall.com/lists/john-dev/2015/06/04/1

>>
>> Isn't the loop kernel using the "zeros" sha function (only)? And the
>> other kernels use the full version? Then I can't see how code size would
>> be larger for a given kernel.

Yes, but for AMD GCN I only have a demo.

> looks to me each call is 3*5*128 rounds of SHA512?
Yes, 3*5*128 rounds for each call of parallel_kernel_loop().

> Note these lines (after my patch):
>
>        opencl_init_auto_setup(SEED, 3*5*128*1, split_events,
>             warn, 4, self, create_clobj, release_clobj, BINARY_SIZE*3, 0);
>
>        autotune_run(self, 3*5*128*1, 0, 1000);
>
> If you change the loop kernel to do only 128 rounds per call, you should
> change it accordingly for opencl_init_auto_setup() but not for
> autotune_run(). The latter is total, the former is how much you do per call.
> If you change to a test vector with another cost, change the *1 accordingly
> for both.

Is this necessary? I don't see any difference in performance, and if I
want to change the *1, the code will become as complicated as in pomelo.
