Date: Thu, 4 Jun 2015 16:14:38 +0200
From: Agnieszka Bielec <bielecagnieszka8@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Parallel in OpenCL

2015-06-04 13:38 GMT+02:00 magnum <john.magnum@...hmail.com>:
> On 2015-06-04 12:07, magnum wrote:
>>
>> On 2015-06-04 00:45, Lukas Odzioba wrote:
>>> She also implemented splitted kernel and it itself also degradated
>>> performance (from 28k to 27k c/s).

this isn't true.
http://www.openwall.com/lists/john-dev/2015/06/04/1

>>
>> Isn't the loop kernel using the "zeros" sha function (only)? And the
>> other kernels use the full version? Then I can't see how code size would
>> be larger for a given kernel.

yes, but for amd gcn I have only demo

> looks to me each call is 3*5*128 rounds of SHA512?

yes, 3*5*128 for each call parallel_kernel_loop()

> Note these lines (after my patch):
>
> opencl_init_auto_setup(SEED, 3*5*128*1, split_events,
> warn, 4, self, create_clobj, release_clobj, BINARY_SIZE*3, 0);
>
> autotune_run(self, 3*5*128*1, 0, 1000);
>
> If you change the loop kernel to do only 128 rounds per call, you should
> change it accordingly for opencl_init_auto_setup() but not for
> autotune_run(). The latter is total, the former is how much you do per call.
> If you change to a test vector with another cost, change the *1 accordingly
> for both.

is this necessary? I don't see any difference in performance
and if I want to change *1, code will be complicated like in pomelo
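
Editorial note: the per-call versus total distinction described in the quoted text can be illustrated with a small C sketch. This is not John the Ripper code. The names ROUNDS_TOTAL and loop_kernel_calls are hypothetical and introduced only for illustration; the only figures taken from the thread are the 3*5*128 rounds and the *1 cost factor.

```c
#include <assert.h>

/* Hypothetical macro: total SHA-512 rounds for a test vector of a given
 * cost. The 3*5*128 factor and the "*1" cost come from the thread. */
#define ROUNDS_TOTAL(cost) (3u * 5u * 128u * (cost))

/* Hypothetical helper: how many times a split loop kernel would have to
 * be enqueued to cover total_rounds when each call does rounds_per_call.
 * Rounds up so a partial final chunk still gets a call. */
static unsigned int loop_kernel_calls(unsigned int total_rounds,
                                      unsigned int rounds_per_call)
{
    assert(rounds_per_call > 0);
    return (total_rounds + rounds_per_call - 1) / rounds_per_call;
}
```

Under these assumptions, a per-call value of 3*5*128 with cost 1 means the loop kernel runs once, while a per-call value of 128 means it runs 15 times for the same total. That is why only the per-call argument would change if the kernel did fewer rounds per invocation.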