Date: Thu, 04 Jun 2015 12:07:23 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: PHC: Parallel in OpenCL

On 2015-06-04 00:45, Lukas Odzioba wrote:
> Agnieszka tried to implement optimization that exploits presence of 0
> bytes in the sha512 input, which happens in "parallel loop".
> We can't make such assumptions for all sha512 calls used in function
> parallel, so implementing slightly different SHA512 with this
> optimizations (and still we had to have the normal version) increased
> code size, which what we think reduced performance because code size
> exceeded L1 code cache on GCN, actual performance after this change
> dropped from 45k to 28k c/s.
> She also implemented splitted kernel and it itself also degradated
> performance (from 28k to 27k c/s).

Isn't the loop kernel using the "zeros" sha function (only)? And the
other kernels use the full version? Then I can't see how code size would
be larger for a given kernel.

magnum
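
To make the "zeros" idea concrete, here is a minimal sketch in plain C
(not the actual OpenCL code from the Parallel format). It assumes a
hypothetical block layout, not confirmed in this thread: a 64-byte
message in W[0..7], the padding word in W[8], zeros in W[9..14] and the
bit length in W[15]. All function names are made up for illustration.
It covers only the SHA-512 message schedule and shows which terms a
specialized version can drop when those words are known to be zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* SHA-512 small sigma functions as defined in FIPS 180-4. */
static uint64_t rotr64(uint64_t x, unsigned n) { return (x >> n) | (x << (64 - n)); }
static uint64_t sigma0(uint64_t x) { return rotr64(x, 1) ^ rotr64(x, 8) ^ (x >> 7); }
static uint64_t sigma1(uint64_t x) { return rotr64(x, 19) ^ rotr64(x, 61) ^ (x >> 6); }

/* Assumed layout (hypothetical): 64-byte message, pad word, zeros, length. */
static void build_block(const uint64_t msg[8], uint64_t w[80])
{
    memset(w, 0, 80 * sizeof(w[0]));      /* W[9..14] stay zero */
    memcpy(w, msg, 8 * sizeof(w[0]));
    w[8] = 0x8000000000000000ULL;         /* padding bit */
    w[15] = 512;                          /* message length in bits */
}

/* Generic schedule: valid for any block, as the "full" version needs. */
static void expand_generic(uint64_t w[80])
{
    for (int t = 16; t < 80; t++)
        w[t] = sigma1(w[t - 2]) + w[t - 7] + sigma0(w[t - 15]) + w[t - 16];
}

/* "Zeros" schedule: only valid when W[9..14] == 0. Terms that read those
 * words are omitted, since sigma0(0) == sigma1(0) == 0. */
static void expand_zeros(uint64_t w[80])
{
    int t;
    w[16] = sigma0(w[1]) + w[0];                       /* W[14], W[9] zero */
    for (t = 17; t <= 21; t++)                         /* W[t-7] zero */
        w[t] = sigma1(w[t - 2]) + sigma0(w[t - 15]) + w[t - 16];
    for (t = 22; t <= 23; t++)                         /* no zero inputs */
        w[t] = sigma1(w[t - 2]) + w[t - 7] + sigma0(w[t - 15]) + w[t - 16];
    w[24] = sigma1(w[22]) + w[17] + w[8];              /* W[9] zero */
    for (t = 25; t <= 29; t++)                         /* W[t-15], W[t-16] zero */
        w[t] = sigma1(w[t - 2]) + w[t - 7];
    w[30] = sigma1(w[28]) + w[23] + sigma0(w[15]);     /* W[14] zero */
    for (t = 31; t < 80; t++)                          /* no zero inputs */
        w[t] = sigma1(w[t - 2]) + w[t - 7] + sigma0(w[t - 15]) + w[t - 16];
}

/* Returns 1 if both schedules agree for the assumed block layout. */
static int schedules_match(const uint64_t msg[8])
{
    uint64_t a[80], b[80];
    build_block(msg, a);
    build_block(msg, b);
    expand_generic(a);
    expand_zeros(b);
    return memcmp(a, b, sizeof(a)) == 0;
}

/* Small self-check with an arbitrary message. */
static void self_check(void)
{
    const uint64_t msg[8] = { 1, 2, 3, 4, 5, 6, 7, 0xfedcba9876543210ULL };
    assert(schedules_match(msg));
}
```

With a split kernel, the loop kernel would call only something like
expand_zeros() and the init/final kernels only the generic version, so
no single kernel would need to carry both variants.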