Date: Thu, 04 Jun 2015 12:07:23 +0200
From: magnum <>
Subject: Re: PHC: Parallel in OpenCL

On 2015-06-04 00:45, Lukas Odzioba wrote:
> Agnieszka tried to implement an optimization that exploits the presence
> of 0 bytes in the sha512 input, which happens in the "parallel loop".
> We can't make such assumptions for all sha512 calls used in function
> parallel, so implementing a slightly different SHA512 with this
> optimization (while still keeping the normal version) increased
> code size, which we think reduced performance because the code size
> exceeded the L1 code cache on GCN. Actual performance after this change
> dropped from 45k to 28k c/s.
> She also implemented a split kernel, and that by itself also degraded
> performance (from 28k to 27k c/s).

Isn't the loop kernel using the "zeros" sha function (only)? And the 
other kernels use the full version? Then I can't see how code size would 
be larger for a given kernel.
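
To make the "zeros" idea concrete, here is a rough sketch in plain C (not
the actual kernel code) of just the SHA-512 message schedule. It assumes
the loop hashes a single 64-byte message, so that after padding W[8] is
0x80..., W[9..14] are zero and W[15] holds the bit length; that input
size is my assumption, not something stated above. The names
schedule_full, schedule_zeros and schedules_agree are made up for
illustration.

```c
#include <stdint.h>
#include <string.h>

static uint64_t rotr64(uint64_t x, unsigned n) { return (x >> n) | (x << (64 - n)); }
static uint64_t s0(uint64_t x) { return rotr64(x, 1) ^ rotr64(x, 8) ^ (x >> 7); }
static uint64_t s1(uint64_t x) { return rotr64(x, 19) ^ rotr64(x, 61) ^ (x >> 6); }

/* General SHA-512 message schedule: valid for any 16-word block. */
static void schedule_full(uint64_t W[80])
{
    for (int t = 16; t < 80; t++)
        W[t] = s1(W[t - 2]) + W[t - 7] + s0(W[t - 15]) + W[t - 16];
}

/* Specialized schedule. ASSUMPTION: W[9..14] are all zero (one padded
 * 64-byte message). Since s0(0) == s1(0) == 0, those terms are dropped. */
static void schedule_zeros(uint64_t W[80])
{
    W[16] = s0(W[1]) + W[0];
    W[17] = s1(W[15]) + s0(W[2]) + W[1];
    W[18] = s1(W[16]) + s0(W[3]) + W[2];
    W[19] = s1(W[17]) + s0(W[4]) + W[3];
    W[20] = s1(W[18]) + s0(W[5]) + W[4];
    W[21] = s1(W[19]) + s0(W[6]) + W[5];
    W[22] = s1(W[20]) + W[15] + s0(W[7]) + W[6];
    W[23] = s1(W[21]) + W[16] + s0(W[8]) + W[7];
    W[24] = s1(W[22]) + W[17] + W[8];
    W[25] = s1(W[23]) + W[18];
    W[26] = s1(W[24]) + W[19];
    W[27] = s1(W[25]) + W[20];
    W[28] = s1(W[26]) + W[21];
    W[29] = s1(W[27]) + W[22];
    W[30] = s1(W[28]) + W[23] + s0(W[15]);
    /* From here on no input word is known to be zero. */
    for (int t = 31; t < 80; t++)
        W[t] = s1(W[t - 2]) + W[t - 7] + s0(W[t - 15]) + W[t - 16];
}

/* Returns 1 if both schedules agree on a sample padded 64-byte block. */
int schedules_agree(void)
{
    uint64_t a[80] = {0}, b[80];
    for (int i = 0; i < 8; i++)
        a[i] = 0x0123456789abcdefULL * (uint64_t)(i + 1); /* arbitrary data */
    a[8] = 0x8000000000000000ULL; /* padding bit */
    a[15] = 512;                  /* message length in bits */
    memcpy(b, a, sizeof a);
    schedule_full(a);
    schedule_zeros(b);
    return memcmp(a, b, sizeof a) == 0;
}
```

With a split design, I would expect the loop kernel to reference only
something like schedule_zeros and the other kernels only schedule_full,
so no single kernel should end up carrying both variants.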

