Date: Thu, 4 Jun 2015 00:45:08 +0200
From: Lukas Odzioba <>
Subject: Re: PHC: Parallel in OpenCL

2015-06-04 0:30 GMT+02:00 Solar Designer <>:
> I am somewhat out of context on your discussion with Agnieszka, so I am
> puzzled by the comments (initially by her, and now also by you) of code
> size increases somehow being associated with use of split kernels.

Sorry, I was also confused by some of our discussion :)

Agnieszka tried to implement an optimization that exploits the presence
of zero bytes in the SHA-512 input, which occurs in the "parallel loop".
We can't make that assumption for all SHA-512 calls used in function
parallel, so we added a slightly different SHA-512 with this
optimization while still keeping the normal version. That increased
code size, which we think reduced performance because the code size
exceeded the L1 instruction cache on GCN: actual performance after this
change dropped from 45k to 28k c/s.
She also implemented a split kernel, and that by itself degraded
performance a bit further (from 28k to 27k c/s).
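
To illustrate the kind of saving involved, here is a minimal C sketch (not
the actual OpenCL code from this work). It compares the generic SHA-512
message schedule step for W[16] with a version specialized for input words
known to be zero. Which words are zero is an assumption made for
illustration only (here W[9] and W[14]), and the function names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate right; n must be in 1..63. */
static uint64_t rotr64(uint64_t x, unsigned n)
{
    return (x >> n) | (x << (64 - n));
}

/* SHA-512 small sigma functions (FIPS 180-4). */
static uint64_t sigma0(uint64_t x)
{
    return rotr64(x, 1) ^ rotr64(x, 8) ^ (x >> 7);
}

static uint64_t sigma1(uint64_t x)
{
    return rotr64(x, 19) ^ rotr64(x, 61) ^ (x >> 6);
}

/* Generic schedule step: W[16] = s1(W[14]) + W[9] + s0(W[1]) + W[0]. */
static uint64_t w16_generic(const uint64_t w[16])
{
    return sigma1(w[14]) + w[9] + sigma0(w[1]) + w[0];
}

/*
 * Specialized step. ASSUMPTION (illustrative only): W[9] and W[14] are
 * known to be zero. Since sigma1(0) == 0, two terms drop out.
 */
static uint64_t w16_zero_words(const uint64_t w[16])
{
    assert(w[9] == 0 && w[14] == 0); /* precondition of this variant */
    return sigma0(w[1]) + w[0];
}
```

For this one schedule word the specialized version drops one sigma1
evaluation and two additions. The price is that both variants have to
be present in the kernel whenever some SHA-512 calls cannot assume zero
words, which is where the extra code size comes from.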

Unfortunately, I forgot that the SHA-2 functions are somewhat resistant
to such optimizations, since this one only removes a handful of
additions. I now think we might want to omit this optimization,
especially while we are having problems with code size.
There is still some low-hanging fruit in the code that should
increase performance more than what we were trying to do.

I hope that clears things up,
