Date: Tue, 02 Jun 2015 11:40:41 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Lei's weekly report #5

On 2015-06-02 05:38, Solar Designer wrote:
> On Tue, Jun 02, 2015 at 10:37:27AM +0800, Lei Zhang wrote:
>>> I think more importantly than all of those you listed, you need to start
>>> reviewing and profiling the generated assembly code. Right now, it is
>>> unclear why there's often a slowdown when going from 1x to 2x
>>> interleaving, even if in some cases there's a speedup at higher
>>> interleaving factors. You need to find this out. Until you do, you're
>>> unnecessarily walking blindfolded.
>>
>> OK.
>> But what do you mean by "reviewing the assembly code"? What exactly am I supposed to investigate in the assembly?
>
> Compare it for no interleaving (aka 1x) and 2x interleaving, and see if
> anything looks non-optimal to you. It might be e.g. registers getting
> spilled to memory and loaded back, or extra (non-unrolled) loops still
> seen at assembly code level. Also, note how the code size changes.

Lei, before doing this please make sure you have the latest commits. I
found some unnecessary uses of tmp[SIMD_PARA] in the SHA2 formats where
just a single tmp would do. The compiler should have optimized that away,
but let's not depend on it.

On a side note, I also changed the MD4/MD5/SHA1 formats to use the
"wider/fewer" for loops like SHA2, allowing a similar change from tmp
arrays to single tmp variables. For SHA1 I also found a whole array
"vtype tmpR[16*SIMD_PARA]" that wasn't needed. 2-3% boost for those
formats.

magnum
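A rough sketch of what the tmp change above amounts to, under stated assumptions: it reads "narrow/many" versus "wider/fewer" for loops as "one loop per statement" versus "one loop spanning several statements". SIMD_PARA's value, the plain uint32_t vtype, ROTL, the made-up arithmetic (not a real SHA1 step), and the functions step_narrow, step_wide and check_styles_agree are all hypothetical illustrations, not John the Ripper's actual code. With one loop per statement, a temporary that crosses loops needs a slot per lane (tmp[SIMD_PARA]). With a single wider loop, it is only live within one iteration, so a single tmp suffices.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins, not John the Ripper's real definitions: there,
 * vtype is a SIMD vector type and SIMD_PARA the interleaving factor.
 * Here vtype is a plain 32-bit word so the sketch builds anywhere. */
#define SIMD_PARA 2
typedef uint32_t vtype;

#define ROTL(x, n) ((vtype)(((x) << (n)) | ((x) >> (32 - (n)))))

/* "Narrow/many" style: each statement has its own loop over the lanes,
 * so a temporary that lives across two loops needs one slot per lane. */
static void step_narrow(vtype a[SIMD_PARA], const vtype b[SIMD_PARA],
                        const vtype w[SIMD_PARA])
{
    vtype tmp[SIMD_PARA];
    int i;

    for (i = 0; i < SIMD_PARA; i++)
        tmp[i] = a[i] + b[i];
    for (i = 0; i < SIMD_PARA; i++)
        a[i] = ROTL(tmp[i], 5) + w[i];
}

/* "Wider/fewer" style: one loop covers both statements, so the temporary
 * is only live within one iteration and a single variable suffices. */
static void step_wide(vtype a[SIMD_PARA], const vtype b[SIMD_PARA],
                      const vtype w[SIMD_PARA])
{
    int i;

    for (i = 0; i < SIMD_PARA; i++) {
        vtype tmp = a[i] + b[i];
        a[i] = ROTL(tmp, 5) + w[i];
    }
}

/* Both styles compute the same values; only the temporary's storage differs. */
static void check_styles_agree(void)
{
    const vtype init[SIMD_PARA] = { 0x67452301, 0xefcdab89 };
    const vtype b[SIMD_PARA] = { 0x98badcfe, 0x10325476 };
    const vtype w[SIMD_PARA] = { 0x5a827999, 0x6ed9eba1 };
    vtype x[SIMD_PARA], y[SIMD_PARA];

    memcpy(x, init, sizeof(x));
    memcpy(y, init, sizeof(y));
    step_narrow(x, b, w);
    step_wide(y, b, w);
    assert(memcmp(x, y, sizeof(x)) == 0);
}
```

In the wide form there is no per-lane array left for the compiler to prove redundant, which is one reason not to rely on the optimizer removing it.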