Date: Tue, 02 Jun 2015 11:40:41 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Lei's weekly report #5

On 2015-06-02 05:38, Solar Designer wrote:
> On Tue, Jun 02, 2015 at 10:37:27AM +0800, Lei Zhang wrote:
>>> I think more importantly than all of those you listed, you need to start
>>> reviewing and profiling the generated assembly code.  Right now, it is
>>> unclear why there's often a slowdown when going from 1x to 2x
>>> interleaving, even if in some cases there's a speedup at higher
>>> interleaving factors.  You need to find this out.  Until you do, you're
>>> unnecessarily walking blindfolded.
>>
>> OK.
>> But what do you mean by "reviewing the assembly code"? What exactly am I supposed to investigate in the assembly?
>
> Compare it for no interleaving (aka 1x) and 2x interleaving, and see if
> anything looks non-optimal to you.  It might be e.g. registers getting
> spilled to memory and loaded back, or extra (non-unrolled) loops still
> seen at assembly code level.  Also, note how the code size changes.
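
To make the 1x vs 2x comparison concrete, here is a minimal sketch of an
interleaved loop whose factor is fixed at compile time. The PARA macro,
toy_rounds() and the MD5-like step are hypothetical stand-ins, and plain
uint32_t replaces the real SIMD vector type (vtype) so it builds anywhere;
this is not john's actual code. One way to compare is to build it twice,
e.g. "gcc -O2 -S -DPARA=1 toy.c -o para1.s" and "gcc -O2 -S -DPARA=2
toy.c -o para2.s" (toy.c being a hypothetical file name), then diff the
listings, looking for stack loads/stores (spills), leftover loop branches
and overall size.

```c
#include <assert.h>
#include <stdint.h>

#ifndef PARA
#define PARA 2 /* hypothetical interleaving factor; rebuild with -DPARA=1 */
#endif

static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32); /* a shift by 32 would be undefined */
    return (x << n) | (x >> (32 - n));
}

/*
 * Run `rounds` toy MD5-like steps on PARA independent lanes in lockstep.
 * The inner loop has a constant trip count, so the compiler may fully
 * unroll it and overlap the lanes' independent dependency chains. The
 * price is PARA times as many live values, which may no longer fit in
 * registers and then get spilled to the stack.
 */
void toy_rounds(uint32_t st[PARA][4], const uint32_t w[PARA], int rounds)
{
    for (int r = 0; r < rounds; r++) {
        for (int i = 0; i < PARA; i++) {
            uint32_t a = st[i][0], b = st[i][1], c = st[i][2], d = st[i][3];
            uint32_t f = (b & c) | (~b & d); /* MD5's F function */
            uint32_t na = b + rotl32(a + f + w[i], 7);
            /* rotate the state words the way MD5 does: (a,b,c,d) <- (d,na,b,c) */
            st[i][0] = d;
            st[i][1] = na;
            st[i][2] = b;
            st[i][3] = c;
        }
    }
}
```

With identical inputs every lane must end up with identical state, which
is a cheap sanity check after changing the interleaving factor.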

Lei, before doing this, please make sure you have the latest commits. I
found some unnecessary uses of tmp[SIMD_PARA] in the SHA2 formats where a
single tmp would do. The compiler should have optimized that away, but
let's not depend on it.
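
A schematic of that pattern, using one SHA-256 round: SIMD_PARA's value,
the function names and the use of scalar uint32_t in place of vtype are
assumptions for illustration, not the actual john code. Each lane's
temporaries are written and consumed within the same loop iteration, so a
per-lane array buys nothing; a scalar carries no address and is easier to
keep in a register.

```c
#include <assert.h>
#include <stdint.h>

#define SIMD_PARA 2 /* assumed interleaving factor, for illustration only */

static inline uint32_t rotr32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32); /* a shift by 32 would be undefined */
    return (x >> n) | (x << (32 - n));
}
static inline uint32_t big_sigma0(uint32_t x) { return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22); }
static inline uint32_t big_sigma1(uint32_t x) { return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25); }
static inline uint32_t ch(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (~x & z); }
static inline uint32_t maj(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (x & z) ^ (y & z); }

/* Before: one slot per lane, although tmp1[i]/tmp2[i] die within iteration i. */
void round_tmp_arrays(uint32_t s[SIMD_PARA][8], uint32_t k,
                      const uint32_t w[SIMD_PARA])
{
    uint32_t tmp1[SIMD_PARA], tmp2[SIMD_PARA];

    for (int i = 0; i < SIMD_PARA; i++) {
        uint32_t *v = s[i]; /* v[0..7] = a..h */
        tmp1[i] = v[7] + big_sigma1(v[4]) + ch(v[4], v[5], v[6]) + k + w[i];
        tmp2[i] = big_sigma0(v[0]) + maj(v[0], v[1], v[2]);
        v[7] = v[6]; v[6] = v[5]; v[5] = v[4]; v[4] = v[3] + tmp1[i];
        v[3] = v[2]; v[2] = v[1]; v[1] = v[0]; v[0] = tmp1[i] + tmp2[i];
    }
}

/* After: same round, one scalar per temporary. */
void round_single_tmp(uint32_t s[SIMD_PARA][8], uint32_t k,
                      const uint32_t w[SIMD_PARA])
{
    for (int i = 0; i < SIMD_PARA; i++) {
        uint32_t *v = s[i];
        uint32_t tmp1 = v[7] + big_sigma1(v[4]) + ch(v[4], v[5], v[6]) + k + w[i];
        uint32_t tmp2 = big_sigma0(v[0]) + maj(v[0], v[1], v[2]);
        v[7] = v[6]; v[6] = v[5]; v[5] = v[4]; v[4] = v[3] + tmp1;
        v[3] = v[2]; v[2] = v[1]; v[1] = v[0]; v[0] = tmp1 + tmp2;
    }
}
```

Both versions compute the same thing; starting from the SHA-256 initial
state with the first message word of "abc" (0x61626380) and the first
round constant, one round gives a = 0x5d6aebcd and e = 0xfa2a4622.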

On a side note, I also changed the MD4/MD5/SHA1 formats to use the
"wider/fewer" for loops that SHA2 already uses, which allowed a similar
change from tmp arrays to single tmp variables. For SHA1, I also found a
whole array, "vtype tmpR[16*SIMD_PARA]", that wasn't needed. These changes
gave a 2-3% boost for those formats.
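
A sketch of why the loop shape matters, using one SHA-1 step (rounds
0-19). The narrow/many version runs one short loop per operation, so each
lane's partial sum must survive between loops and needs tmp[SIMD_PARA];
the wide/fewer version does the whole step per lane in one loop, so the
temporary dies each iteration and a scalar suffices. SIMD_PARA's value,
the function names and scalar uint32_t in place of vtype are assumptions;
this is not the actual john code.

```c
#include <assert.h>
#include <stdint.h>

#define SIMD_PARA 2 /* assumed interleaving factor, for illustration only */

static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32); /* a shift by 32 would be undefined */
    return (x << n) | (x >> (32 - n));
}

/* Narrow/many: partial results cross loop boundaries, hence the array. */
void sha1_step_narrow(uint32_t s[SIMD_PARA][5], uint32_t k,
                      const uint32_t w[SIMD_PARA])
{
    uint32_t tmp[SIMD_PARA];

    for (int i = 0; i < SIMD_PARA; i++)
        tmp[i] = (s[i][1] & s[i][2]) | (~s[i][1] & s[i][3]); /* F for rounds 0-19 */
    for (int i = 0; i < SIMD_PARA; i++)
        tmp[i] += rotl32(s[i][0], 5);
    for (int i = 0; i < SIMD_PARA; i++)
        tmp[i] += s[i][4] + k + w[i];
    for (int i = 0; i < SIMD_PARA; i++) {
        s[i][4] = s[i][3];
        s[i][3] = s[i][2];
        s[i][2] = rotl32(s[i][1], 30);
        s[i][1] = s[i][0];
        s[i][0] = tmp[i];
    }
}

/* Wide/fewer: one loop per step, so the temporary is a plain scalar. */
void sha1_step_wide(uint32_t s[SIMD_PARA][5], uint32_t k,
                    const uint32_t w[SIMD_PARA])
{
    for (int i = 0; i < SIMD_PARA; i++) {
        uint32_t *v = s[i]; /* v[0..4] = a..e */
        uint32_t tmp = rotl32(v[0], 5) + ((v[1] & v[2]) | (~v[1] & v[3]))
                       + v[4] + k + w[i];
        v[4] = v[3];
        v[3] = v[2];
        v[2] = rotl32(v[1], 30);
        v[1] = v[0];
        v[0] = tmp;
    }
}
```

Starting from the SHA-1 initial state with the first message word of
"abc" (0x61626380) and K = 0x5a827999, both give a = 0x0116fc33 and
c = 0x7bf36ae2 after one step.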

magnum
