john-dev - Re: intrinsics: speed up for linux-x86-64-native


Message-ID: <75bd5da8a0ed0ad50e047745f89447ba@smtp.hushmail.com>
Date: Fri, 14 Sep 2012 19:21:27 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: intrinsics: speed up for linux-x86-64-native

On 14 Sep, 2012, at 15:08 , Aleksey Cherepanov <aleksey.4erepanov@...il.com> wrote:

> Looking over sse-intrinsics.c I noticed a weird thing: multiple
> MD5_PARA_DO loops where it is possible to write one loop over
> everything and avoid the use of the tmp variable. I tried merging some
> of the loops and got a speedup. But when I merged them into one loop
> per MD5_STEP I got a significant slowdown.

It's not very intuitive, but AFAIK it was done that way on purpose: the intention is to get code that hides latency, much like in GPU coding. This is pretty compiler-dependent, which is why icc can make such a big difference.
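
To illustrate the idea, here is a minimal sketch in plain C. It is not the actual sse-intrinsics.c code: PARA, rotl32, step_split and step_merged are hypothetical names, and scalar uint32_t values stand in for the SSE vectors. Both functions compute the same MD5 round-1 style step for PARA independent states. The split form does one sub-operation across all states before moving on to the next, so neighbouring operations do not depend on each other, which may make it easier for the compiler to schedule around latency. The merged form puts the whole dependent chain for one state in a single loop body. Which one ends up faster presumably depends on the compiler and the CPU.

```c
#include <stdint.h>

#define PARA 3 /* hypothetical number of interleaved MD5 states */

/* Rotate left; s must be in 1..31 (true for all MD5 shift counts). */
static uint32_t rotl32(uint32_t v, int s)
{
    return (v << s) | (v >> (32 - s));
}

/* Split form: one short loop per sub-operation, with a tmp array.
 * Within each loop the PARA operations are independent of each other. */
static void step_split(uint32_t a[PARA], const uint32_t b[PARA],
                       const uint32_t c[PARA], const uint32_t d[PARA],
                       const uint32_t x[PARA], uint32_t t, int s)
{
    uint32_t tmp[PARA];
    int i;

    for (i = 0; i < PARA; i++) tmp[i] = c[i] ^ d[i];
    for (i = 0; i < PARA; i++) tmp[i] &= b[i];
    for (i = 0; i < PARA; i++) tmp[i] ^= d[i];      /* F(b, c, d) */
    for (i = 0; i < PARA; i++) a[i] += tmp[i] + x[i] + t;
    for (i = 0; i < PARA; i++) a[i] = rotl32(a[i], s);
    for (i = 0; i < PARA; i++) a[i] += b[i];
}

/* Merged form: one loop, no tmp array. Each iteration is a chain of
 * operations that depend on one another. */
static void step_merged(uint32_t a[PARA], const uint32_t b[PARA],
                        const uint32_t c[PARA], const uint32_t d[PARA],
                        const uint32_t x[PARA], uint32_t t, int s)
{
    int i;

    for (i = 0; i < PARA; i++) {
        uint32_t f = d[i] ^ (b[i] & (c[i] ^ d[i])); /* F(b, c, d) */
        a[i] = rotl32(a[i] + f + x[i] + t, s) + b[i];
    }
}
```

The two forms give identical results, so any speed difference comes only from how the compiler orders and allocates registers for the generated instructions.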

That is about all I know so I'll leave the details to the experts. I hope someone can give a more thorough explanation - I'd read it with interest.

magnum
