Date: Fri, 14 Sep 2012 19:21:27 +0200
From: magnum <>
Subject: Re: intrinsics: speed up for linux-x86-64-native

On 14 Sep, 2012, at 15:08, Aleksey Cherepanov <> wrote:

> Looking over sse-intrinsics.c I noticed a weird thing: multiple
> MD5_PARA_DO loops where it is possible to write one loop over
> everything and avoid the use of a tmp variable. I tried removing some
> of the loops and got a speed-up. But when I merged them into one loop
> per MD5_STEP I got a significant slowdown.

It's not very intuitive, but AFAIK it was done that way on purpose: the intention is to get code that hides latency, much like in GPU coding. This is pretty compiler-dependent, and that is why icc can make such a big difference.
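
To make the two loop shapes concrete, here is a minimal sketch in plain C. It is not the code from sse-intrinsics.c: the names PARA, step_split, step_fused and para_variants_agree are made up for illustration, and scalar uint32_t values stand in for the SSE registers. Variant A issues each sub-operation for all independent streams before moving on to the next one (the "many small loops plus tmp" shape), while variant B does the whole MD5 round-1 step for one stream at a time. Both compute the same result; which one runs faster is exactly the compiler-dependent part, and this sketch makes no claim about that.

```c
#include <stdint.h>
#include <string.h>

#define PARA 4  /* hypothetical number of independent streams */

/* Rotate left; s must be in 1..31. */
static uint32_t rol32(uint32_t x, unsigned s)
{
    return (x << s) | (x >> (32 - s));
}

/* Variant A: one short loop per sub-operation. Consecutive operations
 * belong to different streams, so they do not depend on each other. */
static void step_split(uint32_t a[PARA], const uint32_t b[PARA],
                       const uint32_t c[PARA], const uint32_t d[PARA],
                       const uint32_t m[PARA], uint32_t k, unsigned s)
{
    uint32_t tmp[PARA];
    unsigned i;

    for (i = 0; i < PARA; i++) a[i] += k;
    for (i = 0; i < PARA; i++) a[i] += m[i];
    for (i = 0; i < PARA; i++) tmp[i] = c[i] ^ d[i];
    for (i = 0; i < PARA; i++) tmp[i] &= b[i];
    for (i = 0; i < PARA; i++) tmp[i] ^= d[i];   /* tmp = F(b, c, d) */
    for (i = 0; i < PARA; i++) a[i] += tmp[i];
    for (i = 0; i < PARA; i++) a[i] = rol32(a[i], s);
    for (i = 0; i < PARA; i++) a[i] += b[i];
}

/* Variant B: one loop per step. Within an iteration every operation
 * depends on the previous one (a single dependency chain). */
static void step_fused(uint32_t a[PARA], const uint32_t b[PARA],
                       const uint32_t c[PARA], const uint32_t d[PARA],
                       const uint32_t m[PARA], uint32_t k, unsigned s)
{
    unsigned i;

    for (i = 0; i < PARA; i++) {
        uint32_t f = d[i] ^ (b[i] & (c[i] ^ d[i]));  /* F(b, c, d) */
        a[i] = rol32(a[i] + f + m[i] + k, s) + b[i];
    }
}

/* Returns 1 if both variants give identical results on made-up inputs. */
int para_variants_agree(void)
{
    uint32_t a1[PARA], a2[PARA], b[PARA], c[PARA], d[PARA], m[PARA];
    unsigned i;

    for (i = 0; i < PARA; i++) {
        a1[i] = a2[i] = 0x67452301u + i;
        b[i] = 0xefcdab89u ^ (i * 0x01010101u);
        c[i] = 0x98badcfeu + 3u * i;
        d[i] = 0x10325476u ^ i;
        m[i] = 0x61626364u + i;   /* arbitrary message words */
    }
    step_split(a1, b, c, d, m, 0xd76aa478u, 7);
    step_fused(a2, b, c, d, m, 0xd76aa478u, 7);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

A compiler is free to reorder, fuse or unroll either variant, so the source-level shape is only a hint about the instruction schedule that comes out. That is consistent with the results depending so much on the compiler used.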

That is about all I know, so I'll leave the details to the experts. I hope someone can give a more thorough explanation; I'd read it with interest.

