Date: Sat, 15 Sep 2012 00:43:37 +0200
From: magnum <>
Subject: Re: intrinsics: speed up for linux-x86-64-native

On 14 Sep, 2012, at 15:08 , Aleksey Cherepanov <> wrote:

> Looking over sse-intrinsics.c I noticed weird thing: multiple
> MD5_PARA_DO cycles when it is possible to write one cycle over
> everything and avoid use of tmp variable. I tried to avoid some cycles
> and got a speed up. But when I merged them into one cycle per MD5_STEP
> I got a significant slowdown.

By the way, you need to learn to trust the compiler (it took me a long time, for sure). Those for loops are simply not there if you look at the generated assembly code: they were unrolled. I must confess I too have tried replacing all those for loops with one that enclosed the whole shebang. But that just makes the code complicated enough to (likely) suppress unrolling, besides defeating the purpose we already discussed.

I have spent a lot of time rewriting code that just got worse because the truly MAGIC compiler optimizations were confused by my deeds. For our needs, it is healthy not to trust the compiler *too* much, but at the same time you need to learn how much you actually *can* trust it. I love this subject.
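To make the two loop styles concrete, here is a minimal scalar sketch. It is not the code from sse-intrinsics.c: PARA, step_split, step_merged and variants_agree are hypothetical names, plain uint32_t lanes stand in for the SSE vectors, and the step is the usual first-round MD5 step a = b + rotl(a + F(b,c,d) + x + k, s). Both variants compute the same values; the only difference is whether each operation gets its own short constant-count loop (with a tmp array) or everything sits in one loop body.

```c
#include <stdint.h>
#include <string.h>

#define PARA 4 /* hypothetical interleave factor, standing in for MD5_PARA */

/* Rotate left; only valid for 0 < s < 32, which covers all MD5 shift counts. */
static uint32_t rotl32(uint32_t v, int s)
{
    return (v << s) | (v >> (32 - s));
}

/* Style A: one short loop per operation, with a tmp array between them. */
static void step_split(uint32_t a[PARA], const uint32_t b[PARA],
                       const uint32_t c[PARA], const uint32_t d[PARA],
                       const uint32_t x[PARA], uint32_t k, int s)
{
    uint32_t tmp[PARA];
    int i;

    for (i = 0; i < PARA; i++) tmp[i] = c[i] ^ d[i];
    for (i = 0; i < PARA; i++) tmp[i] &= b[i];
    for (i = 0; i < PARA; i++) tmp[i] ^= d[i];      /* tmp = F(b,c,d) */
    for (i = 0; i < PARA; i++) a[i] += tmp[i];
    for (i = 0; i < PARA; i++) a[i] += x[i] + k;
    for (i = 0; i < PARA; i++) a[i] = rotl32(a[i], s);
    for (i = 0; i < PARA; i++) a[i] += b[i];
}

/* Style B: the whole step merged into a single loop, no tmp array. */
static void step_merged(uint32_t a[PARA], const uint32_t b[PARA],
                        const uint32_t c[PARA], const uint32_t d[PARA],
                        const uint32_t x[PARA], uint32_t k, int s)
{
    int i;

    for (i = 0; i < PARA; i++) {
        uint32_t f = d[i] ^ (b[i] & (c[i] ^ d[i])); /* F(b,c,d) */
        a[i] = b[i] + rotl32(a[i] + f + x[i] + k, s);
    }
}

/* Returns 1 if both styles produce identical lanes for a sample input. */
int variants_agree(void)
{
    uint32_t a1[PARA], a2[PARA], b[PARA], c[PARA], d[PARA], x[PARA];
    int i;

    for (i = 0; i < PARA; i++) {
        /* MD5 initial state, offset per lane so the lanes differ */
        a1[i] = a2[i] = 0x67452301u + (uint32_t)i;
        b[i] = 0xefcdab89u + (uint32_t)i;
        c[i] = 0x98badcfeu + (uint32_t)i;
        d[i] = 0x10325476u + (uint32_t)i;
        x[i] = 0x80u + (uint32_t)i; /* arbitrary message word */
    }
    step_split(a1, b, c, d, x, 0xd76aa478u, 7);
    step_merged(a2, b, c, d, x, 0xd76aa478u, 7);
    return memcmp(a1, a2, sizeof a1) == 0;
}
```

To see what the compiler actually did with either style, build with something like `gcc -O3 -S` and read the generated assembly. With a small constant PARA the short loops usually vanish entirely, but that depends on compiler version and flags, so treat it as something to check rather than assume.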

