Date: Tue, 02 Jun 2015 17:30:41 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Interleaving of intrinsics

On 2015-06-02 16:57, magnum wrote:
> On 2015-06-02 13:01, Solar Designer wrote:
>> magnum -
>>
>> On Mon, Jun 01, 2015 at 01:37:14PM +0200, magnum wrote:
>>> Or perhaps as soon as we use interleaving, things like tmp[SIMD_PARA]
>>> end up being stack arrays? That should hurt a lot.
>>
>> This is quite possible. In general, one of the things limiting the
>> interleaving factor is register pressure - and the compiler might in
>> fact do a worse job at register allocation when we use arrays.
>>
>>> Actually, here's a bug we have: Using the wide loops as in SHA2, we
>>> don't need to use "tmp[i]" at all - we do fine with just "tmp".
>>
>> Huh? Doesn't this defeat interleaving, replacing it with sequential
>> processing, because our source code sort of hints to the compiler to
>> reuse the same register across instances? Or are we hoping that the
>> compiler or the CPU will recognize that we're reusing the variable, and
>> actually allocate a new register or a new rename register, respectively?
>> The compiler might and a CPU capable of register renaming at all
>> probably will, but didn't we intend to reduce rather than increase our
>> reliance on luck?
>>
>> I just took a look at commit cde0fb470f35ef6dc5949d3b11137dd27ca2672b,
>> and it does look as problematic as I had thought from reading your
>> message. :-(
>
> I see what you mean and maybe we never got proper interleaving anyway.
> But MD4 and MD5 are faster at the same x3 as before. Anyway reverting to
> use tmp arrays again is easy.

I reverted just the use of tmp arrays in d85f8fd and the speedups I had
are still there. I guess the actual cause of it was just the
fewer/wider loops that are still in there now.

magnum
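
For readers following the thread, the difference between "tmp[i]" and plain "tmp" inside a wide loop can be sketched roughly as below. This is only an illustration, not the code from either commit: the `vec` type is a plain `uint32_t` standing in for a SIMD register, the round step is a made-up toy (not a real MD4/MD5 step), and the function names and the `SIMD_PARA` value of 3 are assumptions chosen for the example.

```c
#include <stdint.h>

#define SIMD_PARA 3        /* assumed interleave factor, for illustration */

/* Stand-in for one SIMD register; real code would use e.g. __m128i. */
typedef uint32_t vec;

/* Toy step: a = rotl(a + (b ^ c), 7). Hypothetical, not a real hash round. */

/* Variant A: wide loop with one temporary per interleaved instance.
 * The source tells the compiler the instances are independent. */
static void step_tmp_array(vec a[SIMD_PARA], const vec b[SIMD_PARA],
                           const vec c[SIMD_PARA])
{
    vec tmp[SIMD_PARA];
    unsigned int i;

    for (i = 0; i < SIMD_PARA; i++) {
        tmp[i] = b[i] ^ c[i];
        a[i] += tmp[i];
        a[i] = (a[i] << 7) | (a[i] >> 25);
    }
}

/* Variant B: same wide loop, but a single shared temporary.
 * The result is identical; whether the instances still overlap now
 * depends on the compiler or the CPU renaming the reused variable. */
static void step_tmp_single(vec a[SIMD_PARA], const vec b[SIMD_PARA],
                            const vec c[SIMD_PARA])
{
    vec tmp;
    unsigned int i;

    for (i = 0; i < SIMD_PARA; i++) {
        tmp = b[i] ^ c[i];
        a[i] += tmp;
        a[i] = (a[i] << 7) | (a[i] >> 25);
    }
}
```

Both variants compute the same values, so any speed difference comes only from register allocation and instruction scheduling once the loop is unrolled. That would have to be measured per compiler and CPU rather than assumed.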