Date: Wed, 7 Oct 2015 01:47:06 +0200
From: magnum <>
Subject: Re: LOP3.LUT

On 2015-10-06 02:32, magnum wrote:
> I implemented a shared lop3_lut(a, b, c, imm) function in de6c7c6 but
> it's not enabled anywhere yet: I only tested md5crypt so far and it got
> about 5% performance loss. I also tried only using it for one function
> at a time but any of them results in performance loss - even F and G
> which are both pure bitselects otherwise. I was expecting no difference
> at all, at worst.

Here's a PTX diff with *only* F changed from bitselect() to inline asm 
(I replaced all register numbers with <num> for a simpler diff):

@@ -190,142 +190,130 @@
         add.s32         %r<num>, %r<num>, -117830708;
         shf.l.wrap.b32  %r<num>, %r<num>, %r<num>, 12;
         add.s32         %r<num>, %r<num>, %r<num>;
-       and.b32         %r<num>, %r<num>, %r<num>;
-       not.b32         %r<num>, %r<num>;
-       and.b32         %r<num>, %r<num>, -271733879;
-       or.b32          %r<num>, %r<num>, %r<num>;
+       mov.u32         %r<num>, -271733879;
+       // inline asm
+       lop3.b32 %r<num>, %r<num>, %r<num>, %r<num>, 228;
+       // inline asm
         ld.local.u32    %r<num>, [%rd4+72];
         add.s32         %r<num>, %r<num>, %r<num>;

So if I read it right, we replace "and, not, and immediate, or" with "mov 
immediate, lop3". I can't see why that would decrease speed by 1%. 
Even if the version with no inline PTX does end up as LOP3 (it should), 
why does the explicit version get slower?
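
As a sanity check on the immediate in the diff, here is a minimal Python 
sketch (not part of the kernel) that models lop3.b32 in software and 
derives the lookup-table byte the way the PTX documentation describes, by 
evaluating the desired function on the constants 0xF0, 0xCC and 0xAA. The 
helper names lop3(), lut_for() and select() are made up for illustration, 
and the operand order shown is an assumption: the diff has its register 
numbers stripped, so it does not say which operand is the selector.

```python
MASK32 = 0xFFFFFFFF

def lop3(a, b, c, imm_lut):
    """Software model of PTX lop3.b32: for every bit position, the bits of
    a, b and c form a 3-bit index (a is the high bit) into imm_lut."""
    result = 0
    for bit in range(32):
        idx = (((a >> bit) & 1) << 2) | (((b >> bit) & 1) << 1) | ((c >> bit) & 1)
        result |= ((imm_lut >> idx) & 1) << bit
    return result

def lut_for(func):
    """Derive the 8-bit table by applying func to the three standard constants."""
    return func(0xF0, 0xCC, 0xAA) & 0xFF

def select(a, b, c):
    """Bitwise select with c as the mask: take a where c is 1, b where c is 0.
    Assumed operand order; the diff does not show it."""
    return (a & c) | (b & ~c)

def md5_f(x, y, z):
    """MD5 round function F written as plain logic ops."""
    return ((x & y) | (~x & z)) & MASK32

print(lut_for(select))  # 228, i.e. 0xE4, the immediate seen in the diff

# With this operand order, F(x, y, z) is lop3(y, z, x, 228).
x, y, z = 0x12345678, 0xEFCDAB89, 0x98BADCFE
assert lop3(y, z, x, 228) == md5_f(x, y, z)
```

So 228 (0xE4) is consistent with a pure bitselect whose mask is the third 
operand, which matches the expectation that the single lop3 computes the 
same value as the four-instruction sequence it replaces. This says nothing 
about why the explicit version is slower.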

Since we don't have CUDA 7.5 installed on super, I can't look at the 
resulting ISA: ptxas won't assemble this kernel, and for some reason not 
even the version without inline lop3.lut. It does assemble some other 
kernels, and I have seen separate logic instructions in PTX end up as 
LOP3 in the ISA. But for this comparison I'll need to continue my 
digging somewhere else, later.

