Date: Wed, 9 Sep 2015 03:00:01 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: md5crypt mmxput*()

On 2015-09-08 16:18, Solar Designer wrote:
> On Tue, Sep 08, 2015 at 01:17:14PM +0300, Solar Designer wrote:
>> Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
>> Raw:    231424 c/s real, 28928 c/s virtual
>
>> I think further speedup is possible by using a switch statement to make
>> the shift counts into constants (we have an if anyway, we'll just
>> replace it with a switch) like cryptmd5_kernel.cl has.
>
> I cleaned up the code and implemented switch - patch attached.
> It turned out to cause a minor performance regression on bull (due to
> code size growth maybe?) so I am disabling it for XOP and keep the
> performance almost the same as above:
>
> Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
> Raw:    231680 c/s real, 28923 c/s virtual

Code size, eh? This reminded me there is a "#pragma GCC optimize 3" in 
that file that I always found slightly dubious. We should verify how 
each format reacts to dropping that.
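
One way to make that A/B comparison cheap would be to guard the pragma
with a macro, so the same source can be built both ways. This is only a
sketch: BENCH_NO_O3_PRAGMA is a hypothetical name, not something in the
tree, and rotl7_sum() is just a stand-in for a real hot loop.

```c
#include <stdint.h>

/* BENCH_NO_O3_PRAGMA is a hypothetical macro (an assumption for this
 * sketch). Building with -DBENCH_NO_O3_PRAGMA leaves the functions
 * below at whatever -O level was given on the command line. */
#ifndef BENCH_NO_O3_PRAGMA
#pragma GCC optimize 3
#endif

/* Stand-in for a hot loop: rotate each word left by 7 and sum. */
static uint32_t rotl7_sum(const uint32_t *x, unsigned int n)
{
	uint32_t sum = 0;
	unsigned int i;

	for (i = 0; i < n; i++)
		sum += (x[i] << 7) | (x[i] >> 25);
	return sum;
}
```

Compilers that do not know the pragma should just warn and ignore it,
so the result is the same either way and only the generated code
differs.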

Quick test for now, on bull, with the switch enabled for XOP and that 
pragma dropped:

Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
Raw:    233472 c/s real, 29184 c/s virtual

The total file size actually increased, but that might be other parts 
getting larger (though I'm not sure why anything would become larger).

$ ls -lrt simd-intrinsics*o
-rw-rw-r-- 1 magnum magnum 97968 Sep  9 02:34 simd-intrinsics-bleeding.o
-rw-rw-r-- 1 magnum magnum 98096 Sep  9 02:35 simd-intrinsics-switch.o
-rw-rw-r-- 1 magnum magnum 98320 Sep  9 02:43 simd-intrinsics.o

The first is current bleeding, the middle one is with -O3 and switch 
(larger), and the last one is with -O2 and switch (even larger).

Then again, how about -O2 and *not* using switch?

Benchmarking: md5crypt, crypt(3) $1$ [MD5 128/128 XOP 4x2]... (8xOMP) DONE
Raw:    232960 c/s real, 29120 c/s virtual

OK, that's in between. So, before having tested any other format or arch: 
the pragma should go and the switch should be used for XOP too.
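
For anyone following along, here is a scalar sketch of what "switch to
make the shift counts into constants" means. It is not the actual
mmxput*() code: put_word_var() and put_word_switch() are hypothetical
names, and the real functions work on SIMD vectors rather than plain
32-bit words. Both versions OR a little-endian word into a buffer at an
arbitrary byte offset.

```c
#include <stdint.h>

/* Variable shift count, with the "if" needed to avoid a shift by 32. */
static void put_word_var(uint32_t *buf, unsigned int pos, uint32_t w)
{
	uint32_t *p = &buf[pos >> 2];
	unsigned int s = (pos & 3) << 3;

	if (s) {
		p[0] |= w << s;
		p[1] |= w >> (32 - s);
	} else
		p[0] |= w;
}

/* Same operation, but every shift count is a literal constant. */
static void put_word_switch(uint32_t *buf, unsigned int pos, uint32_t w)
{
	uint32_t *p = &buf[pos >> 2];

	switch (pos & 3) {
	case 0:
		p[0] |= w;
		break;
	case 1:
		p[0] |= w << 8;
		p[1] |= w >> 24;
		break;
	case 2:
		p[0] |= w << 16;
		p[1] |= w >> 16;
		break;
	case 3:
		p[0] |= w << 24;
		p[1] |= w >> 8;
		break;
	}
}
```

The switch trades one variable-count shift pair for four copies of the
body with immediate counts, which is where the code size growth comes
from.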

Apparently Jim added that pragma in 2013 while (I think) adding SHA-2, 
likely because Gosney's original code had it. I will do some testing!

magnum
