Date: Tue, 21 Oct 2014 17:26:00 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: nVidia Maxwell support

On 2014-09-24 00:00, magnum wrote:
> Is it safe to assume that OpenCL's bitselect() function will boil down
> to the new instructions on Maxwell? Or is it not that simple?
>
> For example, we have many cases like this:
>
> #ifdef USE_BITSELECT
> #define F(x, y, z)    bitselect((z), (y), (x))
> #define G(x, y, z)    bitselect((y), (x), (z))
> #else
> #define F(x, y, z)    ((z) ^ ((x) & ((y) ^ (z))))
> #define G(x, y, z)    ((y) ^ ((z) & ((x) ^ (y))))
> #endif

I had a chance to try out a GTX 980. As I suspected, it seems to 
unpredictably affect "optimizer transparency": using bitselect for the 
above gains a few percent with some versions of SHA1 and loses a few 
with others.

> The thing is, we currently only define USE_BITSELECT for AMD devices.
> Would it be safer, in the nvidia case, to leave the non-bitselect
> versions for the optimizer to consider? Or would it be safer to use
> bitselect, or should it really not matter? It seems to still matter on AMD.
>
> If use of bitselect() increases the chance for better low-level code for
> nvidia too, maybe we should always define USE_BITSELECT (I'd still keep
> the #ifdefs for quick benchmarks with/without them, as well as for
> reference).

I have only tested WPAPSK so far, because it's very flexible: I could 
try many combinations of alternate code just by juggling #ifdefs. I 
ended up using the same SHA1 we use for AMD (Milen's code, not Lukas' 
with its separate "SHA1SHORT" hard-coded for short input) and enabling 
bitselect, for a ~5% gain (exceeding 3 billion SHA1/s). Using bitselect 
gained some speed with Milen's code but cost some with Lukas' code.

Another curious observation: compared to the default scalar mode, 
Lukas' code gained a little speed with --force-vector=2 (i.e. using 
uint2) while Milen's code lost some. Any larger vector size ruined 
performance, as expected (perhaps even more than expected).

magnum

