Date: Tue, 21 Oct 2014 17:26:00 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: nVidia Maxwell support

On 2014-09-24 00:00, magnum wrote:
> Is it safe to assume that OpenCL's bitselect() function will boil down
> to the new instructions on Maxwell? Or is it not that simple?
>
> For example, we have many cases like this:
>
> #ifdef USE_BITSELECT
> #define F(x, y, z)    bitselect((z), (y), (x))
> #define G(x, y, z)    bitselect((y), (x), (z))
> #else
> #define F(x, y, z)    ((z) ^ ((x) & ((y) ^ (z))))
> #define G(x, y, z)    ((y) ^ ((z) & ((x) ^ (y))))
> #endif

I had a chance to try out a GTX 980. As I suspected, it seems to 
unpredictably affect "optimizer transparency": using bitselect for the 
above gains a few percent with some versions of SHA1 and loses a few 
with others.

> The thing is, we currently only define USE_BITSELECT for AMD devices.
> Would it be safer, in the nvidia case, to leave the non-bitselect
> versions for the optimizer to consider? Or would it be safer to use
> bitselect, or should it really not matter? It seems to still matter on AMD.
>
> If use of bitselect() increases the chance for better low-level code for
> nvidia too, maybe we should always define USE_BITSELECT (I'd still keep
> the #ifdefs for quick benchmarks with/without them, as well as for
> reference).

I have only tested WPAPSK so far, because it's very flexible: I could 
try many combinations of alternate code just by juggling #ifdefs. I 
ended up using the same SHA1 we use for AMD (Milen's code, not Lukas' 
with its separate "SHA1SHORT" hard-coded for short input) and enabling 
bitselect, for a ~5% gain (exceeding 3 billion SHA1/s). Using bitselect 
gained some speed with Milen's code but cost some with Lukas' code.

Another curious observation: compared to the default scalar mode, 
Lukas' code gained a little speed with --force-vector=2 (i.e. using 
uint2) while Milen's code lost some. Any larger vector size ruined 
performance, as expected (perhaps even more than expected).

magnum

