Date: Mon, 16 Sep 2013 10:53:15 -0700
From: Alain Espinosa <alainesp@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: GTX TITAN (was: new dev box wishes)

On 9/15/13, Milen Rangelov <gat3way@...il.com> wrote:
> I have almost never seen any benefits from vectorizing on 7970 (well, except
> one or two cases).
>
> There are occasions where the same code, vectorized, gives better
> performance on 7970, but then if you can transform the scalar code so that
> more work is being done in the kernel (e.g by doing 4 consecutive
> operations in a loop as compared to using 4x vectors) you'd eventually come
> up with a faster, scalar solution given the same amount of work (global
> work size / vector size). In case global work size is the same, the
> vectorized solution may seem faster just because overall you have more
> kernel launches per second with the scalar code as compared to vector code
> and kernel launch latency and host-device transfers then come into play. I
> think AMD APP profiler can be very helpful to figure out what's happening
> in such cases.

That is not the case in my test (almost none of what you write
applies). For clarity, here is the general shape of the kernel before:

__kernel void ntlm_crack(...)
{
   // Initialization code
     ...

   for(uint i=0;i<100;i++)
   {
      uint a,b,c,d;
      // NTLM code
        ...
   }
}

and after:

__kernel void ntlm_crack(...)
{
   // Initialization code
     ...

   for(uint i=0;i<100;i+=3)
   {
      uint3 a,b,c,d;
      // NTLM code
        ...
   }
}

This gives a 15-20% performance improvement on an HD 7970 (and on a GT
630). That is for my particular case, but I think it holds in other
cases too. Note that all parameters are the same for the two solutions,
and yes, the second solution does some wasted computation at the end of
the loop.
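
To put a number on that waste, here is a minimal sketch in plain C of
the loop arithmetic only. The helper names are hypothetical and are not
part of the kernel; the only assumption is that each iteration handles
`width` candidates (1 for uint, 3 for uint3):

```c
/* Hypothetical helpers, for illustration only; not from the real kernel. */

/* Iterations executed by: for (i = 0; i < total; i += width) */
unsigned iterations(unsigned total, unsigned width)
{
    return (total + width - 1) / width;   /* ceiling division */
}

/* Candidates computed past `total` in the final iteration. */
unsigned wasted_lanes(unsigned total, unsigned width)
{
    return iterations(total, width) * width - total;
}
```

With total = 100 and width = 3 the loop runs 34 times and computes 102
candidates, so 2 of them are wasted, which is small next to the
measured gain.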

saludos,
alain
