Date: Wed, 6 Jun 2012 06:36:19 -0700
From: Alain Espinosa <alainesp@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: HD 7970 LDS Bank Conflicts

On 6/5/12, Milen Rangelov <gat3way@...il.com> wrote:
> Sanyatan, as they have replied in AMD forum, worksize of 8 means you are
> going to significantly underutilize the GPU (sh claims 1/32, but I think
> wavefront is 64-wide thus it would be 1/8). This is far worse than
> penalties from LDS bank conflicts.

The worksize needs to be a multiple of 64 on AMD GPUs and 32 on Nvidia GPUs, or we incur big penalties.

> ... Also I am not sure if your calculations regarding
> avoiding channel conflicts that involve division by 13 do not end up slower
> than actually having the LDS conflicts because integer division and
> especially modulus are very expensive.

Division and modulus are very expensive. I changed the algorithm to use AND and shift instead, and I got a 25% increase from that alone.

> If I were to optimize the opencl BF code I would first put what's
> appropriate in local memory (obviously the 4KB state can't fit)...

Why? 4KB is still under the limit for local memory if a big enough worksize is used.

saludos,
alain
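
As a rough illustration of the two points above, here is a minimal C sketch, not the actual code from the BF kernel. The helper names (round_up_work_size, div_pow2, mod_pow2) are hypothetical. It assumes a wavefront width of 64 (AMD) or 32 (Nvidia) as stated above, and it assumes the divisor is a power of two, which is the only case where a plain shift and AND replace division and modulus exactly. The division by 13 mentioned in the quoted text is not a power of two, so that case would need the indexing changed, as in the algorithm change described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Round a work size up to the next multiple of `width`
   (assumed: 64 for AMD wavefronts, 32 for Nvidia warps). */
static size_t round_up_work_size(size_t n, size_t width)
{
    return ((n + width - 1) / width) * width;
}

/* For an unsigned x and a divisor d = 1 << shift (a power of two):
     x / d  ==  x >> shift
     x % d  ==  x & (d - 1)
   These identities do NOT hold for divisors such as 13. */
static uint32_t div_pow2(uint32_t x, unsigned shift)
{
    return x >> shift;
}

static uint32_t mod_pow2(uint32_t x, unsigned shift)
{
    return x & ((1u << shift) - 1u);
}
```

For example, round_up_work_size(8, 64) gives 64, and mod_pow2(100, 4) gives the same result as 100 % 16.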