Date: Wed, 6 Jun 2012 06:36:19 -0700
From: Alain Espinosa <>
Subject: Re: HD 7970 LDS Bank Conflicts

On 6/5/12, Milen Rangelov <> wrote:
> Sanyatan, as they have replied in AMD forum, worksize of 8 means you are
> going to significantly underutilize the GPU (sh claims 1/32, but I think
> wavefront is 64-wide thus it would be 1/8). This is far worse than
> penalties from LDS bank conflicts.

Worksize needs to be a multiple of 64 on AMD GPUs and 32 on Nvidia GPUs,
or we incur big penalties.
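
A minimal host-side sketch of that rule in plain C, assuming the work size is
chosen before the kernel is enqueued. The names round_up, WAVEFRONT_AMD and
WARP_NVIDIA are hypothetical and not taken from any existing code; only the
values 64 and 32 come from the discussion above.

```c
#include <stddef.h>

/* Execution widths discussed above (hypothetical constant names). */
#define WAVEFRONT_AMD 64u
#define WARP_NVIDIA   32u

/* Round n up to the next multiple of `multiple` (multiple must be > 0). */
static size_t round_up(size_t n, size_t multiple)
{
    return ((n + multiple - 1) / multiple) * multiple;
}
```

With this, round_up(8, WAVEFRONT_AMD) gives 64, so a requested worksize of 8
would be padded to one full wavefront instead of leaving most lanes idle. The
kernel then has to ignore the padding work-items.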

> ... Also I am not sure if your calculations regarding
> avoiding channel conflicts that involve division by 13 do not end up slower
> than actually having the LDS conflicts because integer division and
> especially modulus are very expensive.

Division and modulus are very expensive. I changed the algorithm to use
AND and shift instead, and I got a 25% increase from that alone.
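
A sketch of the general technique, not the actual change: AND and shift can
only replace modulus and division when the divisor is a power of two, so this
assumes the layout was reworked around such a divisor. The value 16 and the
names SLOTS_LOG2, SLOTS, slot_div and slot_mod are hypothetical.

```c
#include <stdint.h>

/* Hypothetical power-of-two divisor (16) standing in for the original 13. */
#define SLOTS_LOG2 4u
#define SLOTS      (1u << SLOTS_LOG2)

/* Same result as i / SLOTS for unsigned i, using a shift. */
static uint32_t slot_div(uint32_t i)
{
    return i >> SLOTS_LOG2;
}

/* Same result as i % SLOTS for unsigned i, using a bit mask. */
static uint32_t slot_mod(uint32_t i)
{
    return i & (SLOTS - 1u);
}
```

The equivalence holds only for unsigned values and power-of-two divisors,
which is why the algorithm itself has to change rather than just the operators.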

> If I were to optimize the opencl BF code I would first put what's
> appropriate in local memory (obviously the 4KB state can't fit)...

Why? 4KB is still under the limit for local memory if a big enough
worksize is used.

