Date: Wed, 6 Jun 2012 06:36:19 -0700
From: Alain Espinosa <alainesp@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: HD 7970 LDS Bank Conflicts

On 6/5/12, Milen Rangelov <gat3way@...il.com> wrote:
> Sanyatan, as they have replied in AMD forum, worksize of 8 means you are
> going to significantly underutilize the GPU (sh claims 1/32, but I think
> wavefront is 64-wide thus it would be 1/8). This is far worse than
> penalties from LDS bank conflicts.

The worksize needs to be a multiple of 64 on AMD GPUs and 32 on Nvidia
GPUs, or we incur big penalties.
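
As a rough illustration of the multiple-of-64/32 rule, a host-side helper
could round a requested size up to the next multiple of the wavefront or
warp width. This is only a sketch: the name round_up_to_multiple is made up
for this example and is not part of OpenCL or of the john code.

```c
#include <stddef.h>

/* Hypothetical helper: round n up to the next multiple of `multiple`.
 * `multiple` would be 64 for an AMD wavefront or 32 for an Nvidia warp.
 * Assumes multiple > 0 and that n + multiple - 1 does not overflow. */
static size_t round_up_to_multiple(size_t n, size_t multiple)
{
    return ((n + multiple - 1) / multiple) * multiple;
}
```

With this, a requested size of 100 becomes 128 on AMD (multiple of 64), and
a size that is already a multiple is left unchanged.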

> ... Also I am not sure if your calculations regarding
> avoiding channel conflicts that involve division by 13 do not end up slower
> than actually having the LDS conflicts because integer division and
> especially modulus are very expensive.

Division and modulus are very expensive. I changed to using AND and shift
and got a 25% increase from that alone (changing the algorithm).
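
A minimal sketch of the general idea, assuming the index layout can be
changed so that the divisor is a power of two (here a stride of 16, chosen
only for illustration; the stride and the function names are not taken from
the actual kernel). With a power-of-two stride, the division becomes a
shift and the modulus becomes an AND:

```c
#include <stdint.h>

/* Hypothetical layout: stride of 16 = 1 << 4, picked for illustration. */
#define STRIDE_LOG2 4u
#define STRIDE      (1u << STRIDE_LOG2)

/* Reference versions using integer division and modulus. */
static uint32_t row_div(uint32_t i) { return i / STRIDE; }
static uint32_t col_mod(uint32_t i) { return i % STRIDE; }

/* Equivalent versions for unsigned i and a power-of-two stride. */
static uint32_t row_shift(uint32_t i) { return i >> STRIDE_LOG2; }
static uint32_t col_and(uint32_t i)   { return i & (STRIDE - 1u); }
```

A compiler will usually do this rewrite itself when the divisor is a
compile-time power of two, so the gain comes from moving away from a
divisor like 13, which cannot be reduced to a single shift or AND.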

> If I were to optimize the opencl BF code I would first put what's
> appropriate in local memory (obviously the 4KB state can't fit)...

Why? 4KB is still under the limit for local memory if a big enough
worksize is used.

saludos,
alain
