Date: Tue, 5 Jun 2012 13:34:24 +0530
Subject: Re: HD 7970 LDS Bank Conflicts

Hi Milen,

On Tue, Jun 5, 2012 at 1:18 PM, Milen Rangelov <> wrote:

> This is a good candidate for local memory; I would not leave it like that.
> Arrays declared as local variables are almost always a bad idea: even if
> they could be backed by private memory (in reality only very small arrays
> are), it is more likely that scratch memory is used instead, which is as
> slow as global memory. Also, I suspect your calculations for avoiding
> channel conflicts, which involve division by 13, may end up slower than
> simply having the LDS conflicts, because integer division and especially
> modulus are very expensive.

The code you saw in the current repository tries to maximize global memory
bandwidth only; it doesn't use LDS. Since the 7970 has 12 global memory
channels, the index calculation uses 12 + 1 = 13, which gives the best
global-memory performance on the 7970 (ideally 12 should give maximum
performance, but in practice 13 turns out faster). Also, this kernel has an
ALU-instruction-to-fetch ratio of only 3.29, so I'm not worried about the
ALU cost.
Thanks for the feedback.


