Date: Thu, 12 Sep 2013 07:01:46 +0530
From: Sayantan Datta <>
Subject: Re: PUTCHAR() macro

Hi magnum,

On Thu, Sep 12, 2013 at 2:14 AM, magnum <> wrote:

> This means we are using byte-addressed stores if allowed, instead of the
> bit flogging macro. This alone made for a 33% boost for RAR with recent AMD
> drivers and it benefits nvidia with any driver I know of. However, when
> testing it with your mask-mode raw-md5 it *halves* the speed for AMD (using
> 13.4 driver on Bull). It might be that the 32-bit macro is better for
> global memory or something like that. Good to know for future tweaking.
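For readers without the source at hand, here is a minimal sketch of the two store styles being compared, written as plain host C rather than OpenCL C. The macro names PUTCHAR_WORD and PUTCHAR_BYTE and the helper putchar_variants_agree() are made up for illustration and are not the actual definitions from the tree; the sketch also assumes a little-endian target, which is what makes the two variants write the same bytes.

```c
#include <stdint.h>
#include <string.h>

/* 32-bit read-modify-write ("bit flogging"): load the word that holds
 * byte `index`, clear that byte lane, merge in the new value, store the
 * word back. Hypothetical name, for illustration only. */
#define PUTCHAR_WORD(buf, index, val) \
    ((buf)[(index) >> 2] = \
        ((buf)[(index) >> 2] & ~(0xffU << (((index) & 3) << 3))) | \
        ((uint32_t)(uint8_t)(val) << (((index) & 3) << 3)))

/* Byte-addressed store: view the same buffer as bytes and write one.
 * Matches PUTCHAR_WORD only on little-endian targets (assumption). */
#define PUTCHAR_BYTE(buf, index, val) \
    (((unsigned char *)(buf))[(index)] = (unsigned char)(val))

/* Returns 1 if both variants leave identical buffers behind. */
int putchar_variants_agree(void)
{
    uint32_t a[4] = {0}, b[4] = {0};
    const char *msg = "abcdefg";
    unsigned i;

    for (i = 0; msg[i] != '\0'; i++) {
        PUTCHAR_WORD(a, i, msg[i]);
        PUTCHAR_BYTE(b, i, msg[i]);
    }
    /* MD5-style 0x80 padding byte right after the message */
    PUTCHAR_WORD(a, 7, 0x80);
    PUTCHAR_BYTE(b, 7, 0x80);

    return memcmp(a, b, sizeof a) == 0;
}
```

The word variant costs a load, a mask, a shift and a store per character, while the byte variant is a single store. Which one the compiler turns into faster GPU code depends on the device and driver, as the numbers above show.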

When byte-addressable stores are used, some extra VGPRs are most likely
required. For raw-md5-opencl the VGPR count already hovers around 125 with
Catalyst 13.4. Since the speed gets halved, I'm guessing you are exceeding
the 128-VGPR threshold, which halves the number of in-flight wavefronts. It
may not have anything to do with global memory access.
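To make the threshold concrete, here is a rough back-of-the-envelope sketch of the occupancy arithmetic. It assumes a GCN-class GPU with a budget of 256 VGPRs per SIMD lane, allocation in blocks of 4 registers, and a cap of 10 wavefronts per SIMD; those figures, the constant names and the function waves_per_simd() are assumptions for illustration and are not taken from the driver or from this thread.

```c
/* Assumed GCN-style limits (not stated in this thread). */
#define VGPR_BUDGET   256  /* VGPRs available per SIMD lane      */
#define VGPR_GRANULE  4    /* registers are allocated in blocks  */
#define MAX_WAVES     10   /* hardware cap on waves per SIMD     */

/* Hypothetical helper: how many wavefronts can be resident on one SIMD
 * for a kernel that uses `vgprs` vector registers per work-item. */
int waves_per_simd(int vgprs)
{
    int allocated, waves;

    if (vgprs <= 0)
        return MAX_WAVES;

    /* Round the request up to the allocation granule. */
    allocated = (vgprs + VGPR_GRANULE - 1) / VGPR_GRANULE * VGPR_GRANULE;
    if (allocated > VGPR_BUDGET)
        return 0; /* would not fit without spilling */

    waves = VGPR_BUDGET / allocated;
    return waves > MAX_WAVES ? MAX_WAVES : waves;
}
```

Under these assumptions a kernel at 125 VGPRs still fits two wavefronts per SIMD, while anything above 128 drops to one, which would match a speed that is cut in half.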

