Date: Thu, 12 Sep 2013 08:57:05 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: PUTCHAR() macro

On 12 sep 2013, at 03:31, Sayantan Datta <std2048@...il.com> wrote:
> On Thu, Sep 12, 2013 at 2:14 AM, magnum <john.magnum@...hmail.com> wrote:
>> This means we are using byte-addressed stores if allowed, instead of the bit-flogging macro. This alone made for a 33% boost for RAR with recent AMD drivers, and it benefits nvidia with any driver I know of. However, when testing it with your mask-mode raw-md5 it *halves* the speed for AMD (using the 13.4 driver on Bull). It might be that the 32-bit macro is better for global memory or something like that. Good to know for future tweaking.
> 
> When byte-addressable stores are used, some extra VGPRs are most likely required. For raw-md5-opencl the VGPR count already hovers around 125 with Catalyst 13.4. Since the speed gets halved, I'm guessing you are exceeding the 128-VGPR threshold, which halves the number of in-flight wavefronts. It may not have anything to do with global memory access.
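
The arithmetic behind that 128-VGPR threshold can be sketched as below. The hardware figures are assumptions on my part, not something stated in this thread: a GCN-style SIMD with a budget of 256 VGPRs per lane, allocation in groups of 4, and at most 10 wavefronts per SIMD. The function name waves_per_simd() is made up for illustration.

```c
/* Rough occupancy estimate from VGPR use alone.
 * Assumed (not from the thread): GCN-style SIMD with 256 VGPRs per lane,
 * VGPRs allocated in groups of 4, and a cap of 10 wavefronts per SIMD. */
enum { VGPR_BUDGET = 256, VGPR_GRANULE = 4, MAX_WAVES = 10 };

/* Hypothetical helper: how many wavefronts fit on one SIMD if each
 * work-item of the kernel needs `vgprs` vector registers. */
static int waves_per_simd(int vgprs)
{
    int alloc, waves;

    if (vgprs <= 0)
        return MAX_WAVES;

    /* Round the request up to the allocation granule. */
    alloc = (vgprs + VGPR_GRANULE - 1) / VGPR_GRANULE * VGPR_GRANULE;

    /* Integer division: registers left over for a partial wavefront are wasted. */
    waves = VGPR_BUDGET / alloc;

    return waves > MAX_WAVES ? MAX_WAVES : waves;
}
```

Under these assumptions waves_per_simd(125) gives 2 and waves_per_simd(129) gives 1, so a handful of extra registers on a kernel that already sits near 125 would be enough to halve the number of in-flight wavefronts.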

I'd think that macro would need its share of registers too. What's the quickest/simplest way to see VGPR use and other such figures for AMD, without using any GUI? Is this on the wiki somewhere?
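
For reference, the two PUTCHAR() styles being compared can be sketched in plain C as below. This is only an illustration, not the exact macros from the tree: the names PUTCHAR_WORD, PUTCHAR_BYTE and putchar_variants_agree() are made up here, and the comparison assumes a little-endian host. In an OpenCL 1.0 kernel the byte-addressed form is only allowed when the cl_khr_byte_addressable_store extension is available.

```c
#include <stdint.h>
#include <string.h>

/* 32-bit "bit-flogging" variant: read the containing word, clear the
 * target byte lane, then merge in the new value (read-modify-write). */
#define PUTCHAR_WORD(buf, index, val) \
    ((buf)[(index) >> 2] = \
        ((buf)[(index) >> 2] & ~(0xffU << (((index) & 3) << 3))) | \
        ((uint32_t)(uint8_t)(val) << (((index) & 3) << 3)))

/* Byte-addressed variant: a single one-byte store. */
#define PUTCHAR_BYTE(buf, index, val) \
    (((uint8_t *)(buf))[(index)] = (uint8_t)(val))

/* Hypothetical check: returns 1 if both variants leave identical buffers.
 * They are expected to agree on little-endian hosts only. */
static int putchar_variants_agree(void)
{
    uint32_t a[4] = { 0 }, b[4] = { 0 };
    const char *msg = "PUTCHAR() test";   /* 14 bytes, fits in 16 */
    unsigned int i;

    for (i = 0; msg[i] != '\0'; i++) {
        PUTCHAR_WORD(a, i, msg[i]);
        PUTCHAR_BYTE(b, i, msg[i]);
    }
    return memcmp(a, b, sizeof a) == 0;
}
```

The word variant needs a load, a mask, a shift and a merge per character, which is why I'd expect it to use temporaries of its own. How many registers either form really ends up costing is up to the compiler.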

magnum
