john-dev - Re: PUTCHAR() macro



Message-ID: <9ab750eae7809745007473bf2efaa941@smtp.hushmail.com>
Date: Thu, 12 Sep 2013 08:57:05 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: PUTCHAR() macro

On 12 sep 2013, at 03:31, Sayantan Datta <std2048@...il.com> wrote:
> On Thu, Sep 12, 2013 at 2:14 AM, magnum <john.magnum@...hmail.com> wrote:
>> This means we are using byte-addressed stores if allowed, instead of the bit-flogging macro. This alone made for a 33% boost for RAR with recent AMD drivers, and it benefits nvidia with any driver I know of. However, when testing it with your mask-mode raw-md5 it *halves* the speed for AMD (using the 13.4 driver on Bull). It might be that the 32-bit macro is better for global memory or something like that. Good to know for future tweaking.
> 
> When byte-addressable stores are used, most likely some extra VGPRs are required. For raw-md5-opencl the VGPR count already hovers around 125 with Catalyst 13.4. Since the speed gets halved, I'm guessing you are exceeding the 128-VGPR threshold, which halves the number of in-flight wavefronts. It may not have anything to do with global memory access.
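A rough sketch of the arithmetic behind the 128-VGPR threshold, assuming GCN-class figures that the thread itself does not state: 256 VGPRs available per SIMD lane, VGPRs allocated in granules of 4, and at most 10 wavefronts resident per SIMD. Under those assumptions a kernel using 125 VGPRs (rounded up to 128) fits two wavefronts per SIMD, while one using 129 (rounded up to 132) fits only one, which would halve the in-flight wavefronts. The function name `waves_per_simd` is hypothetical.

```c
#include <assert.h>

/* Assumed GCN-era hardware figures (not from the thread; vary by GPU). */
#define VGPRS_PER_LANE  256  /* VGPR budget per SIMD lane */
#define VGPR_GRANULE    4    /* VGPRs are allocated in blocks of this size */
#define MAX_WAVES_SIMD  10   /* hardware cap on resident wavefronts per SIMD */

/* Hypothetical helper: how many wavefronts can be resident on one SIMD
   for a kernel that uses `vgprs` vector registers per work-item. */
static int waves_per_simd(int vgprs)
{
    assert(vgprs > 0);
    /* Round the request up to the allocation granule. */
    int alloc = (vgprs + VGPR_GRANULE - 1) / VGPR_GRANULE * VGPR_GRANULE;
    int waves = VGPRS_PER_LANE / alloc;
    return waves < MAX_WAVES_SIMD ? waves : MAX_WAVES_SIMD;
}
```

Treat the constants as placeholders; the point is only that occupancy drops in steps, so a few extra registers near a boundary can cost half the wavefronts.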

I'd think that macro would need its share of registers too. What's the quickest/simplest way to see VGPR usage and similar kernel statistics for AMD, without using any GUI? Is this on the wiki somewhere?
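The thread does not show the actual PUTCHAR() definition, so here is a minimal C sketch of the two store styles being compared, with hypothetical macro names `PUTCHAR_WORD` and `PUTCHAR_BYTE`. The 32-bit variant is a read-modify-write (load the word, mask out one byte, shift the new byte in, store), which needs temporaries of its own; the byte-addressed variant is a single byte store. It assumes little-endian byte order within each 32-bit word, as on x86 and typical GPUs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit "bit-flogging" variant: read-modify-write of the
   word containing byte `index`, assuming little-endian byte order.
   Note: `buf` and `index` are evaluated more than once. */
#define PUTCHAR_WORD(buf, index, val)                                      \
    ((buf)[(index) >> 2] =                                                 \
         ((buf)[(index) >> 2] & ~((uint32_t)0xff << (((index) & 3) << 3))) | \
         ((uint32_t)(uint8_t)(val) << (((index) & 3) << 3)))

/* Hypothetical byte-addressed variant: one byte store through a
   character pointer (character types may alias any object in C). */
#define PUTCHAR_BYTE(buf, index, val) \
    (((unsigned char *)(buf))[(index)] = (unsigned char)(val))

/* Returns 1 if the host stores the least significant byte first. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}
```

On a little-endian target both macros leave the buffer in the same state; which one is faster is a property of the compiler and hardware, as the RAR and raw-md5 results above illustrate.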

magnum
