john-dev - Re: Possible improvement of cryptsha256-cuda


Message-ID: <20120324014011.GB5676@openwall.com>
Date: Sat, 24 Mar 2012 05:40:11 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Possible improvement of cryptsha256-cuda

Hi myrice,

On Fri, Mar 23, 2012 at 09:23:48PM +0800, myrice wrote:
> Lukas, I am reading your cryptsha256-cuda code. Accesses to the CUDA output
> buffer are not coalesced. That is (in file cuda/cryptsha256.cu):
> #pragma unroll 8
>     for (i = 0; i < 8; i++)
>         tresult[hash_addr(i, idx)] = alt_result[i];
> The hash_addr is:
> #define hash_addr(j,idx) (((j)*(KEYS_PER_CRYPT))+(idx))
> 
> However, the access pattern is not regular. That is, we access offsets
> such as 0, 2000, 4000, and each access takes many memory cycles. The CUDA
> 4.1 profiler also reports that the global memory stores are very
> inefficient. I think we could change the index to idx*8+i and do the
> address translation on the CPU side. I am working on this!
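
For reference, the two index mappings under discussion and a host-side
translation between them could be sketched roughly as follows. This is only
an illustration: the KEYS_PER_CRYPT value, the HASH_WORDS constant, and the
names hash_addr_packed() and packed_to_strided() are assumptions, not taken
from cuda/cryptsha256.cu.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; the real constant is defined by the format. */
#define KEYS_PER_CRYPT 2048
/* A SHA-256 digest is 8 32-bit words. HASH_WORDS is a hypothetical name. */
#define HASH_WORDS 8

/* Existing mapping (quoted above): word j of every key is stored together. */
#define hash_addr(j, idx) (((j) * (KEYS_PER_CRYPT)) + (idx))

/* Proposed mapping idx*8+i (hypothetical macro name): the 8 words of one
 * key are stored together. */
#define hash_addr_packed(j, idx) (((idx) * (HASH_WORDS)) + (j))

/* Hypothetical host-side helper: copy results from the proposed layout back
 * into the existing layout, so code that reads through hash_addr() keeps
 * working unchanged. */
static void packed_to_strided(const uint32_t *packed, uint32_t *strided)
{
    assert(packed != NULL && strided != NULL);
    for (size_t idx = 0; idx < KEYS_PER_CRYPT; idx++)
        for (size_t j = 0; j < HASH_WORDS; j++)
            strided[hash_addr(j, idx)] = packed[hash_addr_packed(j, idx)];
}
```

With hash_addr(), one key's words sit KEYS_PER_CRYPT elements apart, while
consecutive idx values for the same word are adjacent. With idx*8+i it is the
other way around: one key's words are contiguous and consecutive idx values
are 8 words apart.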

Thank you for posting and for working on this!  What you're doing is in
line with my suggestions to Sayantan, which I've just posted. :-)

Alexander
