Date: Sat, 24 Mar 2012 05:40:11 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Possible improvement of cryptsha256-cuda

Hi myrice,

On Fri, Mar 23, 2012 at 09:23:48PM +0800, myrice wrote:
> Lukas, I am reading your cryptsha256-cuda code. The CUDA output buffer is
> not accessed in a coalesced manner. That is (in file cuda/cryptsha256.cu):
>
> #pragma unroll 8
> for (i = 0; i < 8; i++)
> 	tresult[hash_addr(i, idx)] = alt_result[i];
>
> The hash_addr is:
>
> #define hash_addr(j,idx) (((j)*(KEYS_PER_CRYPT))+(idx))
>
> However, the access pattern is not regular. That means we access offsets
> 0, 2000, 4000, ..., and each access takes many memory cycles. The CUDA 4.1
> profiler also reports that the global memory stores are very inefficient.
> I think we could change it to idx*8+i and do the address translation on
> the CPU side. I am doing this!

Thank you for posting and for working on this!  What you're doing is in
line with my suggestions to Sayantan, which I've just posted. :-)

Alexander
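A minimal C sketch of the two output layouts and of the CPU-side address translation myrice describes, assuming the output is an array of 32-bit words holding KEYS_PER_CRYPT * 8 entries (one SHA-256 digest per key). Only hash_addr is copied from the quoted code; the small KEYS_PER_CRYPT value, the hash_addr_packed macro, and the two helper functions are hypothetical names chosen for illustration, not code from the john tree.

```c
#include <stdint.h>

/* Illustrative value only (hypothetical); the real KEYS_PER_CRYPT is
 * defined in the cryptsha256-cuda sources and is much larger. */
#define KEYS_PER_CRYPT 4
#define HASH_WORDS 8 /* a SHA-256 digest is 8 x 32-bit words */

/* Current layout (from cuda/cryptsha256.cu): word j of every key sits in
 * one contiguous run of KEYS_PER_CRYPT entries. */
#define hash_addr(j, idx) (((j) * (KEYS_PER_CRYPT)) + (idx))

/* Proposed layout (idx*8+i): the 8 words of one key are contiguous.
 * This macro name is hypothetical. */
#define hash_addr_packed(j, idx) (((idx) * (HASH_WORDS)) + (j))

/* Host-side address translation (hypothetical helper): fetch word j of
 * key idx from a buffer the GPU wrote in the packed layout. */
static uint32_t get_hash_word(const uint32_t *packed, int idx, int j)
{
	return packed[hash_addr_packed(j, idx)];
}

/* Convert a whole packed buffer back to the original word-major layout,
 * so existing CPU code that indexes with hash_addr() keeps working. */
static void packed_to_word_major(const uint32_t *packed, uint32_t *out)
{
	for (int idx = 0; idx < KEYS_PER_CRYPT; idx++)
		for (int j = 0; j < HASH_WORDS; j++)
			out[hash_addr(j, idx)] = get_hash_word(packed, idx, j);
}
```

With hash_addr(), the eight words written by one idx are KEYS_PER_CRYPT entries apart (0, 4, 8, ... for idx = 0 with the value above), while hash_addr_packed() puts them next to each other. Coalescing, however, depends on which addresses the threads of a warp touch for the same i, which in turn depends on how idx maps onto threads, so the profiler remains the way to confirm that the change helps.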