Date: Fri, 23 Mar 2012 21:23:48 +0800
From: myrice <qqlddg@...il.com>
To: john-dev@...ts.openwall.com
Subject: Possible improvement of cryptsha256-cuda

Hi, Lukas,

I am reading your cryptsha256-cuda code. The CUDA output buffer is not accessed in a coalesced way. That is, in cuda/cryptsha256.cu (lines 284-286):

    #pragma unroll 8
    for (i = 0; i < 8; i++)
        tresult[hash_addr(i, idx)] = alt_result[i];

where hash_addr is:

    #define hash_addr(j,idx) (((j)*(KEYS_PER_CRYPT))+(idx))

However, this access pattern is strided: for idx 0, the writes go to indices 0, 2000, 4000, ..., and each access costs many memory cycles. The CUDA 4.1 profiler also reports that the global memory stores are very inefficient.

I think we could change the index to idx*8+i and do the address translation on the CPU side. I am working on this and will let you know the result.
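A minimal C sketch of the CPU-side address translation proposed above, assuming the kernel writes tresult[idx*8+i] and the rest of the host code keeps reading results through the existing hash_addr layout. The helper names word_major_index, key_major_index and key_major_to_word_major are hypothetical and do not come from cryptsha256.cu; n_keys stands in for KEYS_PER_CRYPT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of 32-bit words per SHA-256 digest (the 8 in the loop above). */
#define HASH_WORDS 8

/* Existing layout, same as hash_addr(j, idx): word j of key idx
   lives at j * n_keys + idx. */
static size_t word_major_index(size_t j, size_t idx, size_t n_keys)
{
    return j * n_keys + idx;
}

/* Proposed layout: word i of key idx lives at idx * HASH_WORDS + i,
   so all 8 words of one key are adjacent. */
static size_t key_major_index(size_t i, size_t idx)
{
    return idx * HASH_WORDS + i;
}

/* Hypothetical host-side helper: copy a buffer written in the proposed
   key-major layout into the word-major layout the existing CPU code
   indexes with hash_addr. Both buffers hold n_keys * HASH_WORDS words. */
static void key_major_to_word_major(const uint32_t *src, uint32_t *dst,
                                    size_t n_keys)
{
    assert(src != dst); /* the conversion is out-of-place */
    for (size_t idx = 0; idx < n_keys; idx++)
        for (size_t i = 0; i < HASH_WORDS; i++)
            dst[word_major_index(i, idx, n_keys)] =
                src[key_major_index(i, idx)];
}
```

Usage: after copying tresult back from the device, call key_major_to_word_major(tresult_host, converted, KEYS_PER_CRYPT) and keep the existing comparison code unchanged. Alternatively, the host code could index the key-major buffer directly with idx*8+i and skip the extra copy; the conversion step only exists to avoid touching the rest of the CPU side.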