Date: Fri, 23 Mar 2012 21:23:48 +0800
From: myrice <>
Subject: Possible improvement of cryptsha256-cuda


Lukas, I am reading your cryptsha256-cuda code. The CUDA output buffer is
not accessed in a coalesced way. That is (in file cuda/
#pragma unroll 8
    for (i = 0; i < 8; i++)
        tresult[hash_addr(i, idx)] = alt_result[i];
The hash_addr macro is:
#define hash_addr(j,idx) (((j)*(KEYS_PER_CRYPT))+(idx))

However, the access pattern is strided. Each thread's successive stores
land KEYS_PER_CRYPT elements apart (roughly offsets 0, 2000, 4000, ...),
and each store costs many memory cycles. The CUDA 4.1 profiler also
reports that global memory store efficiency is very low. I think we could
change the index to idx*8+i and do the address translation on the CPU
side. I am working on this!
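
A minimal sketch of what the CPU-side translation could look like, in plain C. The value 2048 for KEYS_PER_CRYPT and the names HASH_WORDS, hash_addr_packed and packed_to_interleaved are assumptions made for illustration only; they are not taken from the cryptsha256-cuda source. It only shows the index mapping. Whether the new layout really improves store efficiency would still have to be checked in the profiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value for illustration; the real KEYS_PER_CRYPT
   comes from the cryptsha256-cuda headers. */
#define KEYS_PER_CRYPT 2048
/* Assumed name: a SHA-256 result is 8 32-bit words. */
#define HASH_WORDS 8

/* Current layout: word j of every key is stored together. */
#define hash_addr(j, idx) (((j) * (KEYS_PER_CRYPT)) + (idx))

/* Proposed layout (idx*8+i): the 8 words of one key are stored together.
   The macro name is hypothetical. */
#define hash_addr_packed(j, idx) (((idx) * (HASH_WORDS)) + (j))

/* CPU-side address translation: copy a buffer written in the proposed
   layout into the layout that code using hash_addr() expects. */
static void packed_to_interleaved(const uint32_t *packed,
                                  uint32_t *interleaved)
{
    assert(packed != NULL && interleaved != NULL);
    for (size_t idx = 0; idx < KEYS_PER_CRYPT; idx++)
        for (size_t j = 0; j < HASH_WORDS; j++)
            interleaved[hash_addr(j, idx)] =
                packed[hash_addr_packed(j, idx)];
}
```

With this, the kernel would store to tresult[hash_addr_packed(i, idx)], and the host would call packed_to_interleaved() once after copying the buffer back, so the rest of the host code can keep using hash_addr().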

Will let you know the result.
