john-dev - Possible improvement of cryptsha256-cuda

Message-ID: <CANJ2NMMWaRzVQzTA_EQ-DudK5kjBqE9nTivpA7Hbi9JVSmtmPg@mail.gmail.com>
Date: Fri, 23 Mar 2012 21:23:48 +0800
From: myrice <qqlddg@...il.com>
To: john-dev@...ts.openwall.com
Subject: Possible improvement of cryptsha256-cuda

Hi,

Lukas, I am reading your cryptsha256-cuda code. The CUDA output buffer is
not accessed in a coalesced way. That is (in file cuda/cryptsha256.cu):
    #pragma unroll 8
    for (i = 0; i < 8; i++)
        tresult[hash_addr(i, idx)] = alt_result[i];
The hash_addr macro is:
#define hash_addr(j,idx) (((j)*(KEYS_PER_CRYPT))+(idx))
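
To make the mapping concrete, here is a small host-side C sketch that evaluates the quoted macro. KEYS_PER_CRYPT is set to 2000 purely for illustration (the real value lives in the John the Ripper headers), and the helper names are hypothetical. It reports two distances: between consecutive elements written by one thread, and between the same element written by neighboring threads. The second one matters because coalescing on CUDA hardware is decided by the addresses that the threads of a warp touch in the same instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value, for illustration only; not taken from the real source. */
#define KEYS_PER_CRYPT 2000

/* Macro as quoted in the message. */
#define hash_addr(j, idx) (((j) * (KEYS_PER_CRYPT)) + (idx))

/* Bounds-checked wrapper (hypothetical helper): j is the element index
   within one result (0..7), idx is the thread/key index. */
size_t checked_hash_addr(size_t j, size_t idx)
{
    assert(j < 8 && idx < KEYS_PER_CRYPT);
    return hash_addr(j, idx);
}

/* Distance, in elements, between elements 0 and 1 written by one thread. */
size_t stride_within_thread(size_t idx)
{
    return hash_addr((size_t)1, idx) - hash_addr((size_t)0, idx);
}

/* Distance, in elements, between element i of thread idx and of thread idx+1. */
size_t stride_across_threads(size_t i, size_t idx)
{
    return hash_addr(i, idx + 1) - hash_addr(i, idx);
}
```

With these assumptions, stride_within_thread returns KEYS_PER_CRYPT and stride_across_threads returns 1.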

However, the access pattern is not regular: each thread writes to offsets
0, 2000, 4000, and so on, and each of those accesses costs many memory
cycles. The CUDA 4.1 profiler also reports that global memory store
efficiency is very low. I think we could change the index to idx*8+i and
translate the addresses on the CPU side. I am working on this!
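
The proposed idx*8+i indexing and a CPU-side translation could look roughly like the following C sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the element type is assumed to be uint32_t, and the key count is passed as a parameter rather than taken from KEYS_PER_CRYPT. It is one possible way to keep existing host code that reads through hash_addr working, not the actual patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WORDS_PER_HASH 8 /* eight result elements per key, as in the quoted loop */

/* Index under the proposed layout: one key's eight elements are adjacent. */
size_t packed_addr(size_t i, size_t idx)
{
    return idx * WORDS_PER_HASH + i;
}

/* Index under the current layout, equivalent to hash_addr(i, idx). */
size_t strided_addr(size_t i, size_t idx, size_t keys)
{
    return i * keys + idx;
}

/* Host-side translation: copy a buffer written with packed_addr into the
   layout that hash_addr-based host code expects. Both buffers must hold
   keys * WORDS_PER_HASH elements and must not overlap. */
void packed_to_strided(const uint32_t *packed, uint32_t *strided, size_t keys)
{
    assert(packed != NULL && strided != NULL);
    for (size_t idx = 0; idx < keys; idx++)
        for (size_t i = 0; i < WORDS_PER_HASH; i++)
            strided[strided_addr(i, idx, keys)] = packed[packed_addr(i, idx)];
}
```

Under the packed layout one thread's eight elements are contiguous, while the same element of neighboring threads is WORDS_PER_HASH elements apart, so the profiler comparison mentioned above is what would show whether the change pays off.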

Will let you know the result.
