Date: Fri, 23 Mar 2012 21:23:48 +0800
From: myrice <qqlddg@...il.com>
To: john-dev@...ts.openwall.com
Subject: Possible improvement of cryptsha256-cuda

Hi,

Lukas, I am reading your cryptsha256-cuda code. The CUDA output buffer is
not accessed in a coalesced way. That is, in file cuda/cryptsha256.cu:
#pragma unroll 8
    for (i = 0; i < 8; i++)
        tresult[hash_addr(i, idx)] = alt_result[i];
hash_addr is defined as:
#define hash_addr(j,idx) (((j)*(KEYS_PER_CRYPT))+(idx))

However, the access pattern is not regular. That means we will access
offsets 0, 2000, 4000, and so on, and each access needs many memory
cycles. The CUDA 4.1 profiler also reports that the global memory stores
are very inefficient. I think we could change the index to idx*8+i and do
the address translation on the CPU side. I am working on this!
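
To make the idea concrete, here is a minimal host-side sketch of the
address translation, not the actual patch. It assumes the kernel would
store with idx*8+i and that the CPU then copies the words back into the
hash_addr layout the rest of the code expects. The names packed_addr,
WORDS_PER_HASH and translate_to_hash_addr are hypothetical, the value
2000 for KEYS_PER_CRYPT is only a placeholder, and uint32_t is assumed
for the hash words. Whether the stores really become more efficient
depends on how the addresses of adjacent threads line up, so this only
illustrates the index mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder value for illustration; the real one comes from the project headers. */
#define KEYS_PER_CRYPT 2000
/* Hypothetical name: number of 32-bit words in one SHA-256 result. */
#define WORDS_PER_HASH 8

/* Layout used by the current kernel (macro quoted above). */
#define hash_addr(j, idx) (((j) * (KEYS_PER_CRYPT)) + (idx))

/* Hypothetical macro for the proposed layout: the 8 words of one key are adjacent. */
#define packed_addr(j, idx) (((idx) * (WORDS_PER_HASH)) + (j))

/*
 * Copy results written in the proposed idx*8+i layout into the
 * hash_addr layout. Both buffers hold KEYS_PER_CRYPT * WORDS_PER_HASH
 * words and must not be the same buffer.
 */
static void translate_to_hash_addr(const uint32_t *packed, uint32_t *out)
{
    assert(packed != out);
    for (size_t idx = 0; idx < KEYS_PER_CRYPT; idx++)
        for (size_t j = 0; j < WORDS_PER_HASH; j++)
            out[hash_addr(j, idx)] = packed[packed_addr(j, idx)];
}
```

After copying the device buffer back to the host, one call such as
translate_to_hash_addr(device_copy, tresult_host) (both names
hypothetical) would restore the old layout, so the comparison code on
the CPU side would not need to change.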

Will let you know the result.
