Date: Tue, 16 Oct 2012 22:17:37 +0530
From: Sayantan Datta <>
Subject: Re: bitslice DES on GPU

On Sun, Oct 14, 2012 at 8:41 AM, Solar Designer <> wrote:

> On Sat, Oct 13, 2012 at 11:41:05PM +0530, Sayantan Datta wrote:
> > As a point of reference, what should be our targeted non-overhead speed?
> Something like 300M c/s at DES-based crypt(3) on HD 7970.  Maybe more
> than that if we hard-code E (generate or patch code on the fly).
> > For instance, Hashcat does 83.4M c/s in traditional-des.
> Actually, it does/did about 103M with older Catalyst versions, and will
> likely achieve that again with future ones.
> Alexander

Hi Alexander,

I was comparing the statistics of DES_bs_kernel with those of
pbkdf2_kernel. The main bottleneck seems to be an insufficient number of
in-flight wavefronts, which causes poor ALU utilization. For comparison,
the ALU utilization of pbkdf2 is 3 times that of DES. There are also other
factors, such as LDS bank conflicts.
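
To make the link between register use and in-flight wavefronts concrete,
here is a rough sketch in plain C. The limits in it are assumptions about
GCN hardware and not numbers taken from the kernels above: a budget of 256
vector registers per work-item and at most 10 resident wavefronts per SIMD.
The function name is hypothetical, and register allocation granularity and
LDS limits are ignored.

```c
/* Assumed GCN limits (not measured on DES_bs_kernel or pbkdf2_kernel):
 * 256 vector registers per work-item slot, and no more than 10
 * wavefronts resident on one SIMD at a time. */
enum { VGPR_BUDGET = 256, MAX_WAVES_PER_SIMD = 10 };

/* Hypothetical helper: estimate how many wavefronts can be resident on
 * one SIMD when every work-item needs vgprs_per_workitem registers.
 * Allocation granularity and LDS usage are deliberately ignored. */
static int waves_per_simd(int vgprs_per_workitem)
{
    int waves;

    if (vgprs_per_workitem <= 0 || vgprs_per_workitem > VGPR_BUDGET)
        return 0; /* invalid input, or the kernel would have to spill */

    waves = VGPR_BUDGET / vgprs_per_workitem;
    return waves > MAX_WAVES_PER_SIMD ? MAX_WAVES_PER_SIMD : waves;
}
```

Under these assumptions a kernel needing 128 registers per work-item would
keep only 2 wavefronts resident per SIMD, while one needing 24 would reach
the cap of 10.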

Also, is there any specific reason for doing 32 hashes per kernel? Could it
be lowered to something like 8, so that the size of the data block is
reduced to 16 integers? If so, would that also reduce the size of K[]? And
is it possible to reduce the number of registers used by the S-boxes?
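
The arithmetic behind the "16 integers" figure can be sketched as follows.
This is only the storage count under the assumption that the bits of all
candidates are packed densely into 32-bit words. The helper name is
hypothetical, and it does not show that the kernel could actually use such
a layout: in the usual bitslice layout each bit position occupies a whole
word of its own, however many candidates share it.

```c
/* DES block and key sizes in bits; WORD_BITS is the width of one integer. */
enum { DES_BLOCK_BITS = 64, DES_KEY_BITS = 56, WORD_BITS = 32 };

/* Hypothetical helper: number of 32-bit words needed to store `bits` bit
 * positions for `lanes` candidates, assuming dense packing (rounded up). */
static unsigned packed_words(unsigned bits, unsigned lanes)
{
    return (bits * lanes + WORD_BITS - 1) / WORD_BITS;
}
```

With 32 candidates this gives packed_words(DES_BLOCK_BITS, 32) = 64 words
for the data block and packed_words(DES_KEY_BITS, 32) = 56 words for K[].
With 8 candidates the same formula gives 16 and 14 words respectively.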

