Date: Mon, 13 Aug 2012 09:40:11 +0400
From: Solar Designer <>
Subject: Re: bitslice DES on GPU

Sayantan -

On Mon, Aug 13, 2012 at 09:19:35AM +0400, Solar Designer wrote:
> On Mon, Aug 13, 2012 at 08:36:46AM +0530, Sayantan Datta wrote:
> > What should be the value of DES_BS_EXPAND for a GPU implementation?
> To summarize, you'd need to try three approaches (and their variations):
> 1. Keys in global memory, expanded.
> 2. Keys in global memory, not expanded.
> 3. Keys in local memory, not expanded.
> For LM hashes, it's just #2 or #3 above.  Maybe start with #3?
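Here is a minimal sketch, in Python for brevity, of how the "expanded" and "not expanded" key layouts differ when a kernel fetches a subkey bit. It assumes three things that do not come from the code under discussion. First, each key bit is a bitslice word: an int whose bit j belongs to candidate key j. Second, key bits are numbered 0..55 with the parity bits dropped. Third, `KEY_IDX` stands in for the real 768-entry index table and holds hypothetical placeholder values. All function names are made up for illustration.

```python
# Hypothetical placeholder for the real 768-entry table
# (16 rounds x 48 subkey bits); entries are key-bit numbers 0..55.
KEY_IDX = [(n * 29) % 56 for n in range(768)]


def expand_keys(key_bits):
    """Expanded layout: copy all 768 subkey-bit vectors up front.

    Costs 768 words of storage per batch of keys, but every later
    read is a single load.
    """
    return [key_bits[i] for i in KEY_IDX]


def subkey_bit_expanded(ks, n):
    # n = 48 * round + bit; one load from the expanded array
    return ks[n]


def subkey_bit_indexed(key_bits, n):
    # Not-expanded layout: only 56 words per batch, but each read
    # first loads an index and then the key-bit vector it names.
    return key_bits[KEY_IDX[n]]


if __name__ == "__main__":
    # 56 arbitrary bitslice words for a batch of 32 candidate keys
    key_bits = [(0x9E3779B9 * (i + 1)) & 0xFFFFFFFF for i in range(56)]
    ks = expand_keys(key_bits)
    assert all(subkey_bit_expanded(ks, n) == subkey_bit_indexed(key_bits, n)
               for n in range(768))
    print(len(ks), len(key_bits))  # 768 56
```

Approach #3 performs the same indexed read, but with `key_bits` held in local memory instead of global memory. Python cannot express that distinction, so the sketch only notes it here.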

There's yet another option:

4. Unroll the entire 16-round DES loop.  Then the 768 key bit indices
(16 rounds x 48 subkey bits, with repeats) will appear right in the code
as constants.  If the fully unrolled code fits in the same cache level
as code with a mere 2-round unroll would, this approach will be faster
than any of the approaches discussed above.
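This sketch, again in Python for brevity, shows where the 768 indices come from. It derives them from the DES key schedule tables PC-1 and PC-2 and from the per-round left-shift counts in FIPS 46-3. It then emits one line per round with the indices as literal constants, which is roughly what a generator for a fully unrolled kernel might do. Bits are numbered 0..63 in the standard DES order, so the standard's bit 1 becomes 0 here. As a result, the parity bits 7, 15, ..., 63 never appear. `DES_ROUND` and the `k[...]` array are hypothetical names and are not taken from John the Ripper.

```python
# DES key schedule tables (FIPS 46-3), 1-indexed as in the standard.
PC1 = [
    57, 49, 41, 33, 25, 17, 9, 1, 58, 50, 42, 34, 26, 18,   # C half
    10, 2, 59, 51, 43, 35, 27, 19, 11, 3, 60, 52, 44, 36,
    63, 55, 47, 39, 31, 23, 15, 7, 62, 54, 46, 38, 30, 22,  # D half
    14, 6, 61, 53, 45, 37, 29, 21, 13, 5, 28, 20, 12, 4,
]
PC2 = [
    14, 17, 11, 24, 1, 5, 3, 28, 15, 6, 21, 10,
    23, 19, 12, 4, 26, 8, 16, 7, 27, 20, 13, 2,
    41, 52, 31, 37, 47, 55, 30, 40, 51, 45, 33, 48,
    44, 49, 39, 56, 34, 53, 46, 42, 50, 36, 29, 32,
]
SHIFTS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]


def key_bit_schedule():
    """Return 16 lists of 48 key-bit indices (0..63), one list per round."""
    c, d = PC1[:28], PC1[28:]
    schedule = []
    for shift in SHIFTS:
        c = c[shift:] + c[:shift]  # rotate each 28-bit half left
        d = d[shift:] + d[:shift]
        cd = c + d
        schedule.append([cd[p - 1] - 1 for p in PC2])
    return schedule


def emit_unrolled(schedule):
    """Emit one line per round with the indices as constants (hypothetical macro)."""
    lines = []
    for r, idx in enumerate(schedule, 1):
        args = ", ".join(f"k[{i}]" for i in idx)
        lines.append(f"DES_ROUND({r}, {args});")
    return "\n".join(lines)


if __name__ == "__main__":
    sched = key_bit_schedule()
    print(sum(len(r) for r in sched))  # 768
    print(emit_unrolled(sched).splitlines()[0][:40])
```

Each round uses 48 distinct key bits. The shift counts sum to 28, so the round-16 indices coincide with PC-2 applied directly to the unrotated PC-1 output.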

In the current CPU code, I use a 2x unroll in this loop - that is, I
have 8 iterations of a loop body that computes 2 DES rounds.  A typical
CPU does not have enough L1 instruction cache for a full 16-round
unroll.  Maybe on some GPUs this is different.

