Date: Mon, 13 Aug 2012 09:40:11 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice DES on GPU
Sayantan -
On Mon, Aug 13, 2012 at 09:19:35AM +0400, Solar Designer wrote:
> On Mon, Aug 13, 2012 at 08:36:46AM +0530, Sayantan Datta wrote:
> > What should be the value of DES_BS_EXPAND for GPU implementation?
[...]
> To summarize, you'd need to try three approaches (and their variations):
>
> 1. Keys in global memory, expanded.
>
> 2. Keys in global memory, not expanded.
>
> 3. Keys in local memory, not expanded.
>
> For LM hashes, it's just #2 or #3 above. Maybe start with #3?
There's yet another option:
4. Unroll the entire 16-round DES loop. Then you'll have the right 768
key bit indices (16 rounds times 48 subkey bits, with repeats since DES
has only 56 key bits) right in the code. If the code fits in the same
cache level that it would with a mere 2-round unroll, then you will
achieve a better speed in this way than you would with the approaches
discussed above.
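
To illustrate where those 768 indices come from, here is a minimal
sketch in C that derives them from the standard DES key schedule tables
(PC-1, PC-2 and the per-round rotation counts from FIPS 46-3). The
function name des_key_indices and the 1-based 64-bit key numbering are
assumptions made for this example; this is not John the Ripper's actual
table or key bit numbering. A generator like this could emit the
indices as constants into fully unrolled source.

```c
#include <assert.h>

/* Standard DES key schedule tables (FIPS 46-3), 1-based bit numbers. */
static const unsigned char PC1[56] = {
    57, 49, 41, 33, 25, 17,  9,  1, 58, 50, 42, 34, 26, 18,
    10,  2, 59, 51, 43, 35, 27, 19, 11,  3, 60, 52, 44, 36,
    63, 55, 47, 39, 31, 23, 15,  7, 62, 54, 46, 38, 30, 22,
    14,  6, 61, 53, 45, 37, 29, 21, 13,  5, 28, 20, 12,  4
};
static const unsigned char PC2[48] = {
    14, 17, 11, 24,  1,  5,  3, 28, 15,  6, 21, 10,
    23, 19, 12,  4, 26,  8, 16,  7, 27, 20, 13,  2,
    41, 52, 31, 37, 47, 55, 30, 40, 51, 45, 33, 48,
    44, 49, 39, 56, 34, 53, 46, 42, 50, 36, 29, 32
};
static const unsigned char SHIFTS[16] = {
    1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1
};

/*
 * Hypothetical helper: set idx[r][i] to the 1-based number of the
 * 64-bit key bit that feeds subkey bit i of round r (both 0-based).
 * 16 * 48 = 768 entries are written in total.
 */
static void des_key_indices(unsigned char idx[16][48])
{
    int total = 0; /* cumulative left rotation of the C and D halves */

    for (int r = 0; r < 16; r++) {
        total += SHIFTS[r];
        for (int i = 0; i < 48; i++) {
            int p = PC2[i] - 1;       /* position in C||D, 0..55 */
            int half = p / 28 * 28;   /* 0 for C, 28 for D */
            /* After rotating left by 'total', position j holds the
             * bit that started at (j + total) mod 28 in its half. */
            int src = half + (p - half + total) % 28;
            idx[r][i] = PC1[src];
        }
    }
    assert(total == 28); /* both halves are back where they started */
}
```

Each round selects 48 distinct bits out of 56, so every key bit shows
up in most rounds, which is where the repeats come from.
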
In the current CPU code, I use a 2x unroll of this loop - that is, I
have 8 iterations with a loop body of 2 DES rounds. There's not enough L1
instruction cache on a typical CPU for a full 16-round unroll. Maybe
on some GPUs this is different.
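
For readers unfamiliar with the loop shape, the following is a rough
sketch in C of a 16-round Feistel loop next to its 2x unrolled form
with 8 iterations. The round function toy_f is a made-up placeholder,
not DES and not bitsliced, and all names here are assumptions for
illustration only. One thing the sketch does show is that with two
rounds per loop body the two halves simply alternate roles, so no
explicit swap is needed.

```c
#include <stdint.h>

/* Toy round function, NOT DES: a stand-in for the real S-box layer. */
static uint32_t toy_f(uint32_t half, uint32_t subkey)
{
    uint32_t x = half ^ subkey;
    return ((x << 3) | (x >> 29)) + 0x9e3779b9u;
}

/* Reference form: 16 iterations, one round each, explicit swap. */
static void feistel_rolled(uint32_t *l, uint32_t *r, const uint32_t k[16])
{
    for (int i = 0; i < 16; i++) {
        uint32_t t = *l ^ toy_f(*r, k[i]);
        *l = *r;
        *r = t;
    }
}

/* 2x unroll: 8 iterations, two rounds per loop body, no swap. */
static void feistel_unrolled2(uint32_t *l, uint32_t *r, const uint32_t k[16])
{
    for (int i = 0; i < 16; i += 2) {
        *l ^= toy_f(*r, k[i]);     /* even-numbered round updates l */
        *r ^= toy_f(*l, k[i + 1]); /* odd-numbered round updates r */
    }
}
```

A full 16-round unroll would repeat the two-line body 8 times with
constant subkey indices and no loop at all, at the cost of code size.
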
Alexander