Date: Mon, 13 Aug 2012 09:40:11 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice DES on GPU

Sayantan -

On Mon, Aug 13, 2012 at 09:19:35AM +0400, Solar Designer wrote:
> On Mon, Aug 13, 2012 at 08:36:46AM +0530, Sayantan Datta wrote:
> > What should be the value of DES_BS_EXPAND for GPU implementation ?
[...]
> To summarize, you'd need to try three approaches (and their variations):
>
> 1. Keys in global memory, expanded.
>
> 2. Keys in global memory, not expanded.
>
> 3. Keys in local memory, not expanded.
>
> For LM hashes, it's just #2 or #3 above.  Maybe start with #3?

There's yet another option:

4. Unroll the entire 16-round DES loop.  Then the 768 key bit indices
(16 rounds times 48 subkey bits, with repeats) become constants right in
the code, so there is no index table to keep in memory at all.  If the
code fits in the same cache level that it would with a mere 2-round
unroll, then you will achieve a better speed in this way than you would
with the approaches discussed above.

In the current CPU code, I use a 2x unroll in this loop - that is, I
have 8 iterations with a loop body of 2 DES rounds.  There's not enough
L1 instruction cache on a typical CPU for a full 16-round unroll.  Maybe
on some GPUs this is different.

Alexander
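
As a rough illustration of option 4, the C sketch below derives the 768 per-round key bit indices from the standard DES key schedule (the PC-2 table and the per-round rotation amounts) and prints them as constants in straight-line statements. This is only a sketch under stated assumptions: the function names build_key_indices, count_key_uses and emit_unrolled, and the array names x, e and K in the emitted text, are hypothetical and are not taken from John the Ripper. The indices refer to positions in the 56-bit key after PC-1, which is not necessarily the bit ordering that the DES_bs code uses.

```c
#include <assert.h>
#include <stdio.h>

/* Standard DES PC-2 table: 1-based positions in the 56-bit C||D register. */
static const unsigned char PC2[48] = {
    14, 17, 11, 24,  1,  5,  3, 28, 15,  6, 21, 10,
    23, 19, 12,  4, 26,  8, 16,  7, 27, 20, 13,  2,
    41, 52, 31, 37, 47, 55, 30, 40, 51, 45, 33, 48,
    44, 49, 39, 56, 34, 53, 46, 42, 50, 36, 29, 32
};

/* Standard DES left-rotation amounts for rounds 1..16 (they sum to 28). */
static const unsigned char SHIFTS[16] = {
    1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1
};

/*
 * Fill idx[r][i] with the 0-based position, within the 56-bit post-PC-1 key,
 * of subkey bit i of round r.  Each 28-bit half is rotated left by the
 * cumulative shift, so position p of the rotated half holds the bit that
 * started at position (p + shift) mod 28.
 */
void build_key_indices(unsigned char idx[16][48])
{
    int shift = 0;
    for (int r = 0; r < 16; r++) {
        shift += SHIFTS[r];
        for (int i = 0; i < 48; i++) {
            int p = PC2[i] - 1;               /* position in rotated C||D */
            int half = p / 28;                /* 0 = C half, 1 = D half */
            int pos = (p % 28 + shift) % 28;  /* undo the rotation */
            idx[r][i] = (unsigned char)(half * 28 + pos);
            assert(idx[r][i] < 56);
        }
    }
}

/* Count how often each of the 56 key bits is referenced; returns the total. */
int count_key_uses(unsigned char idx[16][48], int uses[56])
{
    int total = 0;
    for (int k = 0; k < 56; k++)
        uses[k] = 0;
    for (int r = 0; r < 16; r++) {
        for (int i = 0; i < 48; i++) {
            uses[idx[r][i]]++;
            total++;
        }
    }
    return total;
}

/*
 * Print hypothetical straight-line statements, one per subkey bit, with the
 * key index as a literal constant.  Returns the number of key references.
 */
int emit_unrolled(FILE *out, unsigned char idx[16][48])
{
    int n = 0;
    for (int r = 0; r < 16; r++) {
        fprintf(out, "/* round %d */\n", r);
        for (int i = 0; i < 48; i++) {
            fprintf(out, "x[%d] = e[%d] ^ K[%d];\n", i, i, idx[r][i]);
            n++;
        }
    }
    return n;
}
```

Calling build_key_indices() on a 16x48 array and then emit_unrolled(stdout, idx) prints the fully unrolled form. With a 2x unroll the same indices would instead be read from a table at run time, which is what options 1 to 3 are about. Whether the unrolled code still fits in the instruction cache has to be measured on the target device.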