Date: Thu, 18 Oct 2012 22:21:11 +0530
From: Sayantan Datta <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice DES on GPU

On Thu, Oct 18, 2012 at 11:47 AM, Solar Designer <solar@...nwall.com> wrote:

> It would imply something other than bitslicing for K as well. And the
> S-boxes would be represented differently, too. But I think you should
> stop thinking in this direction. I don't expect there's another useful
> representation inbetween bitslicing and straightforward table lookups.

The primary reason for thinking in this direction is to decrease the
load on registers and local memory so that more wavefronts can be in
flight. Decreasing the size of B to 16 ints would also almost ensure
that it really stays in private register space.

Right now I doubt the B arrays are stored in register address space,
because each VGPR is only 256 bits wide. However, that depends entirely
on how the VGPRs are used. For SIMD execution I think it is more logical
to assume that each VGPR is loaded with data from 16 different kernel
instances (work-items) and not from only one. Still, if the array is
not in register space, we could be losing a lot of performance.

It could also be possible to use 4 arrays of 16 ints to represent the
B array of 64 ints. But the indirect addressing through the 96-entry
index array is causing problems. Since the 96-entry index array is
almost constant, the indexing could be done prior to execution. However,
that might require a radical approach such as precompiling the kernels
manually.

Regards,
Sayantan
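A minimal sketch of the "do the indexing prior to execution" idea, assuming the index table is known on the host before the kernel is built. It generates kernel source in which every load uses a literal index into one of four hypothetical 16-int arrays (B0..B3) instead of the runtime lookup B[E[k]], so the GPU compiler never sees a dynamic index into a private array. The names split_index, emit_loads, B0..B3 and x0..xN are illustrative only, not existing JtR code, and the sample indices are made up. Because the real table is only "almost constant", the source would have to be regenerated and recompiled whenever it changes.

```python
# Hypothetical code generator: bake a (nearly) constant index table into
# kernel source so each load uses a compile-time-constant index.

CHUNK = 16       # ints per sub-array (64-int B split into 4 x 16)
NUM_CHUNKS = 4


def split_index(i, chunk=CHUNK, num_chunks=NUM_CHUNKS):
    """Map a flat B index (0..63) to (sub-array number, offset)."""
    if not 0 <= i < chunk * num_chunks:
        raise ValueError(f"index {i} out of range")
    return divmod(i, chunk)


def emit_loads(index_table, dest="x", src_prefix="B"):
    """Emit one OpenCL-style assignment per table entry, e.g.
    'x0 = B1[15];' in place of the indirect 'x0 = B[E[0]];'."""
    lines = []
    for k, i in enumerate(index_table):
        part, off = split_index(i)
        lines.append(f"{dest}{k} = {src_prefix}{part}[{off}];")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample indices, not the real 96-entry table.
    sample_table = [31, 0, 1, 2, 3, 4]
    print(emit_loads(sample_table))
```

For the sample table this prints `x0 = B1[15];` followed by `x1 = B0[0];` through `x5 = B0[4];`. The generated lines would be pasted into (or string-concatenated with) the kernel source before calling the OpenCL compiler; the design choice is to move all index arithmetic to the host so the device code contains only fixed register or array slots.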