Message-ID: <20121208233148.GB32738@openwall.com>
Date: Sun, 9 Dec 2012 03:31:48 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice DES on GPU

On Sun, Dec 09, 2012 at 12:04:55AM +0100, magnum wrote:
> On 7 Dec, 2012, at 9:42 , Sayantan Datta <std2048@...il.com> wrote:
> > On Fri, Dec 7, 2012 at 11:39 AM, Milen Rangelov <gat3way@...il.com> wrote:
> > Why would you want to do that via patching (given that they are compile-time constants)?
> >
> > The so called constants will change with every new salt which is why I need to patch them at runtime.
>
> What amount of data are we talking about? Have you tried fitting them in a __constant buffer? Or would that likely be significantly slower?

What I think happens at ISA level when we hard-code (or runtime patch)
the offsets is something like (in pseudo-assembly):

	LD reg1,[reg2+immediate_offset]

or in case some other base offset was involved, it could be:

	LD reg1,[reg5+reg2+immediate_offset]

What happens when we have the actual offsets in some memory area
instead:

	LD reg3,[reg4+immediate_offset]
	LD reg1,[reg2+reg3]

or in case some other base offset was involved, it could be even:

	LD reg3,[reg4+immediate_offset]
	ADD reg3,reg2,reg3
	LD reg1,[reg5+reg3]

Even if the first LD is such that it loads from some different address
space (memory available anyway, possibly lower latency), it's still
significant extra work - and we use an extra register here. To hide the
latency of the first LD, we'd use many extra registers (many per
work-item, times many work-items).

So I think that runtime patching at native ISA level is the way to go.
Re-transferring the code to the GPU is OK.

Alexander
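
The two addressing schemes compared in the message above can be modeled on the host in plain C. This is only a rough sketch under stated assumptions, not the john-dev implementation: toy_load, patch_offsets, run_indirect and run_patched are hypothetical names invented for illustration, and the "instruction stream" is a plain array standing in for GPU code whose immediate fields get rewritten for each new salt. On real hardware the immediate arrives with the instruction fetch, whereas this model still reads it from memory, so the sketch shows the data flow and not the performance difference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy instruction: "LD reg1,[reg2+immediate_offset]".
 * Only the immediate field is modeled. */
typedef struct {
    uint32_t imm; /* immediate offset encoded in the instruction */
} toy_load;

/* Offsets kept in a memory table: every access first loads the offset
 * (LD reg3,[reg4+immediate_offset]) and then the data (LD reg1,[reg2+reg3]). */
static uint32_t run_indirect(const uint32_t *data, const uint32_t *offsets,
                             size_t n, uint32_t base)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t off = offsets[i]; /* extra load, extra register */
        acc ^= data[base + off];
    }
    return acc;
}

/* Runtime patching: rewrite the immediates once per salt, before the
 * code would be re-transferred to the GPU. */
static void patch_offsets(toy_load *code, const uint32_t *offsets, size_t n)
{
    assert(code != NULL && offsets != NULL);
    for (size_t i = 0; i < n; i++)
        code[i].imm = offsets[i];
}

/* Patched code: each access is a single load with an immediate offset. */
static uint32_t run_patched(const uint32_t *data, const toy_load *code,
                            size_t n, uint32_t base)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc ^= data[base + code[i].imm];
    return acc;
}
```

Both functions return the same value for the same offsets. The difference the message points at is where the offset comes from: a separate load into an extra register in the first case, a field of the instruction itself in the second.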