Date: Sun, 9 Dec 2012 03:31:48 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice DES on GPU

On Sun, Dec 09, 2012 at 12:04:55AM +0100, magnum wrote:
> On 7 Dec, 2012, at 9:42 , Sayantan Datta <std2048@...il.com> wrote:
> > On Fri, Dec 7, 2012 at 11:39 AM, Milen Rangelov <gat3way@...il.com> wrote:
> > Why would you want to do that via patching (given that they are compile-time constants)?
> > 
> > The so called constants will change with every new salt which is why I need to patch them at runtime.
> 
> What amount of data are we talking about? Have you tried fitting them in a __constant buffer? Or would that likely be significantly slower?

What I think happens at ISA level when we hard-code (or runtime patch)
the offsets is something like (in pseudo-assembly):

	LD reg1,[reg2+immediate_offset]

or in case some other base offset was involved, it could be:

	LD reg1,[reg5+reg2+immediate_offset]

What happens when we have the actual offsets in some memory area instead:

	LD reg3,[reg4+immediate_offset]
	LD reg1,[reg2+reg3]

or in case some other base offset was involved, it could even be:

	LD reg3,[reg4+immediate_offset]
	ADD reg3,reg2,reg3
	LD reg1,[reg5+reg3]

Even if the first LD loads from a different address space (one we have
available anyway, possibly with lower latency), it's still significant
extra work, and we use an extra register here.  To hide the latency of
the first LD, we'd need many extra registers (many per work-item, times
many work-items).
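To make the two cases concrete, here is a minimal C sketch (plain C on a CPU, not OpenCL; the offsets and the array of "bitslice words" are hypothetical values chosen for illustration). With literal offsets a compiler can typically fold each one into the load as an immediate. With offsets read from a table, each access first needs the offset loaded into a register. The exact instructions emitted depend on the compiler and target.

```c
#include <stdint.h>

/* Hypothetical salt-dependent offsets into an array of bitslice words.
 * Hard-coded (or runtime-patched) variant: the offsets are literal
 * constants, so each access can compile to something like
 * LD reg1,[reg2+immediate_offset]. */
#define OFF0 3
#define OFF1 0
#define OFF2 7
#define OFF3 5

static uint32_t xor_hardcoded(const uint32_t *b)
{
    return b[OFF0] ^ b[OFF1] ^ b[OFF2] ^ b[OFF3];
}

/* Table-driven variant: the offsets live in memory (the C analog of a
 * __constant buffer), so each access needs an extra load of off[i] into
 * a register before the data load, like LD reg3,...; LD reg1,[reg2+reg3]. */
static uint32_t xor_indirect(const uint32_t *b, const int *off)
{
    return b[off[0]] ^ b[off[1]] ^ b[off[2]] ^ b[off[3]];
}
```

Both functions compute the same value. The difference is only in how many instructions and registers each access costs, which is what matters when this pattern is repeated for every S-box input in every work-item.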

So I think that runtime patching at the native ISA level is the way to go.
Re-transferring the patched code to the GPU for each new salt is OK.
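One way runtime patching could look is sketched below in C, under loudly stated assumptions. The placeholder scheme, `PLACEHOLDER_BASE`, and `patch_offsets()` are all hypothetical. The idea is that the kernel is compiled once with distinctive 32-bit immediates standing in for the salt-dependent offsets, and a copy of the binary is patched per salt before being re-uploaded. Real GPU ISAs encode immediates differently, and a robust patcher would record the exact immediate positions rather than scan for byte patterns that could also occur in opcodes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: the i-th salt-dependent offset is emitted in the kernel
 * binary as the 32-bit immediate PLACEHOLDER_BASE + i. */
#define PLACEHOLDER_BASE 0x7A5A0000u

/* Overwrite every placeholder immediate in code[0..len) with the real
 * offset for the current salt. Values are compared and written in host
 * byte order, assumed here to match the target ISA's encoding.
 * Returns the number of immediates patched. */
static size_t patch_offsets(uint8_t *code, size_t len,
                            const uint32_t *offsets, size_t n_offsets)
{
    size_t patched = 0;
    for (size_t pos = 0; pos + 4 <= len; pos++) {
        uint32_t v;
        memcpy(&v, code + pos, 4);
        if (v >= PLACEHOLDER_BASE && v - PLACEHOLDER_BASE < n_offsets) {
            uint32_t real = offsets[v - PLACEHOLDER_BASE];
            memcpy(code + pos, &real, 4);
            patched++;
            pos += 3; /* skip the remaining bytes of this immediate */
        }
    }
    return patched;
}
```

After patching, the modified binary would be transferred to the GPU again (e.g., rebuilt into a program object from the binary). This keeps the inner loop free of the extra loads and registers that a table of offsets would cost.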

Alexander
