Date: Thu, 18 Oct 2012 10:17:05 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: bitslice DES on GPU

On Tue, Oct 16, 2012 at 10:17:37PM +0530, Sayantan Datta wrote:
> I was comparing the statistics of DES_bs_kernel vs the pbkdf2_kernel. The
> prime reason for the bottleneck seems to be insufficient number of inflight
> wavefronts causing poor ALU utilization. For comparison the ALU utilization
> of pbkdf2 is 3 times that of des. Also there are some other factors such as
> LDS bank conflicts etc.

We're looking for the cause of a performance hit of much more than a
factor of 3.  Besides, if you reduce pbkdf2_kernel's number of
in-flight wavefronts by a factor of 3, the slowdown will probably be
less than a factor of 3 (maybe a lot less).  So I think there's
something else as well, maybe something more important.

> Also is there any specific reason for doing 32 hashes per kernel?

You mean per work-item?  Yes: that's the bit width of SIMD vector
elements in GPU hardware.  With bitslicing, we have to match this width,
or we'd be wasting it.
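
To illustrate the width argument, here is a minimal sketch in plain C (not
taken from the DES_bs_kernel source).  It assumes a 32-bit element type; the
names bs_vec, toy_gate and toy_gate_selfcheck are hypothetical, and the gate
network is a toy one, not a DES S-box.  Each bit position of a 32-bit word
belongs to a separate hash computation, so one bitwise instruction advances
all 32 computations at once.

```c
#include <assert.h>
#include <stdint.h>

/* One bitslice vector: bit i belongs to instance i (32 instances in all). */
typedef uint32_t bs_vec;

/* Hypothetical toy gate network (NOT a DES S-box): out = (a & b) ^ c.
 * Two bitwise instructions evaluate it for all 32 instances at once. */
static bs_vec toy_gate(bs_vec a, bs_vec b, bs_vec c)
{
    return (a & b) ^ c;
}

/* The same function for a single instance, operating on one bit. */
static unsigned toy_gate_scalar(unsigned a, unsigned b, unsigned c)
{
    return ((a & b) ^ c) & 1u;
}

/* Checks that every lane of the bitsliced result matches the scalar one. */
static void toy_gate_selfcheck(bs_vec a, bs_vec b, bs_vec c)
{
    bs_vec out = toy_gate(a, b, c);

    for (int lane = 0; lane < 32; lane++) {
        unsigned expect = toy_gate_scalar((a >> lane) & 1u,
                                          (b >> lane) & 1u,
                                          (c >> lane) & 1u);
        assert(((out >> lane) & 1u) == expect);
    }
}
```

If only 8 of the 32 lanes carried useful data, the two instructions in
toy_gate() would cost the same but do a quarter of the work.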

> Can it be
> lowered to something like 8 so that the size of the data block could be
> reduced to 16 integers.

With bitslicing, our 64 data bits - for one hash - are spread across 64
different array elements (but are in the same bit layer).  A more
compact representation would mean that we implement something other than
bitslicing.
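
The layout described above can be sketched in plain C as follows.  This is an
illustration under assumptions, not the JtR code: the names BS_DEPTH,
BLOCK_BITS, to_bitslice and from_bitslice are hypothetical, and the
bit-by-bit loops are chosen for clarity rather than speed.  Array element
number "bit" holds that bit of the data block for all 32 instances, and
instance number "inst" occupies bit layer "inst" in every element.

```c
#include <assert.h>
#include <stdint.h>

#define BS_DEPTH   32   /* instances per work-item (assumed element width) */
#define BLOCK_BITS 64   /* DES data block size in bits */

/* Transpose 32 ordinary 64-bit blocks into 64 bitslice vectors. */
static void to_bitslice(const uint64_t blocks[BS_DEPTH],
                        uint32_t B[BLOCK_BITS])
{
    for (int bit = 0; bit < BLOCK_BITS; bit++) {
        uint32_t v = 0;

        for (int inst = 0; inst < BS_DEPTH; inst++)
            v |= (uint32_t)((blocks[inst] >> bit) & 1u) << inst;
        B[bit] = v;
    }
}

/* Gather one instance's 64 bits back from its bit layer. */
static uint64_t from_bitslice(const uint32_t B[BLOCK_BITS], int inst)
{
    uint64_t block = 0;

    assert(inst >= 0 && inst < BS_DEPTH);
    for (int bit = 0; bit < BLOCK_BITS; bit++)
        block |= (uint64_t)((B[bit] >> inst) & 1u) << bit;
    return block;
}
```

In this sketch the array always has 64 elements regardless of how many bit
layers are in use, which is why narrowing to 8 instances would not shrink it
to 16 integers without abandoning the one-bit-per-element layout.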

> If we could do so, would it reduce the size of K[] ?

It would imply something other than bitslicing for K as well.  And the
S-boxes would be represented differently, too.  But I think you should
stop thinking in this direction.  I don't expect there's another useful
representation in between bitslicing and straightforward table lookups.

> Also is it possible to reduce the number of regs used by the sboxes ?

Only a little bit - by tuning those #define's.  Reducing the number of
regs needed for temporaries was definitely among the selection criteria
for the S-box expressions when Roman and I worked on this last year.
(There were thousands of other versions that would require more regs.)

Alexander
