Date: Tue, 15 Dec 2009 17:08:10 +0300
From: Solar Designer <>
Subject: Re: Bit slice DES for CUDA

On Thu, Dec 10, 2009 at 10:40:45PM +0200, Dennis Yurichev wrote:
> Does anybody had attempt to port bit sliced DES routines to CUDA?
> I tried (got deseval.c from )
> But it work very slow, because compiled routine consumes a lot of
> registers and they shifted to local memory, which is very slow.

IIRC, deseval.c uses Matthew's slightly older S-box expressions than the
final ones he released as sboxes.c and nonstd.c.  If you want to play
with his C code, I suggest that you pick nonstd.c.  Also, you shouldn't
be too concerned about the deseval() function failing to fit everything
into registers.  Most of the processing time should be in individual
S-boxes, not in this "wrapper" function.  Thus, you should focus on
optimal register usage within the S-boxes.  In practice, 16 registers
should be enough to implement Matthew's S-box expressions, whereas 8
registers are not enough (a few intermediate values have to be spilled to
"memory").  This is clearly seen in JtR's x86-64.S (16 XMM registers
available) vs. x86-sse.S (only 8).
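
To illustrate what "register usage within an S-box" means, here is a
minimal sketch in plain C.  The gate network below is a made-up toy
function, NOT one of Matthew's S-box expressions, and the names
(slice_t, toy_gate, etc.) are hypothetical.  It only shows the general
shape: each word carries one bit position for many independent inputs,
the body is a straight-line sequence of bitwise ops, and temporaries are
reused as soon as their old value is dead so that fewer values are live
at once.

```c
#include <stdint.h>

/* Each bit of a slice_t is one independent "lane" (here up to 32). */
typedef uint32_t slice_t;

/* Toy 4-input, 1-output gate network (hypothetical, not a DES S-box). */
static slice_t toy_gate(slice_t a, slice_t b, slice_t c, slice_t d)
{
    slice_t t1 = a ^ b;
    slice_t t2 = c & ~d;
    slice_t t3 = t1 | t2;   /* t2 is dead after this line */
    t1 = t3 ^ (a & c);      /* old t1 is dead, so reuse it */
    return t1 ^ d;
}

/* The same function computed on single bits, as a reference. */
static int toy_gate_scalar(int a, int b, int c, int d)
{
    return ((((a ^ b) | (c & (d ^ 1))) ^ (a & c)) ^ d) & 1;
}

/* Put all 16 input combinations into lanes 0..15, evaluate them with one
 * bitsliced call, and compare each lane against the scalar reference.
 * Returns 1 if every lane matches, 0 otherwise. */
static int toy_gate_selfcheck(void)
{
    slice_t a = 0, b = 0, c = 0, d = 0, out;
    int i;

    for (i = 0; i < 16; i++) {
        a |= (slice_t)((i >> 0) & 1) << i;
        b |= (slice_t)((i >> 1) & 1) << i;
        c |= (slice_t)((i >> 2) & 1) << i;
        d |= (slice_t)((i >> 3) & 1) << i;
    }

    out = toy_gate(a, b, c, d);

    for (i = 0; i < 16; i++) {
        int expected = toy_gate_scalar((i >> 0) & 1, (i >> 1) & 1,
                                       (i >> 2) & 1, (i >> 3) & 1);
        if ((int)((out >> i) & 1) != expected)
            return 0;
    }
    return 1;
}
```

A real S-box has 6 inputs, 4 outputs and several dozen gates, so the
number of simultaneously live temporaries is much higher than in this
toy.  That is where the difference between having 8 and 16 registers
shows up.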

Just my $0.02.

