Date: Sat, 22 Sep 2012 08:58:39 +0400
From: Solar Designer <>
Subject: Re: bitslice DES on GPU


On Sat, Sep 22, 2012 at 01:49:48AM +0530, Sayantan Datta wrote:
> > On Sat, Sep 22, 2012 at 1:41 AM, Sayantan Datta <> wrote:
> >> In the previous test Global no. of work items were half of this time. So
> >> the overhead is double in this test than the last one.

Doesn't this also double the amount of useful work being done?  If so,
the relative overhead has actually stayed the same.

> Here's the correct one:
> 1/(1/78 + 1/35) = 24

Is the 78 obtained as 2*39, where the 39 is my 39M figure for "overhead
speed" of your previous code revision?  If so, if the overhead actually
doubled, you'd need to halve its "speed" figure.  So you'd use 19.5 in
this equation, not 78.  But your assumption that the overhead has
doubled is probably wrong anyway, as I tried to explain above.  If the
overhead did in fact double, you'd obtain total speed no better than 19.5.
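
To make the arithmetic above concrete, here is a minimal Python sketch.
It assumes the model implied by the quoted equation: overhead and
computation run one after the other, so their times add and the combined
speed is the reciprocal of the summed reciprocals.  The function name
combined_speed is made up for this illustration, and the 35 is taken
from the quoted equation on the assumption that it is the speed without
overhead, in the same M c/s units.

```python
def combined_speed(*stage_speeds):
    """Overall speed when the stages run back to back.

    Each stage takes 1/speed time per candidate, so the times add up
    and the total speed is the reciprocal of that sum.  This is an
    assumed model, matching the form of the quoted equation.
    """
    return 1.0 / sum(1.0 / s for s in stage_speeds)


compute = 35.0    # taken from the quoted equation (assumed meaning)
overhead = 39.0   # the 39M "overhead speed" figure

# The quoted equation doubles the overhead speed, that is, it halves the overhead.
as_quoted = combined_speed(2 * overhead, compute)

# A doubled overhead would halve the overhead speed instead.
if_doubled = combined_speed(overhead / 2, compute)

print(round(as_quoted, 2))   # 24.16
print(round(if_doubled, 2))  # 12.52
```

Under this model the combined speed is always lower than the slowest
stage, which is why a doubled overhead would cap the total below 19.5.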

Anyhow, is your new code revision available anywhere?

Meanwhile, I had a nice conversation with atom on #openwall.  He had
found this paper, which he wanted to share with us:

This is the diploma thesis of Marc Schober.  On page 56 (page 64 per the PDF
file's numbering), Marc talks about bitslice DES on GPU.  This continues
on page 111 (119 per PDF), including a funny quote from me dating back
to 2006 (yes, I really did not look into GPUs at the time, and they were
not used for password cracking until 2007).  Marc's work apparently
occurred some time between 2008 and 2010 (the PDF was generated on
2010-09-13).  This also explains the comparison against JtR's DES running on
one CPU core only.  (I only implemented OpenMP for DES in May of 2010 in
the form of a separate patch, which Marc might not have been aware of, or
his experiments might have occurred earlier than May 2010.)

Anyway, Marc achieved 12.9M c/s with bitslice DES on GTX 260, and he
estimated a theoretical peak performance of 27M to 31M on that GPU.
This is with Matthew Kwan's S-boxes (which we've since replaced with
Roman's smaller ones).

As is common with academic papers, no code is provided.

