Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
Date: Thu, 30 Oct 2014 21:02:55 -0800
From: Royce Williams <>
To: john-dev <>
Subject: Re: descrypt speed (was: "Failed copy data to gpu" when
 using fork with descrypt-opencl)

On Thu, Oct 30, 2014 at 6:31 PM, magnum <> wrote:
> On 2014-10-30 16:49, Royce Williams wrote:
>>> Using -fork=4 on a quadcore+HT and GTX980 I got over 82 Mc/s.
>> On my 8-core AMD and GTX970, using fork=2 gets me 52 Mc/s, which is
>> much better than no fork (~35 Mc/s).  fork=3 settles in around 54
>> Mc/s.  Forking more than 3 doesn't materially increase the c/s rate.
> Solar, Sayantan, all,
> Why is this? This is bordering on a candidate generation bottleneck, but that's
> not quite the problem, is it? So what is the bottleneck? Could we do
> something to make it faster without forking or *is* it just candidate
> generation?
> Also, as far as I understand just from googling, Atom has yet to implement
> bitslicing. Yet his descrypt exceeds 100M c/s on a single Tahiti (according
> to How is that
> possible? Should we not beat him silly with our bitslicing version?

We may need to determine whether this is happening to others as well.
Something odd may be happening on my side.
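To put the fork numbers quoted above in perspective, here is a small Python sketch that tabulates the speedup and the marginal gain of each extra process. It uses only the approximate rates reported in this thread and treats the ~35 Mc/s no-fork run as fork=1. The helper name scaling_table is illustrative.

```python
# Approximate descrypt rates (Mc/s) reported in this thread for the
# 8-core AMD + GTX970 box; fork=1 stands for the no-fork run.
RATES = {1: 35.0, 2: 52.0, 3: 54.0}

def scaling_table(rates):
    """Return (n, rate, speedup vs. fork=1, gain over fork=n-1) rows."""
    base = rates[1]
    rows = []
    for n in sorted(rates):
        gain = rates[n] - rates[n - 1] if n - 1 in rates else 0.0
        rows.append((n, rates[n], rates[n] / base, gain))
    return rows

for n, rate, speedup, gain in scaling_table(RATES):
    print(f"fork={n}: {rate:.0f} Mc/s  {speedup:.2f}x  +{gain:.0f} Mc/s")
```

The second process adds about 17 Mc/s and the third only about 2 Mc/s. This fits the host-side or candidate-generation limit suspected above, but it does not prove it.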

Going back through my configure/make cycle, I found something I hadn't noticed at first:

ptxas info    : Compiling entry function '_Z13kernel_phpassPhP12phpass_crack' for 'sm_20'

In fact, every occurrence of sm_[0-9]+ in my ./configure and make
output is sm_20, and running strings on the john binary shows only
sm_20.
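The strings check above can be scripted. This is a minimal Python sketch. It assumes only that architecture tags appear in the binary as plain sm_NN text, which is what strings showed here. The helper names sm_targets and scan_file are hypothetical. Embedded device code is not always stored as readable text, so NVIDIA's cuobjdump (for example its --list-elf option) is the more reliable way to see which cubins were actually built.

```python
import re

# Architecture tags such as "sm_20" or "sm_52"; the lookbehind skips
# matches that are merely the tail of a longer identifier (e.g. "asm_7").
SM_TAG = re.compile(rb"(?<![A-Za-z0-9_])sm_[0-9]+")

def sm_targets(data: bytes) -> list:
    """Return the distinct sm_NN tags found in data, sorted numerically."""
    tags = {m.group().decode("ascii") for m in SM_TAG.finditer(data)}
    return sorted(tags, key=lambda tag: int(tag[3:]))

def scan_file(path: str) -> list:
    """Read a binary such as the john executable and list its sm_NN tags."""
    with open(path, "rb") as f:
        return sm_targets(f.read())

# Example (path is illustrative): print(scan_file("../run/john"))
```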

On a GTX970, shouldn't this be sm_52?

Is there anything in my cuda setup that might cause this, external to john?
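The GTX 970 is a Maxwell (GM204) card with compute capability 5.2, so native code for it would indeed be sm_52. When nvcc is given no explicit target, it falls back to a toolkit default. That could explain sm_20 everywhere if the build never passes a target, but this is a guess and has not been confirmed in this thread. The installed toolkit would also need to be new enough to know sm_52.

The hypothetical helper gencode_flags below only illustrates the standard nvcc -gencode syntax for a list of compute capabilities. How john's configure/Makefile would pass such flags to nvcc is an assumption and is not shown here.

```python
def gencode_flags(capabilities):
    """Build nvcc -gencode options for (major, minor) compute capabilities.

    Each capability gets native SASS (code=sm_XY). The highest one also
    gets PTX (code=compute_XY) so that newer GPUs can JIT-compile it.
    """
    flags = [f"-gencode arch=compute_{ma}{mi},code=sm_{ma}{mi}"
             for ma, mi in capabilities]
    if capabilities:
        ma, mi = max(capabilities)
        flags.append(f"-gencode arch=compute_{ma}{mi},code=compute_{ma}{mi}")
    return flags

# A GTX 970 is compute capability 5.2; this prints:
# -gencode arch=compute_52,code=sm_52 -gencode arch=compute_52,code=compute_52
print(" ".join(gencode_flags([(5, 2)])))
```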

