Date: Fri, 31 Oct 2014 06:33:00 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: descrypt speed

On 2014-10-31 06:02, Royce Williams wrote:
> On Thu, Oct 30, 2014 at 6:31 PM, magnum <john.magnum@...hmail.com> wrote:
>> On 2014-10-30 16:49, Royce Williams wrote:
>>>>
>>>> Using -fork=4 on a quadcore+HT and GTX980 I got over 82 Mc/s.
>>>
>>>
>>> On my 8-core AMD and GTX970, using fork=2 gets me 52 Mc/s, which is
>>> much better than no fork (~35 Mc/s).  fork=3 settles in around 54
>>> Mc/s.  Forking more than 3 doesn't materially increase the c/s rate.
>>
>> Solar, Sayantan, all,
>>
>> Why is this? This is bordering candidate generation bottleneck but that's
>> not quite the problem, is it? So what is the bottleneck? Could we do
>> something to make it faster without forking or *is* it just candidate
>> generation?
>
> We may need to determine if it's happening to others as well.
> Something odd is happening that may be on my side.

Your numbers are not particularly bad afaik. What concerns me is that
"fork" is beneficial at all, for anyone. For a semi-slow format like
this, I think a 50% speedup from using fork means we should be able to
improve something.
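
As a rough check on the "50%" figure, here is a minimal sketch that
computes the relative speedup from the rates quoted above. The numbers
are the approximate Mc/s values from Royce's message (the no-fork rate
was given as "~35"), and the function name is made up for illustration.

```python
# Approximate rates quoted in this thread (Mc/s), keyed by process count.
# "No fork" is treated as a single process; 35.0 stands in for "~35".
rates_mcs = {1: 35.0, 2: 52.0, 3: 54.0}


def speedup_percent(base, new):
    """Relative gain of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


for procs in (2, 3):
    gain = speedup_percent(rates_mcs[1], rates_mcs[procs])
    print(f"fork={procs}: {gain:.1f}% faster than no fork")
```

With these inputs, fork=2 comes out a little under 50% and fork=3 a
little over, so "50%" is a fair summary of both.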

> Going back through my config/make cycle, I didn't notice this at first:
>
> ptxas info    : Compiling entry function
> '_Z13kernel_phpassPhP12phpass_crack' for 'sm_20'
>
> In fact, all of the appearances of sm_[0-9+] in my ./configure and
> make results appear to be using sm_20.  Strings on the john binary
> only shows sm_20 in use.
>
> On a GTX970, shouldn't this be sm_52?

You can force this by editing NVCC_FLAGS in the Makefile: add something
like "-arch sm_50" (or sm_52). But I doubt it will make much difference,
and it will only affect CUDA formats.
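
To make the edit concrete, here is a small sketch of the string change
involved. The helper name and the sample flag string "-O2 -arch sm_20"
are assumptions for illustration only; the real NVCC_FLAGS line in your
Makefile may look different, so check it before changing anything.

```python
import re


def set_arch(nvcc_flags, major, minor):
    """Return nvcc_flags with any existing -arch option replaced by
    -arch sm_<major><minor> (hypothetical helper, not part of John)."""
    arch = f"-arch sm_{major}{minor}"
    # Drop an existing "-arch sm_NN" or "-arch=sm_NN", if present.
    stripped = re.sub(r"-arch[ =]sm_\d+", "", nvcc_flags)
    return " ".join(stripped.split() + [arch])


# Sample value only; not copied from the actual Makefile.
print(set_arch("-O2 -arch sm_20", 5, 2))
```

The GTX 970 is a compute capability 5.2 device, hence major=5, minor=2
in the call above.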

magnum
