john-dev - Re: descrypt speed

Message-ID: <CA+E3k93=eCaFt1XB7K7ZVABcsWjKLEicSMjhm9hiGvEwC+kqJA@mail.gmail.com>
Date: Wed, 18 Feb 2015 22:30:50 -0900
From: Royce Williams <royce@...ho.org>
To: john-dev <john-dev@...ts.openwall.com>
Subject: Re: descrypt speed

On Wed, Feb 18, 2015 at 10:08 PM, Sayantan Datta <std2048@...il.com> wrote:
>
>
> On Thu, Feb 19, 2015 at 11:59 AM, Sayantan Datta <std2048@...il.com> wrote:
>>
>>
>>
>> On Mon, Nov 3, 2014 at 3:32 AM, Royce Williams <royce@...ho.org> wrote:
>>>
>>> On Sun, Nov 2, 2014 at 12:19 PM, magnum <john.magnum@...hmail.com> wrote:
>>>>
>>>> On 2014-11-02 18:59, Royce Williams wrote:
>>>>>
>>>>> On Thu, Oct 30, 2014 at 9:33 PM, magnum <john.magnum@...hmail.com>
>>>>> wrote:
>>>>>>
>>>>>> On 2014-10-31 06:02, Royce Williams wrote:
>>>>>>>
>>>>>>> On a GTX970, shouldn't this be sm_52?
>>>>>>
>>>>>>
>>>>>> You can force this by editing NVCC_FLAGS in Makefile. Add something
>>>>>> like
>>>>>> "-arch sm_50" (or 52). But I doubt it will make much difference and it
>>>>>> will
>>>>>> only affect CUDA formats.
>>>>>
>>>>>
>>>>> In my system with both an sm_20 and an sm_50 card, when running solely
>>>>> descrypt-opencl (not CUDA), the ptxas info shows that sm_50 is involved
>>>>> in
>>>>> some way.  Is this cosmetic?
>>>>
>>>>
>>>> OpenCL compiles a suitable (different) kernel for each and you do not
>>>> have to configure anything.
>>>
>>>
>>> What's giving me pause is that without changing anything on either
>>> system, descrypt-opencl is appropriately using sm_20 and sm_50 on my
>>> heterogeneous system, but is only using sm_20 on my GTX750 system.
>>> Previously, the latter system was happily using sm_52.  I am not sure what
>>> changed.
>>>
>>>>
>>>> You can configure CUDA for compiling several archs at once, see "nvcc
>>>> --help". It's something like "-gencode arch=compute_20,code=sm_20 -gencode
>>>> arch=compute_50,code=sm_50" (added to NVCC_FLAGS instead of just -arch
>>>> sm_xx). The one most suitable of them will be picked at runtime.
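As a rough illustration of the multi-architecture flags magnum describes, the sketch below builds one `-gencode arch=compute_XX,code=sm_XX` pair per target and joins them into one string. The helper name `gencode_flags` and the list of architectures are assumptions for illustration only. Check `nvcc --help` for the installed toolkit, since an older CUDA release may not recognize a newer target such as sm_52.

```python
def gencode_flags(archs):
    """Build nvcc -gencode options, one per compute capability.

    Each entry such as "50" becomes "-gencode arch=compute_50,code=sm_50",
    so the binary carries code for every listed GPU generation.
    """
    return " ".join(f"-gencode arch=compute_{a},code=sm_{a}" for a in archs)


if __name__ == "__main__":
    # Hypothetical mixed sm_20 + sm_50 machine, as discussed above.
    print(gencode_flags(["20", "50"]))
    # -gencode arch=compute_20,code=sm_20 -gencode arch=compute_50,code=sm_50
```

The printed string would take the place of `-arch sm_xx` in NVCC_FLAGS. As noted earlier in the thread, this only affects the CUDA formats; OpenCL builds a suitable kernel for each device on its own.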
>>>
>>>
>>> Interesting -- I'll try that.
>>>
>>> Royce
>>
>>
>> Hi Royce, magnum,
>>
>> If you are interested, you can test the new revision of descrypt-opencl on
>> 970, 980 and 290X. There are three kernels and you can select them by
>> changing the parameters HARDCODE_SALT and FULL_UNROLL in
>> opencl_DES_hst_dev_shared.h. Setting (1,1) gives you the fastest kernel but
>> takes a very long time to compile; however, subsequent runs should be much
>> quicker because pre-compiled kernels (saved to disk by prior runs) are
>> reused. Setting (1,0) gives lower speed but faster compilation. Setting
>> (0,0) is the slowest, but compilation is quickest. Also, do not fork on the
>> same system when HARDCODE_SALT is 1.
>>
>> Regards,
>> Sayantan
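The three kernel variants differ only in the two switches, so the trade-off described above fits in a small lookup table. This is a hypothetical Python sketch, not code from John the Ripper: `VARIANTS` and `pick_variant` are illustrative names, the ranks merely restate the description, and the undocumented (0,1) combination is left out on purpose.

```python
# Trade-offs of the descrypt-opencl kernel variants, restated from the
# message above. Keys are (HARDCODE_SALT, FULL_UNROLL); rank 1 is best.
# The (0, 1) combination is not described in the thread, so it is omitted.
VARIANTS = {
    (1, 1): {"speed_rank": 1, "compile_rank": 3},  # fastest kernel, slowest build
    (1, 0): {"speed_rank": 2, "compile_rank": 2},  # middle ground
    (0, 0): {"speed_rank": 3, "compile_rank": 1},  # slowest kernel, quickest build
}


def pick_variant(prefer_speed):
    """Return the documented (HARDCODE_SALT, FULL_UNROLL) pair that ranks
    best for cracking speed (prefer_speed=True) or for build time (False)."""
    key = "speed_rank" if prefer_speed else "compile_rank"
    return min(VARIANTS, key=lambda pair: VARIANTS[pair][key])


if __name__ == "__main__":
    print("long session:", pick_variant(prefer_speed=True))   # (1, 1)
    print("quick test:", pick_variant(prefer_speed=False))    # (0, 0)
```

Because compiled kernels are cached on disk, most of the build-time cost of (1,1) should be paid only on the first run.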
>
>
> Actually, fork may be used with HARDCODE_SALT=1, but with at most 2 processes;
> anything more than that is wasteful and you may need a ton of RAM. Even with
> --fork=2, I think you should have at least 8GB RAM. Another problem we
> currently have when using fork is that kernels are compiled n times for n
> processes, which is unnecessary. However, we can work around that by using
> --fork=1 to compile all kernels and then restarting with --fork=2.
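The rules of thumb above can be written as a pre-flight check. `check_fork_plan` is a hypothetical helper (nothing like it ships with John); the limits of 2 processes and 8 GB come straight from this message and are estimates for the kernel revision under test, not hard limits.

```python
def check_fork_plan(hardcode_salt, forks, ram_gb):
    """Return warnings for a planned descrypt-opencl --fork run.

    Hypothetical helper; thresholds are the rules of thumb from this thread.
    """
    warnings = []
    if hardcode_salt == 1 and forks > 2:
        warnings.append("more than 2 processes with HARDCODE_SALT=1 is "
                        "wasteful and may need a lot of RAM")
    if hardcode_salt == 1 and forks >= 2 and ram_gb < 8:
        warnings.append("forking with HARDCODE_SALT=1 suggests at least 8 GB RAM")
    if forks > 1:
        # Every process compiles its own kernels, so populate the on-disk
        # kernel cache with a single process before the real forked run.
        warnings.append(f"run once with --fork=1 to compile kernels, "
                        f"then restart with --fork={forks}")
    return warnings


if __name__ == "__main__":
    for msg in check_fork_plan(hardcode_salt=1, forks=2, ram_gb=16):
        print(msg)
```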
>
> Some performance numbers using --fork=2, HARDCODE_SALT=1, FULL_UNROLL=1,
> 124 password hashes and 122 salts, GPU: 7970 (925 MHz core, 1375 MHz memory):
>
> 2 0g 0:00:05:07  3/3 0g/s 749774p/s 91400Kc/s 92900KC/s GPU:61°C util:97% fan:27% scprugas..myremy26
> 1 0g 0:00:05:07  3/3 0g/s 749756p/s 91398Kc/s 92898KC/s GPU:61°C util:97% fan:27% 339gmh..8jfu44
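The figures in these status lines are internally consistent if one assumes the usual John the Ripper status conventions: p/s counts candidate passwords, c/s counts crypts (one per candidate per salt), and C/s counts candidate/hash combinations (one per candidate per loaded hash). The sketch multiplies one process's p/s by the 122 salts and 124 hashes given above; `expected_rates` is an illustrative name, and the results land within about 0.1% of the reported c/s and C/s.

```python
SALTS = 122   # salts loaded in the test above
HASHES = 124  # password hashes loaded


def expected_rates(pps, salts, hashes):
    """Derive c/s and C/s from p/s, assuming one crypt per candidate per
    salt and one combination per candidate per loaded hash."""
    return pps * salts, pps * hashes


if __name__ == "__main__":
    crypts, combos = expected_rates(749774, SALTS, HASHES)  # process 2 above
    print(f"c/s ~ {crypts / 1000:.0f}K (reported 91400K)")
    print(f"C/s ~ {combos / 1000:.0f}K (reported 92900K)")
```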
>
> Performance with --fork=1
> 0g 0:00:04:25  3/3 0g/s 1324Kp/s 161247Kc/s 163891KC/s GPU:60°C util:87% fan:27% srusuu..07pvjy
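To compare the two runs, the per-process rates of the --fork=2 run can be added up and divided by the --fork=1 rate. This is plain arithmetic on the numbers above (the 1324Kp/s figure is already rounded, so the ratio is approximate); it points to roughly a 13% gain from the second process, alongside GPU utilization rising from 87% to 97%. `fork_gain` is an illustrative name.

```python
def fork_gain(per_process_pps, single_pps):
    """Aggregate p/s of a forked run divided by the single-process p/s."""
    return sum(per_process_pps) / single_pps


fork2 = [749774, 749756]  # p/s of processes 2 and 1 with --fork=2
fork1 = 1324e3            # p/s with --fork=1 (displayed rounded as 1324Kp/s)

if __name__ == "__main__":
    print(f"--fork=2 aggregate: {sum(fork2)} p/s")
    print(f"gain over --fork=1: {fork_gain(fork2, fork1):.2f}x")
```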

Thanks for the opportunity to test!

Here are my results of "--test --format=descrypt-opencl" for a GTX 970
SC (factory overclocked to 1316 MHz):

First, a baseline using magnumripper from a couple of months ago:

Many salts:     46137K c/s real, 45680K c/s virtual
Only one salt:  25700K c/s real, 25700K c/s virtual


Using magnumripper at fb0b9383d6 from today, with these
(HARDCODE_SALT, FULL_UNROLL) values:

(0,0)

Many salts:     77345K c/s real, 77345K c/s virtual
Only one salt:  35298K c/s real, 35298K c/s virtual

(1,0)

Many salts:     77864K c/s real, 78643K c/s virtual
Only one salt:  34952K c/s real, 34952K c/s virtual

(1,1)

Many salts:     169869K c/s real, 169869K c/s virtual
Only one salt:  47710K c/s real, 48192K c/s virtual

(That's quite a jump. Not knowing any better, is the many-salts value
really supposed to be that high?)
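The size of each jump is easier to judge as a ratio against the older baseline build. The sketch below only divides the "real" c/s figures listed above (in K c/s); it says nothing about why the (1,1) kernel gains so much more in the many-salts case. `speedup` is an illustrative name.

```python
BASELINE = {"many": 46137, "one": 25700}  # older build, K c/s real
NEW = {  # fb0b9383d6, keyed by (HARDCODE_SALT, FULL_UNROLL), K c/s real
    (0, 0): {"many": 77345, "one": 35298},
    (1, 0): {"many": 77864, "one": 34952},
    (1, 1): {"many": 169869, "one": 47710},
}


def speedup(pair, case):
    """Ratio of a new kernel's rate to the baseline, for 'many' or 'one' salt."""
    return NEW[pair][case] / BASELINE[case]


if __name__ == "__main__":
    for pair in NEW:
        print(pair, f"many salts {speedup(pair, 'many'):.2f}x,",
              f"one salt {speedup(pair, 'one'):.2f}x")
```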


Here is real-world performance on a single card against a single hash,
no fork, after ~10 minutes:

0g 0:00:10:44 0.00% 3/3 (ETA: 2020-09-07 01:11) 0g/s 38282Kp/s 38282Kc/s 38282KC/s GPU:41°C fan:45% etyc45x..euamdhj

... and --fork=6 (one core per identical GPU), for what it's worth; this
seemed to work fine on my 16GB system:

4 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-04-19 02:58) 0g/s 30421Kp/s 30421Kc/s 30421KC/s GPU:34°C fan:45% mmjhj31j..mmrrpdly
2 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-04-06 22:04) 0g/s 31320Kp/s 31320Kc/s 31320KC/s GPU:40°C fan:45% hlc8466*..hllhikko
5 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-11-25 12:35) 0g/s 20033Kp/s 20033Kc/s 20033KC/s GPU:39°C fan:45% 9dzjt0e..9/1bb9m
3 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-04-01 17:17) 0g/s 31719Kp/s 31719Kc/s 31719KC/s GPU:33°C fan:45% nrnUQp..n2j=h!
1 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-03-26 11:21) 0g/s 32213Kp/s 32213Kc/s 32213KC/s GPU:41°C fan:45% bs9ntql..byisi7a
6 0g 0:00:04:16 0.00% 3/3 (ETA: 2017-01-02 14:46) 0g/s 18917Kp/s 18917Kc/s 18917KC/s GPU:40°C fan:45% agb_co6..azo52r2

(Aggregate: 164623Kp/s)
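The aggregate can be recomputed by pulling the p/s figure out of each process's status line and summing. The sketch assumes the rate is printed as a number with an optional K/M/G suffix followed by `p/s`, as in the lines above; that regular expression is an assumption about the format, not a documented parser for John's output, and `aggregate_pps` is an illustrative name.

```python
import re

# First part of each process's status line from the --fork=6 run above.
STATUS = """\
4 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-04-19 02:58) 0g/s 30421Kp/s
2 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-04-06 22:04) 0g/s 31320Kp/s
5 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-11-25 12:35) 0g/s 20033Kp/s
3 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-04-01 17:17) 0g/s 31719Kp/s
1 0g 0:00:04:16 0.00% 3/3 (ETA: 2016-03-26 11:21) 0g/s 32213Kp/s
6 0g 0:00:04:16 0.00% 3/3 (ETA: 2017-01-02 14:46) 0g/s 18917Kp/s
"""

# Assumed format: a number, an optional K/M/G suffix, then "p/s".
RATE = re.compile(r"(\d+(?:\.\d+)?)([KMG]?)p/s")
SCALE = {"": 1, "K": 1e3, "M": 1e6, "G": 1e9}


def aggregate_pps(text):
    """Sum every p/s figure found in the given status text."""
    return sum(float(num) * SCALE[suffix] for num, suffix in RATE.findall(text))


if __name__ == "__main__":
    print(f"aggregate: {aggregate_pps(STATUS) / 1e3:.0f}Kp/s")
```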

... and --fork=8 (some processes starved for CPU, but higher aggregate throughput):

5 0g 0:00:02:20 0.00% 3/3 (ETA: 2016-08-08 08:33) 0g/s 18030Kp/s 18030Kc/s 18030KC/s GPU:39°C fan:45% 2d2inl1n..2d2ottrd
1 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-29 00:52) 0g/s 28015Kp/s 28015Kc/s 28015KC/s GPU:46°C fan:45% 03-9be32..03alus42
4 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-03-02 19:39) 0g/s 25572Kp/s 25572Kc/s 25572KC/s GPU:32°C fan:45% plzzgm1...plp2b3sk
3 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-23 10:53) 0g/s 28654Kp/s 28654Kc/s 28654KC/s GPU:33°C fan:45% 8c9gt7i..8cci13k
6 0g 0:00:02:20 0.00% 3/3 (ETA: 2016-09-10 02:55) 0g/s 16992Kp/s 16992Kc/s 16992KC/s GPU:39°C fan:45% kmk14en8..kmher2a3
7 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-26 16:56) 0g/s 28266Kp/s 28266Kc/s 28266KC/s GPU:46°C fan:45% lhgeh730..l0nn0wow
8 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-02-13 21:08) 0g/s 26841Kp/s 26841Kc/s 26841KC/s GPU:41°C fan:45% cl1kiylu..clrh2bl1
2 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-03-02 11:01) 0g/s 25565Kp/s 25565Kc/s 25565KC/s GPU:41°C fan:45% do_7af3..di7z7h8

(Aggregate: 197935Kp/s)
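Putting both fork runs next to the single-process figure of 38282Kp/s shows how far the aggregate scales, and the slowest-to-fastest ratio shows how uneven the processes are. This is arithmetic on the numbers above only. It assumes the runs are comparable; the equal p/s, c/s and C/s figures in every line suggest a single target hash throughout. It does not measure CPU starvation directly. `summarize` is an illustrative name.

```python
SINGLE = 38282  # Kp/s: one card, one hash, no fork

RUNS = {  # per-process Kp/s copied from the status lines above
    6: [30421, 31320, 20033, 31719, 32213, 18917],
    8: [18030, 28015, 25572, 28654, 16992, 28266, 26841, 25565],
}


def summarize(rates, single):
    """Return (aggregate, aggregate / single, slowest / fastest process)."""
    total = sum(rates)
    return total, total / single, min(rates) / max(rates)


if __name__ == "__main__":
    for forks, rates in RUNS.items():
        total, scale, spread = summarize(rates, SINGLE)
        print(f"--fork={forks}: {total}Kp/s, {scale:.2f}x single, "
              f"slowest/fastest {spread:.2f}")
```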

And ignore the identical fan speeds; I have them all locked at 45% right now.

Royce