john-dev - Re: descrypt speed

Message-ID: <CA+TsHUCOtmb8NK9NbTyXShS4PsdybEa2Gmw3zubyEZiHQZCgpA@mail.gmail.com>
Date: Thu, 19 Feb 2015 12:38:20 +0530
From: Sayantan Datta <std2048@...il.com>
To: john-dev <john-dev@...ts.openwall.com>
Subject: Re: descrypt speed

On Thu, Feb 19, 2015 at 11:59 AM, Sayantan Datta <std2048@...il.com> wrote:

>
>
> On Mon, Nov 3, 2014 at 3:32 AM, Royce Williams <royce@...ho.org> wrote:
>
>> On Sun, Nov 2, 2014 at 12:19 PM, magnum <john.magnum@...hmail.com> wrote:
>>
>>> On 2014-11-02 18:59, Royce Williams wrote:
>>>
>>>> On Thu, Oct 30, 2014 at 9:33 PM, magnum <john.magnum@...hmail.com>
>>>> wrote:
>>>>
>>>>> On 2014-10-31 06:02, Royce Williams wrote:
>>>>>
>>>>>> On a GTX970, shouldn't this be sm_52?
>>>>>>
>>>>>
>>>>> You can force this by editing NVCC_FLAGS in Makefile. Add something
>>>>> like "-arch sm_50" (or 52). But I doubt it will make much difference
>>>>> and it will only affect CUDA formats.
>>>>>
>>>>
>>>> In my system with both an sm_20 and an sm_50 card, when running solely
>>>> descrypt-opencl (not CUDA), the ptxas info shows that sm_50 is involved
>>>> in some way.  Is this cosmetic?
>>>>
>>>
>>> OpenCL compiles a suitable (different) kernel for each and you do not
>>> have to configure anything.
>>>
>>
>> What's giving me pause is that without changing anything on either
>> system, descrypt-opencl is appropriately using sm_20 and sm_50 on my
>> heterogeneous system, but is only using sm_20 on my GTX750 system.
>> Previously, the latter system was happily using sm_52.  I am not sure what
>> changed.
>>
>>
>>> You can configure CUDA for compiling several archs at once, see "nvcc
>>> --help". It's something like "-gencode arch=compute_20,code=sm_20 -gencode
>>> arch=compute_50,code=sm_50" (added to NVCC_FLAGS instead of just -arch
>>> sm_xx). The one most suitable of them will be picked at runtime.
>>
>>
>> Interesting -- I'll try that.
>>
>> Royce
>>
>
> Hi Royce, magnum,
>
> If you are interested, you can test the new revision of descrypt-opencl on
> 970, 980 and 290X. There are three kernels and you can select them by
> changing the parameters HARDCODE_SALT and FULL_UNROLL in
> opencl_DES_hst_dev_shared.h. Setting (1,1) gives you the fastest kernel but
> takes very long to compile; however, subsequent runs should compile much
> quicker as pre-compiled kernels (saved to disk from the prior runs) are
> used. Setting (1,0) gives slower speed but faster compilation time. Setting
> (0,0) is the slowest but compilation is quickest. Also, do not fork on the
> same system when HARDCODE_SALT is 1.
>
> Regards,
> Sayantan
>

Actually, fork may be used with HARDCODE_SALT=1, but with at most 2 processes;
anything more than that is wasteful and may need a ton of RAM. Even with
--fork=2, I think you should have at least 8GB of RAM. Another problem we
currently have when using fork is that kernels are compiled n times for n
processes, which is unnecessary. However, we can work around that by using
--fork=1 to compile all kernels and then restarting with --fork=2.
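
The two-step workaround above can be sketched as a pair of command lines. This
is only an illustration: the binary path "./john" and the hash file name
"hashes.txt" are placeholders, and --fork=1 is taken from this message as
written (whether a given build accepts it, or needs the option left out for a
single process, is not checked here). The script only prints the commands; it
does not run them.

```python
import shlex


def john_cmd(fork, hash_file="hashes.txt", binary="./john"):
    """Build a descrypt-opencl command line with the given --fork count.

    hash_file and binary are placeholder names, not taken from the message.
    """
    return [binary, "--format=descrypt-opencl", f"--fork={fork}", hash_file]


# Step 1: single process, so each kernel is compiled and cached only once.
warmup = john_cmd(1)
# Step 2: after stopping the first run, restart with two processes that
# reuse the kernels saved to disk.
real_run = john_cmd(2)

print(shlex.join(warmup))
print(shlex.join(real_run))
```

The first run would be interrupted once the kernels have been compiled, and
the second started afterwards.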

Some performance numbers using --fork=2, HARDCODE_SALT=1, FULL_UNROLL=1,
124 passwords and 122 salts. GPU: 7970 (925 MHz core, 1375 MHz memory):

2 0g 0:00:05:07  3/3 0g/s 749774p/s 91400Kc/s 92900KC/s GPU:61°C util:97% fan:27% scprugas..myremy26
1 0g 0:00:05:07  3/3 0g/s 749756p/s 91398Kc/s 92898KC/s GPU:61°C util:97% fan:27% 339gmh..8jfu44

Performance with --fork=1:
0g 0:00:04:25  3/3 0g/s 1324Kp/s 161247Kc/s 163891KC/s GPU:60°C util:87% fan:27% srusuu..07pvjy
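
To compare the two runs, the per-process rates from the --fork=2 status lines
can be added up and set against the single-process rate. The short sketch
below does that arithmetic using the c/s column; it assumes the rates of the
two forked processes are simply additive, and it ignores that the two runs
were sampled at different elapsed times.

```python
# c/s figures copied from the status lines above, in Kc/s.
fork2_kcs = [91400, 91398]  # one entry per forked process
fork1_kcs = 161247          # single process

combined_kcs = sum(fork2_kcs)           # total rate across both processes
gain = combined_kcs / fork1_kcs - 1.0   # relative gain over --fork=1

print(f"--fork=2 combined: {combined_kcs / 1000:.1f} Mc/s")
print(f"--fork=1:          {fork1_kcs / 1000:.1f} Mc/s")
print(f"gain from forking: {gain:.1%}")
```

On these numbers the two processes together reach about 182.8 Mc/s against
161.2 Mc/s for one process, a gain of roughly 13%, which matches the higher
GPU utilization reported with fork (97% versus 87%).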

Regards,
Sayantan
