Date: Thu, 19 Feb 2015 12:38:20 +0530
From: Sayantan Datta <>
To: john-dev <>
Subject: Re: descrypt speed

On Thu, Feb 19, 2015 at 11:59 AM, Sayantan Datta <> wrote:

> On Mon, Nov 3, 2014 at 3:32 AM, Royce Williams <> wrote:
>> On Sun, Nov 2, 2014 at 12:19 PM, magnum <> wrote:
>>> On 2014-11-02 18:59, Royce Williams wrote:
>>>> On Thu, Oct 30, 2014 at 9:33 PM, magnum <> wrote:
>>>>> On 2014-10-31 06:02, Royce Williams wrote:
>>>>>> On a GTX970, shouldn't this be sm_52?
>>>>> You can force this by editing NVCC_FLAGS in Makefile. Add something like
>>>>> "-arch sm_50" (or 52). But I doubt it will make much difference and it will
>>>>> only affect CUDA formats.
>>>> In my system with both an sm_20 and an sm_50 card, when running solely
>>>> descrypt-opencl (not CUDA), the ptxas info shows that sm_50 is involved in
>>>> some way.  Is this cosmetic?
>>> OpenCL compiles a suitable (different) kernel for each and you do not
>>> have to configure anything.
>> What's giving me pause is that without changing anything on either
>> system, descrypt-opencl is appropriately using sm_20 and sm_50 on my
>> heterogeneous system, but is only using sm_20 on my GTX750 system.
>> Previously, the latter system was happily using sm_52.  I am not sure what
>> changed.
>>> You can configure CUDA for compiling several archs at once, see "nvcc
>>> --help". It's something like "-gencode arch=compute_20,code=sm_20 -gencode
>>> arch=compute_50,code=sm_50" (added to NVCC_FLAGS instead of just -arch
>>> sm_xx). The most suitable one of them will be picked at runtime.
>> Interesting -- I'll try that.
>> Royce
> Hi Royce, magnum,
> If you are interested, you can test the new revision of descrypt-opencl on
> 970, 980 and 290X. There are three kernels and you can select them by
> changing the parameters HARDCODE_SALT and FULL_UNROLL in
> opencl_DES_hst_dev_shared.h. Setting (1,1) gives you the fastest kernel but
> takes very long to compile; subsequent runs should start much quicker
> because pre-compiled kernels (saved to disk by prior runs) are used.
> Setting (1,0) gives slower speed but faster compilation. Setting (0,0) is
> the slowest but compiles quickest. Also, do not fork on the same system
> when HARDCODE_SALT is 1.
> Regards,
> Sayantan

Actually, fork may be used with HARDCODE_SALT=1, but with at most 2 processes;
anything more than that is wasteful and you may need a ton of RAM. Even with
--fork=2, I think you should have at least 8 GB of RAM. Another problem we
currently have when using fork is that kernels are compiled n times for n
processes, which is unnecessary. However, we can work around that by using
--fork=1 to compile all kernels and then restarting with --fork=2.

Some performance numbers using --fork=2, HARDCODE_SALT=1, FULL_UNROLL=1,
124 passwords and 122 salts. GPU: 7970 (925 MHz core, 1375 MHz memory)

2 0g 0:00:05:07  3/3 0g/s 749774p/s 91400Kc/s 92900KC/s GPU:61°C util:97% fan:27% scprugas..myremy26
1 0g 0:00:05:07  3/3 0g/s 749756p/s 91398Kc/s 92898KC/s GPU:61°C util:97% fan:27% 339gmh..8jfu44

Performance with --fork=1
0g 0:00:04:25  3/3 0g/s 1324Kp/s 161247Kc/s 163891KC/s GPU:60°C util:87% fan:27% srusuu..07pvjy
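
To compare the two runs, here is a rough sketch of the arithmetic in Python. It assumes that each forked process reports only its own rate, so that the aggregate for --fork=2 is the sum of the two c/s figures; the variable names are just for illustration.

```python
# Kc/s figures copied from the status lines above
# (thousands of crypts per second).
fork2_kcps = [91400, 91398]   # one entry per forked process (--fork=2)
fork1_kcps = 161247           # single process (--fork=1)

# Assumption: per-process rates add up to the aggregate GPU rate.
total_fork2 = sum(fork2_kcps)
gain = total_fork2 / fork1_kcps - 1.0

print(f"--fork=2 aggregate: {total_fork2} Kc/s")
print(f"--fork=1:           {fork1_kcps} Kc/s")
print(f"relative gain:      {gain:.1%}")
```

Under that assumption, --fork=2 comes out roughly 13% ahead of --fork=1 for this salt and password count, which is in line with GPU utilization going from 87% to 97%.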

