Date: Thu, 19 Feb 2015 11:59:19 +0530
From: Sayantan Datta <std2048@...il.com>
To: john-dev <john-dev@...ts.openwall.com>
Subject: Re: descrypt speed

On Mon, Nov 3, 2014 at 3:32 AM, Royce Williams <royce@...ho.org> wrote:

> On Sun, Nov 2, 2014 at 12:19 PM, magnum <john.magnum@...hmail.com> wrote:
>
>> On 2014-11-02 18:59, Royce Williams wrote:
>>
>>> On Thu, Oct 30, 2014 at 9:33 PM, magnum <john.magnum@...hmail.com> wrote:
>>>
>>>> On 2014-10-31 06:02, Royce Williams wrote:
>>>>
>>>>> On a GTX970, shouldn't this be sm_52?
>>>>
>>>> You can force this by editing NVCC_FLAGS in Makefile. Add something like
>>>> "-arch sm_50" (or 52). But I doubt it will make much difference and it will
>>>> only affect CUDA formats.
>>>
>>> In my system with both an sm_20 and an sm_50 card, when running solely
>>> descrypt-opencl (not CUDA), the ptxas info shows that sm_50 is involved in
>>> some way. Is this cosmetic?
>>
>> OpenCL compiles a suitable (different) kernel for each and you do not
>> have to configure anything.
>
> What's giving me pause is that without changing anything on either system,
> descrypt-opencl is appropriately using sm_20 and sm_50 on my heterogeneous
> system, but is only using sm_20 on my GTX750 system. Previously, the
> latter system was happily using sm_52. I am not sure what changed.
>
>> You can configure CUDA for compiling several archs at once, see "nvcc
>> --help". It something like "-gencode arch=compute_20,code=sm_20 -gencode
>> arch=compute_50,code=sm_50" (added to NVCC_FLAGS instead of just -arch
>> sm_xx). The one most suitable of them will be picked at runtime.
>
> Interesting -- I'll try that.
>
> Royce

Hi Royce, magnum,

If you are interested, you can test the new revision of descrypt-opencl on 970, 980 and 290X. There are three kernels and you can select them by changing the parameters HARDCODE_SALT and FULL_UNROLL in opencl_DES_hst_dev_shared.h.
Setting (HARDCODE_SALT, FULL_UNROLL) to (1,1) gives you the fastest kernel but takes very long to compile. Subsequent runs should start much more quickly, though, because the pre-compiled kernels (saved to disk by earlier runs) are reused. Setting (1,0) gives a slower kernel but a shorter compilation time. Setting (0,0) is the slowest but compiles the quickest. Also, do not fork on the same system when HARDCODE_SALT is 1.

Regards,
Sayantan
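
For reference, the three combinations described above might look roughly like this in the header. This is only a sketch: the macro names and the file name come from the message, while the `#define` form, the value order (HARDCODE_SALT first, FULL_UNROLL second) and the guard at the end are assumptions for illustration, not the actual file contents.

```c
/* Sketch of the two switches in opencl_DES_hst_dev_shared.h.
 * Assumption: both are plain 0/1 #defines. The rest of the header
 * is not shown here. */

/* Combination (1,1): fastest kernel, very long first compilation. */
#define HARDCODE_SALT 1
#define FULL_UNROLL   1

/* Combination (1,0): slower kernel, shorter compilation.
 *   #define HARDCODE_SALT 1
 *   #define FULL_UNROLL   0
 *
 * Combination (0,0): slowest kernel, quickest compilation.
 *   #define HARDCODE_SALT 0
 *   #define FULL_UNROLL   0
 */

/* Hypothetical guard, not from the original source: the message lists
 * only three combinations, so (0,1) is rejected here as undescribed. */
#if !HARDCODE_SALT && FULL_UNROLL
#error "(0,1) is not one of the three combinations described"
#endif
```

With HARDCODE_SALT set to 1, the note about not forking on the same system applies.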