Date: Thu, 19 Sep 2013 20:02:24 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: CUDA

On 15 sep 2013, at 03:48, Solar Designer <solar@...nwall.com> wrote:
> I was experimenting with the peak GFLOPS benchmarks found at:
> 
> http://olab.is.s.u-tokyo.ac.jp/~kamil.rocki/projects.html
> 
> and noticed that FlopsCUDA_src_linux.zip builds its CUDA source for
> multiple archs at once, by using these nvcc options: "-gencode
> arch=compute_20,code=sm_21 -gencode arch=compute_30,code=sm_30".
> It also has "-fmad=true".  Maybe this is how we should be building our
> CUDA stuff, too.


We should probably also add "-gencode arch=compute_10,code=sm_10" for really old cards (which is what we have been defaulting to so far).

I have now tried this on my (sm_30) CUDA 5 MacBook. CUDA compilation obviously takes longer, as it builds three different versions of each kernel. Unfortunately, as already seen in tests long ago (just changing from sm_10 to sm_20/30), some formats actually get much slower from this.

-fmad=true by itself doesn't seem to make any difference.

sm_30 doesn't seem to do any good at all. Some formats get a significant regression. No format gets a significant boost.

sm_20 makes all formats faster except md5crypt, which gets a 12% regression. Curiously, phpass (which is very similar, right?) gets a 21% speedup. Maybe we should settle for sm_20 and try to optimize md5crypt at that setting? Other significant boosts are pwsafe (+65%), mscash2 (+58%), wpapsk (+40%) and sha256crypt (+15%).

...OK, I'll replace "-arch sm_10" with "-fmad=true -gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_21" right now, and place "-gencode arch=compute_30,code=sm_30" in comments for the time being.
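
Roughly, the resulting flag set could be assembled like this. This is only a sketch: the variable name NVCC_FLAGS and the loop are illustrative assumptions, not necessarily how the Makefile spells it. Only the nvcc options themselves come from the discussion above.

```shell
# Hypothetical sketch: NVCC_FLAGS is an illustrative name, not taken
# from the actual Makefile.
# Each pair is "virtual arch:real arch", matching
# -gencode arch=compute_X,code=sm_Y.
NVCC_FLAGS="-fmad=true"
for pair in 10:10 20:21; do
    compute=${pair%%:*}   # part before the colon, e.g. 20
    sm=${pair##*:}        # part after the colon, e.g. 21
    NVCC_FLAGS="$NVCC_FLAGS -gencode arch=compute_$compute,code=sm_$sm"
done
# sm_30 stays commented out for the time being:
# NVCC_FLAGS="$NVCC_FLAGS -gencode arch=compute_30,code=sm_30"
echo "$NVCC_FLAGS"
```

Re-enabling sm_30 later would then be a matter of uncommenting that one line (or adding 30:30 to the list of pairs).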

magnum
