Date: Thu, 19 Sep 2013 20:02:24 +0200
From: magnum <>
Subject: Re: CUDA

On 15 sep 2013, at 03:48, Solar Designer <> wrote:
> I was experimenting with the peak GFLOPS benchmarks found at:
> and noticed that builds its CUDA source for
> multiple archs at once, by using these nvcc options: "-gencode
> arch=compute_20,code=sm_21 -gencode arch=compute_30,code=sm_30".
> It also has "-fmad=true".  Maybe this is how we should be building our
> CUDA stuff, too.

We should probably also add "-gencode arch=compute_10,code=sm_10" for really old cards (which is what we have been defaulting to so far).

I now tried this on my (sm_30) CUDA 5 MacBook. CUDA compilation obviously takes longer, as it builds three different versions of each kernel. Unfortunately, and as seen in tests long ago (just changing from sm_10 to sm_20/30), some formats actually get much slower from this.

-fmad=true by itself doesn't seem to make any difference.

sm_30 doesn't seem to do any good at all. Some formats get a significant regression. No format gets a significant boost.

sm_20 makes all formats faster except md5crypt, which gets a 12% regression. Curiously, phpass (which is very similar, right?) gets a 21% speedup. Maybe we should settle for sm_20 and try to optimize md5crypt at that setting? Other significant boosts are pwsafe (+65%), mscash2 (+58%), wpapsk (+40%) and sha256crypt (+15%).

...OK, I'll replace "-arch sm_10" with "-fmad=true -gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_21" right now, and place "-gencode arch=compute_30,code=sm_30" in comments for the time being.
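
For reference, here is a rough sketch of how that option string is put together from (virtual arch, real arch) pairs. The helper name gencode_flags and the ENABLED/DISABLED lists are made up for illustration and are not part of our Makefile; only the nvcc options themselves come from the discussion above.

```python
# Hypothetical helper for illustration only; not part of the actual build.
def gencode_flags(targets, fmad=True):
    """Build an nvcc option string from (compute, sm) architecture pairs."""
    flags = ["-fmad=true"] if fmad else []
    for compute, sm in targets:
        # One -gencode option per target: PTX for compute_XX, SASS for sm_YY.
        flags += ["-gencode", f"arch=compute_{compute},code=sm_{sm}"]
    return " ".join(flags)


# Targets enabled for now, per the message above.
ENABLED = [("10", "10"), ("20", "21")]
# Left out for the time being because sm_30 showed regressions.
DISABLED = [("30", "30")]

if __name__ == "__main__":
    print(gencode_flags(ENABLED))
```

Each extra pair in ENABLED means one more compiled version of every kernel, so build time grows with the length of that list.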

