Date: Thu, 19 Sep 2013 20:02:24 +0200
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: CUDA

On 15 sep 2013, at 03:48, Solar Designer <solar@...nwall.com> wrote:
> I was experimenting with the peak GFLOPS benchmarks found at:
>
> http://olab.is.s.u-tokyo.ac.jp/~kamil.rocki/projects.html
>
> and noticed that FlopsCUDA_src_linux.zip builds its CUDA source for
> multiple archs at once, by using these nvcc options: "-gencode
> arch=compute_20,code=sm_21 -gencode arch=compute_30,code=sm_30".
> It also has "-fmad=true". Maybe this is how we should be building our
> CUDA stuff, too.

We should probably also add "-gencode arch=compute_10,code=sm_10" for really old cards (which is what we have been defaulting to so far).

I now tried this on my (sm_30) CUDA 5 Macbook. CUDA compilation obviously takes longer, as it builds three different versions of each kernel. Unfortunately, and as seen in tests long ago (just changing from sm_10 to sm_20/30), some formats actually get much slower from this.

-fmad=true by itself doesn't seem to make any difference.

sm_30 doesn't seem to do any good at all. Some formats get a significant regression. No format gets a significant boost.

sm_20 makes all formats faster except md5crypt, which gets a 12% regression. Curiously, phpass (which is very similar, right?) gets a 21% speedup. Maybe we should settle for sm_20 and try to optimize md5crypt at that setting? Other significant boosts are pwsafe (+65%), mscash2 (+58%), wpapsk (+40%) and sha256crypt (+15%).

...OK, I'll replace "-arch sm_10" with "-fmad=true -gencode arch=compute_10,code=sm_10 -gencode arch=compute_20,code=sm_21" right now, and place "-gencode arch=compute_30,code=sm_30" in comments for the time being.

magnum
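For illustration only, the option string described above can be assembled from a list of (virtual arch, real arch) pairs. This is a minimal sketch, not the project's actual build logic: the helper name `gencode_flags` and the `ENABLED`/`DISABLED` lists are hypothetical, and only the flag spellings are taken from the message.

```python
def gencode_flags(targets, fmad=True):
    """Build an nvcc option string from (compute, sm) pairs.

    Hypothetical helper: each pair becomes one
    "-gencode arch=compute_X,code=sm_Y" option.
    """
    flags = []
    if fmad:
        flags.append("-fmad=true")
    for compute, sm in targets:
        flags.append(f"-gencode arch=compute_{compute},code=sm_{sm}")
    return " ".join(flags)


# Targets enabled in the message: sm_10 for really old cards, plus sm_21.
ENABLED = [("10", "10"), ("20", "21")]
# Kept out of the build for now (the message leaves it in comments).
DISABLED = [("30", "30")]

NVCC_FLAGS = gencode_flags(ENABLED)
print(NVCC_FLAGS)
```

Keeping the targets in a list means that re-enabling sm_30 later is a one-line move from `DISABLED` to `ENABLED`, at the cost of one more kernel build per format.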