Date: Sat, 6 Oct 2012 16:15:23 +0200
From: magnum <john.magnum@...hmail.com>
To: "john-users@...ts.openwall.com" <john-users@...ts.openwall.com>
Subject: CUDA tweaking to your actual GPU

When compiling JtR with CUDA support, you are supposed to change the NVCC_FLAGS in Makefile to reflect your GPU card. Here's what happens if you do, or don't.

This was tested with a Kepler card, which is "sm_30" (ie. capability 3.0) while the default in Makefile is "sm_10" (ie. capability, well you guessed it, 1.0). The latter will always work, but might not be optimal.

I ran all CUDA benchmarks with sm_10 and sm_30, and here's relbench's verdict:

$ ../run/relbench -v sm10.txt sm30.txt
Ratio:  1.07261 real, 1.06185 virtual   Raw SHA-224:Raw
Ratio:  0.99100 real, 0.99100 virtual   M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1:Raw
Ratio:  1.10947 real, 1.10947 virtual   Password Safe SHA-256:Raw
Ratio:  1.04536 real, 1.04625 virtual   sha256crypt (rounds=5000):Raw
Ratio:  0.89869 real, 0.88997 virtual   phpass MD5 ($P$9 lengths 0 to 15):Raw
Ratio:  1.02549 real, 1.02549 virtual   WPA-PSK PBKDF2-HMAC-SHA-1:Raw
Ratio:  0.79586 real, 0.78033 virtual   Raw SHA-512:Raw
Ratio:  1.34998 real, 1.36361 virtual   M$ Cache Hash MD4:Only one salt
Ratio:  0.88889 real, 0.88081 virtual   md5crypt:Raw
Ratio:  2.73102 real, 2.75858 virtual   M$ Cache Hash MD4:Many salts
Ratio:  0.81805 real, 0.81003 virtual   Mac OS X 10.7+ salted SHA-512:Many salts
Ratio:  1.07281 real, 1.08331 virtual   Raw SHA-256:Raw
Ratio:  1.10343 real, 1.09753 virtual   sha512crypt (rounds=5000):Raw
Ratio:  0.80827 real, 0.80827 virtual   Mac OS X 10.7+ salted SHA-512:Only one salt
Number of benchmarks:           14
Minimum:                        0.79586 real, 0.78033 virtual
Maximum:                        2.73102 real, 2.75858 virtual
Median:                         1.03538 real, 1.03582 virtual
Median absolute deviation:      0.10064 real, 0.10365 virtual
Geometric mean:                 1.06194 real, 1.05942 virtual
Geometric standard deviation:   1.34743 real, 1.35503 virtual

The worst slow-down was 20%, for raw-sha512.
That's a pity, but the best boost was a whopping 2.7x speedup, for mscash (many salts). Both these formats are fast and currently suboptimal for JtR, and might have a large portion of random test variation.

I'm not sure what to do with this, but here are some observations:

- This test should also include tweaking BLOCKS and THREADS, but everything was left at default. Some auto-homing like we do in OpenCL should be fairly high priority for CUDA. The outcome *may* be totally different with optimal values.

- The interesting formats are the slow ones. phpass had an 11% slowdown, and md5crypt about the same. mscash2 lost a percent; most others gained, more or less.

- Apparently, this is not super-important. You get up to about 10% gain if we disregard mscash.

- The OpenCL support for nvidia automatically picks your actual architecture, so it will likely show similar gains and losses. It's probably possible to pick an arch per format, if we want to.

Final note for OSX users: CUDA 5 for OSX is released. All JtR formats work fine, and the old bug that gave me kernel panics if the GPU was set to dynamic switching is gone.

magnum
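
For anyone who wants to re-check the summary figures, here is a minimal sketch that recomputes the geometric mean from the "real" ratios in the relbench output, first for all 14 benchmarks and then with the two mscash rows left out. It is only an illustration and not how relbench itself is implemented; the names RATIOS, geometric_mean and without_mscash are made up for this example, and the benchmark labels are shortened by hand.

```python
import math

# "Real" ratios (sm_30 result divided by sm_10 result), copied by hand from
# the relbench output in this post. Above 1.0 means sm_30 was faster.
# The labels are shortened; they are not the exact relbench benchmark names.
RATIOS = {
    "Raw SHA-224": 1.07261,
    "M$ Cache Hash 2 (DCC2)": 0.99100,
    "Password Safe SHA-256": 1.10947,
    "sha256crypt": 1.04536,
    "phpass MD5": 0.89869,
    "WPA-PSK": 1.02549,
    "Raw SHA-512": 0.79586,
    "M$ Cache Hash MD4 (one salt)": 1.34998,
    "md5crypt": 0.88889,
    "M$ Cache Hash MD4 (many salts)": 2.73102,
    "OS X 10.7+ salted SHA-512 (many salts)": 0.81805,
    "Raw SHA-256": 1.07281,
    "sha512crypt": 1.10343,
    "OS X 10.7+ salted SHA-512 (one salt)": 0.80827,
}


def geometric_mean(values):
    """Geometric mean, computed as exp of the mean of the logs."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def without_mscash(ratios):
    """Drop the two mscash (MD4) rows; mscash2 (DCC2) is kept."""
    return {name: r for name, r in ratios.items()
            if not name.startswith("M$ Cache Hash MD4")}


if __name__ == "__main__":
    print("benchmarks:            ", len(RATIOS))
    print("geometric mean, all:   ", round(geometric_mean(RATIOS.values()), 5))
    print("geometric mean, no mscash:",
          round(geometric_mean(without_mscash(RATIOS).values()), 5))
    print("worst:", min(RATIOS, key=RATIOS.get))
    print("best: ", max(RATIOS, key=RATIOS.get))
```

Run it with any Python 3 interpreter. With the mscash rows excluded the geometric mean comes out slightly below 1, so the roughly 10% gain is the best case among the remaining formats, not the typical one.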