Message-ID: <20120707093106.GA26343@openwall.com>
Date: Sat, 7 Jul 2012 13:31:06 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: optimized mscash2-opencl
Sayantan, magnum -
I was puzzled by the fact that changing the "manual" rotates to rotate()
in pbkdf2_kernel.cl made it twice as slow on HD 7970 (at least). Today I
looked into this. It turns out that Sayantan's version of the code
heavily relied on the compiler doing some non-trivial optimizations,
including figuring out that two of the four SHA-1 computations could be
moved out of the 10k-iteration loop. Somehow the attempted change to
use rotate() was just enough to prevent that specific optimization.
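To illustrate the kind of hoisting meant here, below is a minimal host-side C sketch, not the code from pbkdf2_kernel.cl or from the attached diff. It assumes the usual HMAC-SHA-1 structure with a key of at most 64 bytes and a 20-byte message: each HMAC then costs four SHA-1 compressions, and the two over the key XOR ipad and key XOR opad blocks depend only on the key, so they can be computed once before the loop. All function names are hypothetical, the compress_calls counter exists only to show the saving, and the PBKDF2 XOR accumulation of the U values is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static const uint32_t SHA1_IV[5] = {
    0x67452301u, 0xEFCDAB89u, 0x98BADCFEu, 0x10325476u, 0xC3D2E1F0u };
static unsigned long compress_calls; /* counted only to show the saving */

static uint32_t rotl32(uint32_t x, unsigned n) { return (x << n) | (x >> (32 - n)); }

/* One SHA-1 compression of a 16-word block (words already big-endian). */
static void sha1_compress(uint32_t st[5], const uint32_t blk[16])
{
    uint32_t w[80], a = st[0], b = st[1], c = st[2], d = st[3], e = st[4];
    compress_calls++;
    for (int t = 0; t < 16; t++) w[t] = blk[t];
    for (int t = 16; t < 80; t++)
        w[t] = rotl32(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1);
    for (int t = 0; t < 80; t++) {
        uint32_t f, k;
        if (t < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (t < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (t < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
        uint32_t tmp = rotl32(a, 5) + f + e + k + w[t];
        e = d; d = c; c = rotl32(b, 30); b = a; a = tmp;
    }
    st[0] += a; st[1] += b; st[2] += c; st[3] += d; st[4] += e;
}

/* Final block for a 20-byte message that follows one 64-byte pad block. */
static void load_digest_block(uint32_t blk[16], const uint32_t dg[5])
{
    memset(blk, 0, 16 * sizeof blk[0]);
    memcpy(blk, dg, 5 * sizeof dg[0]);
    blk[5] = 0x80000000u;      /* padding bit */
    blk[15] = (64 + 20) * 8;   /* total message length in bits */
}

/* The two key-only compressions: loop-invariant for a fixed key. */
static void key_states(const uint32_t key[16], uint32_t ist[5], uint32_t ost[5])
{
    uint32_t pad[16];
    for (int i = 0; i < 16; i++) pad[i] = key[i] ^ 0x36363636u;
    memcpy(ist, SHA1_IV, sizeof SHA1_IV);
    sha1_compress(ist, pad);
    for (int i = 0; i < 16; i++) pad[i] = key[i] ^ 0x5C5C5C5Cu;
    memcpy(ost, SHA1_IV, sizeof SHA1_IV);
    sha1_compress(ost, pad);
}

/* The two compressions that really depend on the previous value u. */
static void hmac_step(uint32_t u[5], const uint32_t ist[5], const uint32_t ost[5])
{
    uint32_t blk[16], in[5], out[5];
    memcpy(in, ist, sizeof in);
    load_digest_block(blk, u);
    sha1_compress(in, blk);
    memcpy(out, ost, sizeof out);
    load_digest_block(blk, in);
    sha1_compress(out, blk);
    memcpy(u, out, sizeof out);
}

/* Four compressions per iteration: key states recomputed every time. */
void iterate_naive(uint32_t u[5], const uint32_t key[16], int n)
{
    for (int i = 0; i < n; i++) {
        uint32_t ist[5], ost[5];
        key_states(key, ist, ost);
        hmac_step(u, ist, ost);
    }
}

/* Two compressions per iteration: key states hoisted out of the loop. */
void iterate_hoisted(uint32_t u[5], const uint32_t key[16], int n)
{
    uint32_t ist[5], ost[5];
    key_states(key, ist, ost);
    for (int i = 0; i < n; i++)
        hmac_step(u, ist, ost);
}

/* Self-check: both variants must agree on the final value. */
void check_equivalence(void)
{
    uint32_t key[16] = { 0x70617373u, 0x776F7264u }; /* zero-padded key */
    uint32_t a[5] = { 1, 2, 3, 4, 5 }, b[5] = { 1, 2, 3, 4, 5 };
    iterate_naive(a, key, 1000);
    iterate_hoisted(b, key, 1000);
    assert(memcmp(a, b, sizeof a) == 0);
}
```

For n iterations the naive loop performs 4n compressions and the hoisted one 2n + 2. Writing the hoisting out by hand like this means the result no longer depends on whether a given compiler manages to spot the invariant.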
Anyway, I've optimized the code to avoid relying on the compiler doing
this, and I made several other optimizations as well. In the current
pbkdf2_kernel.cl the uses of rotate() and bitselect() no longer result
in any slowdown; however, they don't result in any speedup either,
which is puzzling. Was the compiler good enough to generate the proper
instructions anyway? Or does it still not do that? We need to examine
the generated code to find out - at least IL if not native. Sayantan -
this is now a task for you.
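For reference, here is a small C model of what the "manual" forms and the OpenCL built-ins compute, to make clear why the two are interchangeable in terms of results. This is only a sketch: rotl_manual and bitselect_model are hypothetical names, bitselect_model follows the OpenCL definition of bitselect(a, b, c) (take each bit from b where c has a 1, otherwise from a), and the Ch/Maj formulations shown are common ones, not necessarily those used in pbkdf2_kernel.cl.

```c
#include <assert.h>
#include <stdint.h>

/* "Manual" rotate left, valid for 0 < n < 32; OpenCL's rotate(x, n)
 * is the built-in counterpart for 32-bit operands. */
uint32_t rotl_manual(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* C model of OpenCL bitselect(a, b, c). */
uint32_t bitselect_model(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* SHA-1 "choose" function, rounds 0..19. */
uint32_t ch_manual(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);
}

uint32_t ch_select(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect_model(z, y, x);     /* y where x is 1, else z */
}

/* SHA-1 "majority" function, rounds 40..59. */
uint32_t maj_manual(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

uint32_t maj_select(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect_model(x, y, x ^ z); /* y where x and z differ, else x */
}

/* Self-check on a few sample words. */
void check_identities(void)
{
    const uint32_t v[4] = { 0x00000000u, 0xFFFFFFFFu, 0x12345678u, 0x9ABCDEF0u };
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 4; j++)
            for (int k = 0; k < 4; k++) {
                assert(ch_manual(v[i], v[j], v[k]) == ch_select(v[i], v[j], v[k]));
                assert(maj_manual(v[i], v[j], v[k]) == maj_select(v[i], v[j], v[k]));
            }
}
```

Since the results are identical, any speed difference can only come from the instructions the compiler emits for each form, which is why the generated IL or native code has to be inspected to answer the question above.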
Before optimizations:
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Optimal Work Group Size:256
Kernel Execution Speed (Higher is better):1.403122
Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw: 92304 c/s real, 92467 c/s virtual
OpenCL platform 0: NVIDIA CUDA, 1 device(s).
Using device 0: GeForce GTX 570
Optimal Work Group Size:512
Kernel Execution Speed (Higher is better):0.416847
Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw: 26900 c/s real, 26900 c/s virtual
After:
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Optimal Work Group Size:256
Kernel Execution Speed (Higher is better):1.492774
Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw: 97814 c/s real, 97632 c/s virtual
OpenCL platform 0: NVIDIA CUDA, 1 device(s).
Using device 0: GeForce GTX 570
Optimal Work Group Size:128
Kernel Execution Speed (Higher is better):0.491235
Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw: 31852 c/s real, 31813 c/s virtual
This is +6% on AMD and +18% on NVIDIA.
Actual run:
$ ./john -i=alpha ~/john/contest-2011/hashes-all.txt-1.mscash2 -fo=mscash2-opencl -pla=1
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Optimal Work Group Size:128
Kernel Execution Speed (Higher is better):1.492764
Loaded 1152 password hashes with 1090 different salts (M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL])
guesses: 0 time: 0:00:00:12 0.00% c/s: 21260 trying: bara - choedia
guesses: 0 time: 0:00:00:15 0.00% c/s: 52393 trying: bara - choedia
salart (gemignani)
guesses: 1 time: 0:00:00:17 0.00% c/s: 59208 trying: bara - choedia
starter (bevilaqua)
guesses: 2 time: 0:00:00:23 0.00% c/s: 68177 trying: bara - choedia
guesses: 2 time: 0:00:01:43 0.00% c/s: 96292 trying: bara - choedia
moones (alexino)
guesses: 3 time: 0:00:02:02 0.00% c/s: 96526 trying: bara - choedia
guesses: 3 time: 0:00:03:14 0.00% c/s: 99710 trying: bara - choedia
assica (bersamina)
mingui (abisheva)
guesses: 5 time: 0:00:03:54 0.00% c/s: 101588 trying: bara - choedia
annico (boediman)
guesses: 6 time: 0:00:04:21 0.00% c/s: 101194 trying: bara - choedia
stephat (bamigboye)
guesses: 7 time: 0:00:04:56 0.00% c/s: 100800 trying: bara - choedia
storine (arient)
guesses: 8 time: 0:00:05:39 0.00% c/s: 100352 trying: bara - choedia
aritta (chamieh)
streles (aquinde)
monies (bercasio)
merrate (figuera)
meless (fiander)
starine (clavier)
stomara (elhadidi)
stronie (elizan)
shoria (daveii)
mistom (bhuriwale)
alamel (deblasis)
ashame (bareis)
arandy (ghazalie)
samali (baubie)
stronia (binduhewa)
metale (bazier)
mereko (aleksi)
guesses: 25 time: 0:00:12:01 0.00% c/s: 101062 trying: bara - choedia
stramos (empabido)
artico (fallangie)
ashona (estacion)
arishi (elvina)
sherie (dilawer)
andrin (alawieh)
guesses: 31 time: 0:00:17:06 0.00% c/s: 101869 trying: bara - choedia
artie (heilemann)
merens (heinzmann)
standan (gilead)
artal (adrienne)
anness (beccaria)
guesses: 36 time: 0:00:18:32 0.00% c/s: 101776 trying: bara - choedia
shomos (basie)
mandia (artillery)
annane (azizieh)
guesses: 39 time: 0:00:19:23 0.00% c/s: 101600 trying: bara - choedia
stepine (hemmati)
guesses: 40 time: 0:00:20:27 0.00% c/s: 101404 trying: bara - choedia
sarone (bangie)
ashoon (abhulimen)
storten (akinremi)
misamo (gravelin)
guesses: 44 time: 0:00:21:03 0.00% c/s: 101483 trying: bara - choedia
stepand (egnario)
guesses: 45 time: 0:00:21:48 0.00% c/s: 101356 trying: bara - choedia
Default Adapter - AMD Radeon HD 7900 Series
Sensor 0: Temperature - 86.00 C
Default Adapter - AMD Radeon HD 7900 Series
Core (MHz) Memory (MHz)
Current Clocks : 925 1375
Current Peak : 925 1375
Configurable Peak Range : [300-1125] [150-1575]
GPU load : 98%
Alexander
View attachment "john-mscash2-opencl-opt.diff" of type "text/plain" (8338 bytes)