Date: Sat, 7 Jul 2012 17:14:52 +0530
From: Sayantan Datta <std2048@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: optimized mscash2-opencl
On Sat, Jul 7, 2012 at 3:01 PM, Solar Designer <solar@...nwall.com> wrote:
> Sayantan, magnum -
>
> I was puzzled by the fact that changing the "manual" rotates to rotate()
> in pbkdf2_kernel.cl made it twice slower on HD 7970 (at least). Today I
> looked into this. It turns out that Sayantan's version of the code
> heavily relied on the compiler doing some non-trivial optimizations,
> including figuring out that two of the four SHA-1 computations could be
> moved out of the 10k-iterations loop. Somehow the attempted change to
> use rotate() was just enough to prevent that specific optimization.
>
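To make the loop-invariant part concrete, here is a minimal sketch in plain C, not the actual pbkdf2_kernel.cl code. It assumes the usual PBKDF2-HMAC structure, where each iteration costs four compression calls and the two that process the ipad and opad key blocks do not depend on the iteration. toy_compress() is a hypothetical stand-in for the SHA-1 compression function (it is not SHA-1), and all names here are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_IV 0x67452301u

/* Hypothetical stand-in for the SHA-1 compression function (NOT real SHA-1). */
static uint32_t toy_compress(uint32_t state, uint32_t block)
{
    uint32_t x = state ^ (block * 0x9E3779B9u);
    x = (x << 5) | (x >> 27);
    return x + 0x5A827999u;
}

/* Naive form: four compressions per iteration, two of them loop-invariant. */
static uint32_t pbkdf2_naive(uint32_t key, uint32_t salt, int iters)
{
    uint32_t u = salt, out = 0;
    for (int i = 0; i < iters; i++) {
        uint32_t inner = toy_compress(TOY_IV, key ^ 0x36363636u); /* invariant */
        inner = toy_compress(inner, u);
        uint32_t outer = toy_compress(TOY_IV, key ^ 0x5C5C5C5Cu); /* invariant */
        u = toy_compress(outer, inner);
        out ^= u; /* PBKDF2 XORs all U_i together */
    }
    return out;
}

/* Hoisted form: the two key-dependent states are computed once, up front. */
static uint32_t pbkdf2_hoisted(uint32_t key, uint32_t salt, int iters)
{
    const uint32_t ipad_state = toy_compress(TOY_IV, key ^ 0x36363636u);
    const uint32_t opad_state = toy_compress(TOY_IV, key ^ 0x5C5C5C5Cu);
    uint32_t u = salt, out = 0;
    for (int i = 0; i < iters; i++) {
        uint32_t inner = toy_compress(ipad_state, u);
        u = toy_compress(opad_state, inner);
        out ^= u;
    }
    return out;
}
```

Both functions return the same value for the same inputs; the hoisted one simply does two compressions per iteration instead of four, without depending on the compiler to notice the invariance.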
> Anyway, I've optimized the code to avoid relying on the compiler doing
> this, and I made several other optimizations as well. In the current
> pbkdf2_kernel.cl the uses of rotate() and bitselect() no longer result
> in any slowdown; however, they still don't result in any speedup
> either, which is puzzling. Was the compiler good enough to generate the
> proper instructions anyway? Or does it still not do that? We need to
> examine the code to find out - at least IL if not native. Sayantan -
> this is now a task for you.
>
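For reference, a small C model of what the "manual" forms and the rotate()/bitselect() forms compute. This is only a host-side sketch under my own naming (rotl_manual, bitselect_model and the ch_*/maj_* helpers are hypothetical, not taken from pbkdf2_kernel.cl). It follows the OpenCL definitions: rotate(x, n) is a left rotate, and bitselect(a, b, c) takes each bit from b where c has a 1 and from a otherwise.

```c
#include <assert.h>
#include <stdint.h>

/* "Manual" rotate-left; valid for 0 < n < 32. Models OpenCL rotate(x, n). */
static uint32_t rotl_manual(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* C model of OpenCL bitselect(a, b, c): bits of b where c is 1, else bits of a. */
static uint32_t bitselect_model(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* SHA-1 "Ch" round function, textbook form. */
static uint32_t ch_manual(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);
}

/* Same function expressed as a single bitselect. */
static uint32_t ch_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect_model(z, y, x);
}

/* SHA-1 "Maj" round function, textbook form. */
static uint32_t maj_manual(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

/* Maj as a bitselect: where x and z agree take x, otherwise take y. */
static uint32_t maj_bitselect(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect_model(x, y, z ^ x);
}
```

Since the two forms are bit-for-bit equivalent, any speed difference comes down to which instructions the compiler emits for each, which is why the IL or native code has to be inspected.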
> Before optimizations:
>
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Optimal Work Group Size:256
> Kernel Execution Speed (Higher is better):1.403122
> Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
> Raw: 92304 c/s real, 92467 c/s virtual
>
> OpenCL platform 0: NVIDIA CUDA, 1 device(s).
> Using device 0: GeForce GTX 570
> Optimal Work Group Size:512
> Kernel Execution Speed (Higher is better):0.416847
> Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
> Raw: 26900 c/s real, 26900 c/s virtual
>
> After:
>
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Optimal Work Group Size:256
> Kernel Execution Speed (Higher is better):1.492774
> Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
> Raw: 97814 c/s real, 97632 c/s virtual
>
> OpenCL platform 0: NVIDIA CUDA, 1 device(s).
> Using device 0: GeForce GTX 570
> Optimal Work Group Size:128
> Kernel Execution Speed (Higher is better):0.491235
> Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
> Raw: 31852 c/s real, 31813 c/s virtual
>
> This is +6% on AMD and +18% on NVIDIA.
>
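The percentages follow directly from the "real" c/s figures in the benchmarks above. A tiny sketch of the arithmetic (percent_gain is a hypothetical helper name, used only here):

```c
#include <assert.h>

/* Relative gain in percent when going from `before` c/s to `after` c/s. */
static double percent_gain(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With the numbers quoted above, percent_gain(92304, 97814) is just under 6 for Tahiti and percent_gain(26900, 31852) is a bit over 18 for the GTX 570.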
> Actual run:
>
> $ ./john -i=alpha ~/john/contest-2011/hashes-all.txt-1.mscash2
> -fo=mscash2-opencl -pla=1
> OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
> Using device 0: Tahiti
> Optimal Work Group Size:128
> Kernel Execution Speed (Higher is better):1.492764
> Loaded 1152 password hashes with 1090 different salts (M$ Cache Hash 2
> (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL])
> guesses: 0 time: 0:00:00:12 0.00% c/s: 21260 trying: bara - choedia
> guesses: 0 time: 0:00:00:15 0.00% c/s: 52393 trying: bara - choedia
> salart (gemignani)
> guesses: 1 time: 0:00:00:17 0.00% c/s: 59208 trying: bara - choedia
> starter (bevilaqua)
> guesses: 2 time: 0:00:00:23 0.00% c/s: 68177 trying: bara - choedia
> guesses: 2 time: 0:00:01:43 0.00% c/s: 96292 trying: bara - choedia
> moones (alexino)
> guesses: 3 time: 0:00:02:02 0.00% c/s: 96526 trying: bara - choedia
> guesses: 3 time: 0:00:03:14 0.00% c/s: 99710 trying: bara - choedia
> assica (bersamina)
> mingui (abisheva)
> guesses: 5 time: 0:00:03:54 0.00% c/s: 101588 trying: bara - choedia
> annico (boediman)
> guesses: 6 time: 0:00:04:21 0.00% c/s: 101194 trying: bara - choedia
> stephat (bamigboye)
> guesses: 7 time: 0:00:04:56 0.00% c/s: 100800 trying: bara - choedia
> storine (arient)
> guesses: 8 time: 0:00:05:39 0.00% c/s: 100352 trying: bara - choedia
> aritta (chamieh)
> streles (aquinde)
> monies (bercasio)
> merrate (figuera)
> meless (fiander)
> starine (clavier)
> stomara (elhadidi)
> stronie (elizan)
> shoria (daveii)
> mistom (bhuriwale)
> alamel (deblasis)
> ashame (bareis)
> arandy (ghazalie)
> samali (baubie)
> stronia (binduhewa)
> metale (bazier)
> mereko (aleksi)
> guesses: 25 time: 0:00:12:01 0.00% c/s: 101062 trying: bara - choedia
> stramos (empabido)
> artico (fallangie)
> ashona (estacion)
> arishi (elvina)
> sherie (dilawer)
> andrin (alawieh)
> guesses: 31 time: 0:00:17:06 0.00% c/s: 101869 trying: bara - choedia
> artie (heilemann)
> merens (heinzmann)
> standan (gilead)
> artal (adrienne)
> anness (beccaria)
> guesses: 36 time: 0:00:18:32 0.00% c/s: 101776 trying: bara - choedia
> shomos (basie)
> mandia (artillery)
> annane (azizieh)
> guesses: 39 time: 0:00:19:23 0.00% c/s: 101600 trying: bara - choedia
> stepine (hemmati)
> guesses: 40 time: 0:00:20:27 0.00% c/s: 101404 trying: bara - choedia
> sarone (bangie)
> ashoon (abhulimen)
> storten (akinremi)
> misamo (gravelin)
> guesses: 44 time: 0:00:21:03 0.00% c/s: 101483 trying: bara - choedia
> stepand (egnario)
> guesses: 45 time: 0:00:21:48 0.00% c/s: 101356 trying: bara - choedia
>
> Default Adapter - AMD Radeon HD 7900 Series
> Sensor 0: Temperature - 86.00 C
>
> Default Adapter - AMD Radeon HD 7900 Series
> Core (MHz) Memory (MHz)
> Current Clocks : 925 1375
> Current Peak : 925 1375
> Configurable Peak Range : [300-1125] [150-1575]
> GPU load : 98%
>
> Alexander
>
I guess I didn't have deep enough insight into the code, which kept me
from moving the two SHA-1 computations out of the 10k-iteration loop. BTW,
I was expecting much more performance, nearly double on the 7970, because
the two SHA-1s represented almost half of the total computation. This could
mean we are using a lot of global memory, which I will check too. With this
patch we are on par with hashcat on the 570 but still lagging behind on the
7970. Regarding bitselect and rotate, they didn't even work properly when I
first applied them, so I reverted them. Anyway, I'll look into the bitselect
and rotate case.
Regards,
Sayantan.