Date: Sat, 7 Jul 2012 13:31:06 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: optimized mscash2-opencl

Sayantan, magnum -

I was puzzled by the fact that changing the "manual" rotates to rotate()
in pbkdf2_kernel.cl made it twice as slow on HD 7970 (at least).  Today I
looked into this.  It turns out that Sayantan's version of the code
relied heavily on the compiler doing some non-trivial optimizations,
including figuring out that two of the four SHA-1 computations could be
moved out of the 10k-iteration loop.  Somehow the attempted change to
use rotate() was just enough to prevent that specific optimization.
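
To illustrate the kind of hoisting described above, here is a minimal
plain-C sketch (not the actual pbkdf2_kernel.cl code).  It assumes the
usual HMAC structure, where the states after hashing (key ^ ipad) and
(key ^ opad) depend on the key only, so two of the four SHA-1
compressions per iteration are loop-invariant.  All function names
(key_state, finish20, chain_naive, chain_hoisted) are hypothetical, and
the chain starts from an already computed first block u1 to keep the
sketch short.

```c
#include <stdint.h>
#include <string.h>

#define ROTL32(x, n) (((x) << (n)) | ((x) >> (32 - (n))))

/* One SHA-1 compression: fold a 16-word block into the 5-word state. */
static void sha1_compress(uint32_t st[5], const uint32_t blk[16])
{
    uint32_t w[80], a, b, c, d, e, f, k, t;
    int i;
    for (i = 0; i < 16; i++) w[i] = blk[i];
    for (i = 16; i < 80; i++)
        w[i] = ROTL32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
    a = st[0]; b = st[1]; c = st[2]; d = st[3]; e = st[4];
    for (i = 0; i < 80; i++) {
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
        t = ROTL32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = ROTL32(b, 30); b = a; a = t;
    }
    st[0] += a; st[1] += b; st[2] += c; st[3] += d; st[4] += e;
}

/* State after hashing the 64-byte block (key ^ pad); depends on the key only. */
static void key_state(const uint32_t key[16], uint32_t pad, uint32_t st[5])
{
    static const uint32_t iv[5] = {
        0x67452301u, 0xEFCDAB89u, 0x98BADCFEu, 0x10325476u, 0xC3D2E1F0u
    };
    uint32_t blk[16];
    int i;
    for (i = 0; i < 16; i++) blk[i] = key[i] ^ pad;
    memcpy(st, iv, sizeof(iv));
    sha1_compress(st, blk);
}

/* Continue from a key state with a 20-byte message plus SHA-1 padding. */
static void finish20(const uint32_t start[5], const uint32_t msg[5],
                     uint32_t out[5])
{
    uint32_t blk[16] = {0};
    uint32_t st[5];
    memcpy(st, start, sizeof(st));
    memcpy(blk, msg, 5 * sizeof(uint32_t));
    blk[5] = 0x80000000u;        /* padding bit right after the message */
    blk[15] = (64 + 20) * 8;     /* total length in bits */
    sha1_compress(st, blk);
    memcpy(out, st, sizeof(st));
}

/* Nothing hoisted: 4 compressions per iteration. */
static void chain_naive(const uint32_t key[16], const uint32_t u1[5],
                        int iters, uint32_t out[5])
{
    uint32_t u[5], ist[5], ost[5], inner[5];
    int i, j;
    memcpy(u, u1, sizeof(u));
    memcpy(out, u1, sizeof(u));
    for (j = 1; j < iters; j++) {
        key_state(key, 0x36363636u, ist);   /* loop-invariant */
        key_state(key, 0x5c5c5c5cu, ost);   /* loop-invariant */
        finish20(ist, u, inner);
        finish20(ost, inner, u);
        for (i = 0; i < 5; i++) out[i] ^= u[i];
    }
}

/* Key states computed once: 2 compressions per iteration. */
static void chain_hoisted(const uint32_t key[16], const uint32_t u1[5],
                          int iters, uint32_t out[5])
{
    uint32_t u[5], ist[5], ost[5], inner[5];
    int i, j;
    key_state(key, 0x36363636u, ist);
    key_state(key, 0x5c5c5c5cu, ost);
    memcpy(u, u1, sizeof(u));
    memcpy(out, u1, sizeof(u));
    for (j = 1; j < iters; j++) {
        finish20(ist, u, inner);
        finish20(ost, inner, u);
        for (i = 0; i < 5; i++) out[i] ^= u[i];
    }
}
```

Both functions produce the same result for the same key and u1; the
hoisted one simply does not depend on the compiler noticing that the two
key_state() calls never change inside the loop.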

Anyway, I've optimized the code to avoid relying on the compiler doing
this, and I made several other optimizations as well.  In the current
pbkdf2_kernel.cl the uses of rotate() and bitselect() no longer result
in any slowdown; however, they still don't result in any speedup
either, which is puzzling.  Was the compiler good enough to generate the
proper instructions anyway, or does it still not do that?  We need to
examine the generated code to find out - at least IL if not native.
Sayantan - this is now a task for you.
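
For reference, here is a small plain-C sketch of what rotate() and
bitselect() compute for 32-bit values, next to the "manual" forms they
would replace in SHA-1.  This is only an illustration, not the kernel
code: bitselect32() and rotate32() are hypothetical stand-ins that
follow the OpenCL definitions of the built-ins, and the other function
names are made up for this example.

```c
#include <stdint.h>

/* Stand-in for OpenCL bitselect(a, b, c): take b's bit where c has a 1
 * bit, otherwise a's bit. */
static uint32_t bitselect32(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & ~c) | (b & c);
}

/* Stand-in for OpenCL rotate(x, n) on 32-bit values: rotate left. */
static uint32_t rotate32(uint32_t x, uint32_t n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}

/* "Manual" rotate; only valid for n in 1..31. */
static uint32_t rol_manual(uint32_t x, uint32_t n)
{
    return (x << n) | (x >> (32 - n));
}

/* SHA-1 "choose" function (rounds 0..19), manual and bitselect forms. */
static uint32_t ch_manual(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);
}

static uint32_t ch_select(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(z, y, x);
}

/* SHA-1 "majority" function (rounds 40..59), manual and bitselect forms.
 * Where x and z agree the majority is x; where they differ it is y. */
static uint32_t maj_manual(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (x & z) | (y & z);
}

static uint32_t maj_select(uint32_t x, uint32_t y, uint32_t z)
{
    return bitselect32(x, y, z ^ x);
}
```

If the compiler already recognizes the manual patterns and emits the
same native instructions for both forms, that would explain seeing no
speedup from the built-ins; checking the IL or native code is the way to
confirm or rule this out.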

Before optimizations:

OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Optimal Work Group Size:256
Kernel Execution Speed (Higher is better):1.403122
Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw:    92304 c/s real, 92467 c/s virtual

OpenCL platform 0: NVIDIA CUDA, 1 device(s).
Using device 0: GeForce GTX 570
Optimal Work Group Size:512
Kernel Execution Speed (Higher is better):0.416847
Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw:    26900 c/s real, 26900 c/s virtual

After:

OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Optimal Work Group Size:256
Kernel Execution Speed (Higher is better):1.492774
Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw:    97814 c/s real, 97632 c/s virtual

OpenCL platform 0: NVIDIA CUDA, 1 device(s).
Using device 0: GeForce GTX 570
Optimal Work Group Size:128
Kernel Execution Speed (Higher is better):0.491235
Benchmarking: M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL]... DONE
Raw:    31852 c/s real, 31813 c/s virtual

This is +6% on AMD and +18% on NVIDIA.
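
The percentages follow from the "real" c/s figures above.  A trivial
sketch of the arithmetic (gain_percent is just a hypothetical helper
name):

```c
/* Relative speedup in percent, given c/s before and after. */
static double gain_percent(double before, double after)
{
    return (after / before - 1.0) * 100.0;
}
```

gain_percent(92304, 97814) is just under 6 (Tahiti), and
gain_percent(26900, 31852) is a little over 18 (GTX 570).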

Actual run:

$ ./john -i=alpha ~/john/contest-2011/hashes-all.txt-1.mscash2 -fo=mscash2-opencl -pla=1
OpenCL platform 1: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Tahiti
Optimal Work Group Size:128
Kernel Execution Speed (Higher is better):1.492764
Loaded 1152 password hashes with 1090 different salts (M$ Cache Hash 2 (DCC2) PBKDF2-HMAC-SHA-1 [OpenCL])
guesses: 0  time: 0:00:00:12 0.00%  c/s: 21260  trying: bara - choedia
guesses: 0  time: 0:00:00:15 0.00%  c/s: 52393  trying: bara - choedia
salart           (gemignani)
guesses: 1  time: 0:00:00:17 0.00%  c/s: 59208  trying: bara - choedia
starter          (bevilaqua)
guesses: 2  time: 0:00:00:23 0.00%  c/s: 68177  trying: bara - choedia
guesses: 2  time: 0:00:01:43 0.00%  c/s: 96292  trying: bara - choedia
moones           (alexino)
guesses: 3  time: 0:00:02:02 0.00%  c/s: 96526  trying: bara - choedia
guesses: 3  time: 0:00:03:14 0.00%  c/s: 99710  trying: bara - choedia
assica           (bersamina)
mingui           (abisheva)
guesses: 5  time: 0:00:03:54 0.00%  c/s: 101588  trying: bara - choedia
annico           (boediman)
guesses: 6  time: 0:00:04:21 0.00%  c/s: 101194  trying: bara - choedia
stephat          (bamigboye)
guesses: 7  time: 0:00:04:56 0.00%  c/s: 100800  trying: bara - choedia
storine          (arient)
guesses: 8  time: 0:00:05:39 0.00%  c/s: 100352  trying: bara - choedia
aritta           (chamieh)
streles          (aquinde)
monies           (bercasio)
merrate          (figuera)
meless           (fiander)
starine          (clavier)
stomara          (elhadidi)
stronie          (elizan)
shoria           (daveii)
mistom           (bhuriwale)
alamel           (deblasis)
ashame           (bareis)
arandy           (ghazalie)
samali           (baubie)
stronia          (binduhewa)
metale           (bazier)
mereko           (aleksi)
guesses: 25  time: 0:00:12:01 0.00%  c/s: 101062  trying: bara - choedia
stramos          (empabido)
artico           (fallangie)
ashona           (estacion)
arishi           (elvina)
sherie           (dilawer)
andrin           (alawieh)
guesses: 31  time: 0:00:17:06 0.00%  c/s: 101869  trying: bara - choedia
artie            (heilemann)
merens           (heinzmann)
standan          (gilead)
artal            (adrienne)
anness           (beccaria)
guesses: 36  time: 0:00:18:32 0.00%  c/s: 101776  trying: bara - choedia
shomos           (basie)
mandia           (artillery)
annane           (azizieh)
guesses: 39  time: 0:00:19:23 0.00%  c/s: 101600  trying: bara - choedia
stepine          (hemmati)
guesses: 40  time: 0:00:20:27 0.00%  c/s: 101404  trying: bara - choedia
sarone           (bangie)
ashoon           (abhulimen)
storten          (akinremi)
misamo           (gravelin)
guesses: 44  time: 0:00:21:03 0.00%  c/s: 101483  trying: bara - choedia
stepand          (egnario)
guesses: 45  time: 0:00:21:48 0.00%  c/s: 101356  trying: bara - choedia

Default Adapter - AMD Radeon HD 7900 Series
                  Sensor 0: Temperature - 86.00 C

Default Adapter - AMD Radeon HD 7900 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    925           1375
             Current Peak :    925           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    98%

Alexander

View attachment "john-mscash2-opencl-opt.diff" of type "text/plain" (8338 bytes)