Date: Tue, 24 Apr 2012 11:04:22 -0300
From: Claudio André <claudioandre.br@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: New RAR OpenCL kernel - [1]

Since the list is complaining about the message size, I'll split this message up.
-----------------

Hi, see the attached files. Please note that 2560 seems to be a "magic
number".

- TXT: raw results (no profiler)
- The same CSV file.
- And some more summary information.

The profiler run used:
Local worksize (LWS) 256, Global worksize (KPC) 2560
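
One way to check the "magic number" observation against the LWS 256 tuning
log further down is to look for the points where the time per crypt_all()
jumps. This is only a minimal sketch: the (kpc, seconds) pairs are copied
from that log, while the helper name find_jumps and the 5 second threshold
are illustrative assumptions, not part of John the Ripper.

```python
# (kpc, seconds per crypt_all()) pairs copied from the LWS 256 tuning log.
samples = [
    (2304, 8.521), (2560, 8.759), (2816, 16.696),
    (4864, 17.253), (5120, 17.488), (5376, 25.469),
    (7424, 26.008), (7680, 26.212), (7936, 34.213),
]

def find_jumps(rows, min_step=5.0):
    """Return each kpc after which the time grows by more than min_step seconds."""
    jumps = []
    for (kpc_a, t_a), (_kpc_b, t_b) in zip(rows, rows[1:]):
        if t_b - t_a > min_step:
            jumps.append(kpc_a)
    return jumps

jumps = find_jumps(samples)
print(jumps)                               # kpc values just before each jump
print([k % 2560 == 0 for k in jumps])      # are they multiples of 2560?
```

With these samples, every jump of roughly 8 seconds comes right after a
multiple of 2560.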

----
   src/opencl/rar_kernel.cl |   34 ++++++++------
   src/rar_fmt.c            |  116 ++++++++++++++++++++++++++++++++++++++++-----
   2 files changed, 122 insertions(+), 28 deletions(-)
----



On 22-04-2012 22:07, magnum wrote:
> On 04/23/2012 12:02 AM, Claudio André wrote:
>>> Would both these figures be closer to 100 in a dream scenario, or what?
>>>
>>> By the way my previous version of rar got an "occupancy" of 0.01 or so
>>> (lol) in nvidia profiler. We'll see if there is any change now.
>>>
>>> magnum
>>>
>> I like the "dream scenario". Valid explanation. And 100 is the target.
>>
>> Alu packing has a ">  70" expectation.
>> Alubusy is where 100% is optimal.
>>
>> I agree that sprofile is not very useful, but it is better than nothing (or
>> simple guessing). Since you have NVIDIA tools, it is not that important.
> I think sprofile is useful, it's just that my laptop GPU is so weak I
> can't draw any conclusions.
>
> Your profiling info was with LWS=GWS. Please try this if you have the time:
>
> 1. Pull latest git
> 2. Run with KPC=0 (I expect it to pick 4096 or higher as best)
> 3. Do another profiling run with the best KPC
>
> The ALU figures (and speed) should go up a lot (I hope). If they are
> not, the profiling info should tell why.
>
> thanks,
> magnum
>


# ProfilerVersion=2.4.1314
# Application=/home/claudio/bin/john/to_commit/run/john
# ApplicationArgs=--format=rar -t
# Device AMD Phenom(tm) II X6 1075T Processor PlatformVendor=Advanced Micro Devices, Inc.
# Device AMD Phenom(tm) II X6 1075T Processor PlatformName=AMD Accelerated Parallel Processing
# Device AMD Phenom(tm) II X6 1075T Processor PlatformVersion=OpenCL 1.1 AMD-APP (898.1)
# Device AMD Phenom(tm) II X6 1075T Processor CLDriverVersion=2.0
# Device AMD Phenom(tm) II X6 1075T Processor CLRuntimeVersion=OpenCL 1.1 AMD-APP (898.1)
# Device AMD Phenom(tm) II X6 1075T Processor NumberAppAddressBits=64
# Device Juniper PlatformVendor=Advanced Micro Devices, Inc.
# Device Juniper PlatformName=AMD Accelerated Parallel Processing
# Device Juniper PlatformVersion=OpenCL 1.1 AMD-APP (898.1)
# Device Juniper CLDriverVersion=CAL 1.4.1703
# Device Juniper CLRuntimeVersion=OpenCL 1.1 AMD-APP (898.1)
# Device Juniper NumberAppAddressBits=32
# OS=Ubuntu 11.10 \n \l
Method , ExecutionOrder , ThreadID , CallIndex , GlobalWorkSize , WorkGroupSize , Time , LocalMemSize , VGPRs , SGPRs , ScratchRegs , FCStacks , Wavefronts , ALUInsts , FetchInsts , WriteInsts , LDSFetchInsts , LDSWriteInsts , ALUBusy , ALUFetchRatio , ALUPacking , FetchSize , CacheHit , FetchUnitBusy , FetchUnitStalled , WriteUnitStalled , FastPath , CompletePath , PathUtilization , LDSBankConflict
SetCryptKeys__k1_Juniper1 ,     1 , 4177 , 44 , {   2560       1       1} , {  256     1     1} ,     10201.39244 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  59930330.40 ,   2585736.85 ,   4857573.10 ,         0.00 ,         0.00 ,        11.12 ,        23.18 ,        36.22 ,  78580552.81 ,         0.00 ,         1.89 ,         0.00 ,         1.51 , 143511374.62 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     2 , 4177 , 50 , {   2560       1       1} , {  256     1     1} ,     10133.35411 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  61541514.60 ,   2633251.77 ,   5024695.65 ,         0.00 ,         0.00 ,        11.38 ,        23.37 ,        36.26 ,  78697805.38 ,         0.00 ,         1.90 ,         0.01 ,         2.73 , 142107275.12 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     3 , 4177 , 56 , {   2560       1       1} , {  256     1     1} ,     10191.10222 ,           0 ,    46 , NA ,    18 ,     5 ,     46346.00 ,     53117.05 ,      2273.69 ,      4337.68 ,         0.00 ,         0.00 ,        11.37 ,        23.36 ,        36.27 ,  78709290.44 ,         0.00 ,         1.90 ,         0.01 ,         1.84 , 142775293.12 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     4 , 4177 , 62 , {   2560       1       1} , {  256     1     1} ,     10137.70055 ,           0 ,    46 , NA ,    18 ,     5 ,     34338.00 ,     71693.14 ,      3068.45 ,      5854.22 ,         0.00 ,         0.00 ,        11.42 ,        23.36 ,        36.27 ,  78706366.00 ,         0.01 ,         1.90 ,         0.00 ,         1.85 , 142506350.12 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     5 , 4177 , 68 , {   2560       1       1} , {  256     1     1} ,     10163.67856 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  63152698.80 ,   2680766.70 ,   5191818.20 ,         0.00 ,         0.00 ,        11.69 ,        23.56 ,        36.30 ,  78815057.94 ,         0.00 ,         1.91 ,         0.00 ,         2.08 , 141978299.88 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     6 , 4177 , 74 , {   2560       1       1} , {  256     1     1} ,     10180.62056 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  64763883.00 ,   2728281.62 ,   5358940.75 ,         0.00 ,         0.00 ,        11.97 ,        23.74 ,        36.34 ,  78932310.50 ,         0.00 ,         1.91 ,         0.00 ,         3.22 , 140864913.00 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     7 , 4177 , 80 , {   2560       1       1} , {  256     1     1} ,     10193.48256 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  64763883.00 ,   2728281.62 ,   5358940.75 ,         0.00 ,         0.00 ,        12.00 ,        23.74 ,        36.34 ,  78932310.50 ,         0.00 ,         1.92 ,         0.00 ,         2.54 , 140891224.38 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     8 , 4177 , 86 , {   2560       1       1} , {  256     1     1} ,     10228.44867 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  66375067.20 ,   2775796.55 ,   5526063.30 ,         0.00 ,         0.00 ,        12.23 ,        23.91 ,        36.38 ,  79049563.06 ,         0.00 ,         1.91 ,         0.00 ,         3.08 , 140955757.50 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     9 , 4177 , 92 , {   2560       1       1} , {  256     1     1} ,     10188.36500 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  66375067.20 ,   2775796.55 ,   5526063.30 ,         0.00 ,         0.00 ,        12.26 ,        23.91 ,        36.38 ,  79049563.06 ,         0.00 ,         1.92 ,         0.00 ,         2.65 , 140508586.38 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,    10 , 4177 , 98 , {   2560       1       1} , {  256     1     1} ,      8757.71067 ,           0 ,    46 , NA ,    18 ,     5 ,     34533.00 ,    116852.39 ,      4632.03 ,      9719.20 ,         0.00 ,         0.00 ,        21.70 ,        25.23 ,        37.15 , 114705031.75 ,         0.00 ,         3.44 ,         0.01 ,         0.37 , 277488663.00 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,    11 , 4177 , 102 , {   2560       1       1} , {  256     1     1} ,      8749.67500 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 , 100879334.00 ,   3998088.00 ,   8389963.00 ,         0.00 ,         0.00 ,        21.68 ,        25.23 ,        37.15 , 114696490.00 ,         0.00 ,         3.44 ,         0.01 ,         0.37 , 277411409.88 ,         0.00 ,       100.00 ,         0.00


OpenCL platform 0: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Juniper
Compilation log: LOOP UNROLL: pragma unroll (line 259)
    Unrolled as requested!

Warning: SetCryptKeys kernel has register spilling. Lower performance is expected.

Calculating best keys per crypt for LWS 32, this will take a while 
kpc   32	   4 c/s        376852 sha1/s   7.758 sec per crypt_all()+
kpc   64	   8 c/s        753704 sha1/s   7.814 sec per crypt_all()+
kpc   96	  12 c/s       1130556 sha1/s   7.841 sec per crypt_all()+
kpc  128	  16 c/s       1507408 sha1/s   7.834 sec per crypt_all()+
kpc  160	  20 c/s       1884260 sha1/s   7.882 sec per crypt_all()+
kpc  192	  24 c/s       2261112 sha1/s   7.890 sec per crypt_all()+
kpc  224	  28 c/s       2637964 sha1/s   7.924 sec per crypt_all()+
kpc  256	  32 c/s       3014816 sha1/s   7.895 sec per crypt_all()+
kpc  288	  36 c/s       3391668 sha1/s   7.912 sec per crypt_all()+
kpc  320	  40 c/s       3768520 sha1/s   7.948 sec per crypt_all()+
kpc  352	  44 c/s       4145372 sha1/s   7.917 sec per crypt_all()+
kpc  384	  48 c/s       4522224 sha1/s   7.922 sec per crypt_all()+
kpc  416	  52 c/s       4899076 sha1/s   7.953 sec per crypt_all()+
kpc  448	  56 c/s       5275928 sha1/s   7.973 sec per crypt_all()+
kpc  480	  60 c/s       5652780 sha1/s   7.954 sec per crypt_all()+
kpc  512	  64 c/s       6029632 sha1/s   7.946 sec per crypt_all()+
kpc  544	  68 c/s       6406484 sha1/s   7.986 sec per crypt_all()+
kpc  576	  72 c/s       6783336 sha1/s   7.973 sec per crypt_all()+
kpc  608	  75 c/s       7065975 sha1/s   8.008 sec per crypt_all()+
kpc  640	  79 c/s       7442827 sha1/s   8.039 sec per crypt_all()+
kpc  672	  83 c/s       7819679 sha1/s   8.015 sec per crypt_all()+
kpc  704	  87 c/s       8196531 sha1/s   8.046 sec per crypt_all()+
kpc  736	  91 c/s       8573383 sha1/s   8.036 sec per crypt_all()+
kpc  768	  95 c/s       8950235 sha1/s   8.048 sec per crypt_all()+
kpc  800	  99 c/s       9327087 sha1/s   8.052 sec per crypt_all()+
kpc  832	 103 c/s       9703939 sha1/s   8.057 sec per crypt_all()+
kpc  864	 106 c/s       9986578 sha1/s   8.086 sec per crypt_all()+
kpc  896	 110 c/s      10363430 sha1/s   8.094 sec per crypt_all()+
kpc  928	 114 c/s      10740282 sha1/s   8.112 sec per crypt_all()+
kpc  960	 117 c/s      11022921 sha1/s   8.138 sec per crypt_all()+
kpc  992	 121 c/s      11399773 sha1/s   8.156 sec per crypt_all()+
kpc 1024	 125 c/s      11776625 sha1/s   8.190 sec per crypt_all()+
kpc 1056	 128 c/s      12059264 sha1/s   8.210 sec per crypt_all()+
kpc 1088	 131 c/s      12341903 sha1/s   8.261 sec per crypt_all()+
kpc 1120	 134 c/s      12624542 sha1/s   8.326 sec per crypt_all()+
kpc 1152	 137 c/s      12907181 sha1/s   8.361 sec per crypt_all()+
kpc 1184	 141 c/s      13284033 sha1/s   8.387 sec per crypt_all()+
kpc 1216	 144 c/s      13566672 sha1/s   8.434 sec per crypt_all()+
kpc 1248	 147 c/s      13849311 sha1/s   8.477 sec per crypt_all()+
kpc 1280	 150 c/s      14131950 sha1/s   8.518 sec per crypt_all()+
kpc 1312	 152 c/s      14320376 sha1/s   8.597 sec per crypt_all()+
kpc 1344	 150 c/s      14131950 sha1/s   8.939 sec per crypt_all()
kpc 1376	 144 c/s      13566672 sha1/s   9.508 sec per crypt_all()
kpc 1408	 144 c/s      13566672 sha1/s   9.737 sec per crypt_all()
kpc 1440	 144 c/s      13566672 sha1/s   9.984 sec per crypt_all()
kpc 1472	 142 c/s      13378246 sha1/s  10.319 sec per crypt_all()
Optimal keys per crypt 1312
(to store this, put "rar_KPC = 1312" in john.conf, section [Options:OpenCL])
Local worksize (LWS) 32, Global worksize (KPC) 1312
Benchmarking: RAR3 (6 characters) [OpenCL]... DONE
Raw:	153 c/s real, 87466 c/s virtual


OpenCL platform 0: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Juniper
Compilation log: LOOP UNROLL: pragma unroll (line 259)
    Unrolled as requested!

Warning: SetCryptKeys kernel has register spilling. Lower performance is expected.

LWS 512 is too large for this GPU. Max allowed is 256, using that.
Calculating best keys per crypt for LWS 256, this will take a while 
kpc  256	  32 c/s       3014816 sha1/s   7.883 sec per crypt_all()+
kpc  512	  65 c/s       6123845 sha1/s   7.855 sec per crypt_all()+
kpc  768	  96 c/s       9044448 sha1/s   7.945 sec per crypt_all()+
kpc 1024	 128 c/s      12059264 sha1/s   7.995 sec per crypt_all()+
kpc 1280	 159 c/s      14979867 sha1/s   8.040 sec per crypt_all()+
kpc 1536	 190 c/s      17900470 sha1/s   8.068 sec per crypt_all()+
kpc 1792	 219 c/s      20632647 sha1/s   8.172 sec per crypt_all()+
kpc 2048	 247 c/s      23270611 sha1/s   8.283 sec per crypt_all()+
kpc 2304	 270 c/s      25437510 sha1/s   8.521 sec per crypt_all()+
kpc 2560	 292 c/s      27510196 sha1/s   8.759 sec per crypt_all()+
kpc 2816	 168 c/s      15827784 sha1/s  16.696 sec per crypt_all()
kpc 3072	 183 c/s      17240979 sha1/s  16.730 sec per crypt_all()
kpc 3328	 199 c/s      18748387 sha1/s  16.713 sec per crypt_all()
kpc 3584	 213 c/s      20067369 sha1/s  16.766 sec per crypt_all()
kpc 3840	 228 c/s      21480564 sha1/s  16.822 sec per crypt_all()
kpc 4096	 242 c/s      22799546 sha1/s  16.858 sec per crypt_all()
kpc 4352	 256 c/s      24118528 sha1/s  16.950 sec per crypt_all()
kpc 4608	 270 c/s      25437510 sha1/s  17.046 sec per crypt_all()
kpc 4864	 281 c/s      26473853 sha1/s  17.253 sec per crypt_all()
kpc 5120	 292 c/s      27510196 sha1/s  17.488 sec per crypt_all()+
kpc 5376	 211 c/s      19878943 sha1/s  25.469 sec per crypt_all()
kpc 5632	 220 c/s      20726860 sha1/s  25.544 sec per crypt_all()
kpc 5888	 230 c/s      21668990 sha1/s  25.545 sec per crypt_all()
kpc 6144	 240 c/s      22611120 sha1/s  25.567 sec per crypt_all()
kpc 6400	 250 c/s      23553250 sha1/s  25.588 sec per crypt_all()
kpc 6656	 259 c/s      24401167 sha1/s  25.658 sec per crypt_all()
kpc 6912	 268 c/s      25249084 sha1/s  25.708 sec per crypt_all()
kpc 7168	 277 c/s      26097001 sha1/s  25.817 sec per crypt_all()
kpc 7424	 285 c/s      26850705 sha1/s  26.008 sec per crypt_all()
kpc 7680	 292 c/s      27510196 sha1/s  26.212 sec per crypt_all()+
kpc 7936	 231 c/s      21763203 sha1/s  34.213 sec per crypt_all()
kpc 8192	 238 c/s      22422694 sha1/s  34.325 sec per crypt_all()
kpc 8448	 246 c/s      23176398 sha1/s  34.302 sec per crypt_all()
Optimal keys per crypt 7680
(to store this, put "rar_KPC = 7680" in john.conf, section [Options:OpenCL])
Local worksize (LWS) 256, Global worksize (KPC) 7680
Benchmarking: RAR3 (6 characters) [OpenCL]... DONE
Raw:	291 c/s real, 153600 c/s virtual
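
The c/s column in the tuning log above appears to be simply kpc divided by
the seconds per crypt_all(), truncated to an integer. The sketch below
checks that reading against a few rows copied from the LWS 256 log; the
function name candidates_per_second is a hypothetical helper, and the
truncation rule is inferred from the numbers, not taken from the source code.

```python
# (kpc, logged c/s, seconds per crypt_all()) copied from the LWS 256 log.
rows = [
    (256, 32, 7.883),
    (2560, 292, 8.759),
    (2816, 168, 16.696),
    (5120, 292, 17.488),
    (7680, 292, 26.212),
    (8192, 238, 34.325),
]

def candidates_per_second(kpc, seconds):
    """Candidates tested per second for one crypt_all() call (truncated)."""
    return int(kpc / seconds)

for kpc, logged, seconds in rows:
    print(kpc, logged, candidates_per_second(kpc, seconds))
```

Read this way, going from kpc 2560 to 2816 adds only 256 candidates but
almost doubles the time, which is why the rate drops from 292 to 168 c/s.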


claudio@...udioandre-desktop:~/bin/john/to_commit/src$ LWS=128 KPC=2560 ../run/john -test -fo:rar
OpenCL platform 0: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Juniper
Compilation log: LOOP UNROLL: pragma unroll (line 259)
    Unrolled as requested!

Warning: SetCryptKeys kernel has register spilling. Lower performance is expected.

Local worksize (LWS) 128, Global worksize (KPC) 2560
Benchmarking: RAR3 (6 characters) [OpenCL]... DONE
Raw:	291 c/s real, 128000 c/s virtual
-----------

claudio@...udioandre-desktop:~/bin/john/to_commit/src$ LWS=256 KPC=2560 ../run/john -test -fo:rar
OpenCL platform 0: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Juniper
Compilation log: LOOP UNROLL: pragma unroll (line 259)
    Unrolled as requested!

Warning: SetCryptKeys kernel has register spilling. Lower performance is expected.

Local worksize (LWS) 256, Global worksize (KPC) 2560
Benchmarking: RAR3 (6 characters) [OpenCL]... DONE
Raw:	291 c/s real, 170666 c/s virtual
-----------

claudio@...udioandre-desktop:~/bin/john/to_commit/src$ LWS=256 KPC=8192 ../run/john -test -fo:rar
OpenCL platform 0: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Juniper
Compilation log: LOOP UNROLL: pragma unroll (line 259)
    Unrolled as requested!

Warning: SetCryptKeys kernel has register spilling. Lower performance is expected.

Local worksize (LWS) 256, Global worksize (KPC) 8192
Benchmarking: RAR3 (6 characters) [OpenCL]... DONE
Raw:	238 c/s real, 126030 c/s virtual
-----------

claudio@...udioandre-desktop:~/bin/john/to_commit/src$ LWS=256 KPC=12800 ../run/john -test -fo:rar
OpenCL platform 0: AMD Accelerated Parallel Processing, 2 device(s).
Using device 0: Juniper
Compilation log: LOOP UNROLL: pragma unroll (line 259)
    Unrolled as requested!

Warning: SetCryptKeys kernel has register spilling. Lower performance is expected.

Local worksize (LWS) 256, Global worksize (KPC) 12800
Benchmarking: RAR3 (6 characters) [OpenCL]... DONE
Raw:	292 c/s real, 182857 c/s virtual
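
The manual runs above fit a rough model in which the GPU works through the
keys in passes of 2560 and each pass costs about the same time. This is a
sketch under that assumption only: BATCH, PEAK_CPS and predicted_cps are
illustrative names, the peak of 292 c/s is taken from the tuning log, and
the model ignores the slow growth of time within a single pass.

```python
import math

BATCH = 2560     # apparent "magic number" for this Juniper GPU (assumption)
PEAK_CPS = 292   # best c/s seen in the LWS 256 tuning log

def predicted_cps(kpc, batch=BATCH, peak=PEAK_CPS):
    """Estimate c/s if every started pass of `batch` keys costs the same time."""
    passes = math.ceil(kpc / batch)
    return peak * kpc / (passes * batch)

# (KPC, measured "c/s real") from the manual LWS 256 runs above.
for kpc, measured in [(2560, 291), (8192, 238), (12800, 292)]:
    print(kpc, round(predicted_cps(kpc), 1), measured)
```

The model gives the full rate for 2560 and 12800 (both multiples of 2560)
and about 234 c/s for 8192, against a measured 238 c/s, so it is only an
approximation.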


