Date: Tue, 24 Apr 2012 11:07:26 -0300
From: Claudio André <claudioandre.br@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: New RAR OpenCL kernel - [3]

Again
----------

Hi, see the attached files. Please note that 2560 seems to be a "magic
number".

- TXT: raw results (no profiler)
- The same CSV file.
- And some more summary information.

Profiler run using:
Local worksize (LWS) 256, Global worksize (KPC) 2560
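As a rough cross-check of these sizes, the sketch below works out how many work-groups and wavefronts a launch of this shape produces. The wavefront width of 64 work-items is an assumption (it is the usual value for AMD GPUs of this generation, but it is not stated in this mail), and the helper names are made up for illustration.

```python
# Minimal sketch: work-group and wavefront counts for a 1-D OpenCL launch.
# ASSUMPTION: a wavefront is 64 work-items wide (not stated in this mail).

def work_groups(gws, lws):
    """Number of work-groups for a global size gws and local size lws."""
    if gws % lws != 0:
        raise ValueError("global work size must be a multiple of local work size")
    return gws // lws


def wavefronts(gws, wavefront_size=64):
    """Number of wavefronts needed to cover gws work-items (rounded up)."""
    return -(-gws // wavefront_size)


print(work_groups(2560, 256))  # 10
print(wavefronts(2560))        # 40
```

Under that assumption the result of 40 agrees with the value 40.00 that most rows of the profiler output report in the Wavefronts column.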

----
   src/opencl/rar_kernel.cl |   34 ++++++++------
   src/rar_fmt.c            |  116 ++++++++++++++++++++++++++++++++++++++++-----
   2 files changed, 122 insertions(+), 28 deletions(-)
----




On 22-04-2012 22:07, magnum wrote:
> On 04/23/2012 12:02 AM, Claudio André wrote:
>>> Would both these figures be closer to 100 in a dream scenario, or what?
>>>
>>> By the way my previous version of rar got an "occupancy" of 0.01 or so
>>> (lol) in nvidia profiler. We'll see if there is any change now.
>>>
>>> magnum
>>>
>> I like the "dream scenario". Valid explanation. And 100 is the target.
>>
>> Alu packing has a ">  70" expectation.
>> Alubusy is where 100% is optimal.
>>
>> I agree that sprofile is not very useful, but it is better than nothing (or
>> simple guessing). Since you have NVIDIA tools, it is not that important.
> I think sprofile is useful, it's just that my laptop GPU is so weak I
> can't draw any conclusions.
>
> Your profiling info was with LWS=GWS. Please try this if you have the time:
>
> 1. Pull latest git
> 2. Run with KPC=0 (I expect it to pick 4096 or higher as best)
> 3. Do another profiling run with the best KPC
>
> The ALU figures (and speed) should go up a lot (I hope). If they are
> not, the profiling info should tell why.
>
> thanks,
> magnum
>



# ProfilerVersion=2.4.1314
# Application=/home/claudio/bin/john/to_commit/run/john
# ApplicationArgs=--format=rar -t
# Device AMD Phenom(tm) II X6 1075T Processor PlatformVendor=Advanced Micro Devices, Inc.
# Device AMD Phenom(tm) II X6 1075T Processor PlatformName=AMD Accelerated Parallel Processing
# Device AMD Phenom(tm) II X6 1075T Processor PlatformVersion=OpenCL 1.1 AMD-APP (898.1)
# Device AMD Phenom(tm) II X6 1075T Processor CLDriverVersion=2.0
# Device AMD Phenom(tm) II X6 1075T Processor CLRuntimeVersion=OpenCL 1.1 AMD-APP (898.1)
# Device AMD Phenom(tm) II X6 1075T Processor NumberAppAddressBits=64
# Device Juniper PlatformVendor=Advanced Micro Devices, Inc.
# Device Juniper PlatformName=AMD Accelerated Parallel Processing
# Device Juniper PlatformVersion=OpenCL 1.1 AMD-APP (898.1)
# Device Juniper CLDriverVersion=CAL 1.4.1703
# Device Juniper CLRuntimeVersion=OpenCL 1.1 AMD-APP (898.1)
# Device Juniper NumberAppAddressBits=32
# OS=Ubuntu 11.10 \n \l
Method , ExecutionOrder , ThreadID , CallIndex , GlobalWorkSize , WorkGroupSize , Time , LocalMemSize , VGPRs , SGPRs , ScratchRegs , FCStacks , Wavefronts , ALUInsts , FetchInsts , WriteInsts , LDSFetchInsts , LDSWriteInsts , ALUBusy , ALUFetchRatio , ALUPacking , FetchSize , CacheHit , FetchUnitBusy , FetchUnitStalled , WriteUnitStalled , FastPath , CompletePath , PathUtilization , LDSBankConflict
SetCryptKeys__k1_Juniper1 ,     1 , 4177 , 44 , {   2560       1       1} , {  256     1     1} ,     10201.39244 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  59930330.40 ,   2585736.85 ,   4857573.10 ,         0.00 ,         0.00 ,        11.12 ,        23.18 ,        36.22 ,  78580552.81 ,         0.00 ,         1.89 ,         0.00 ,         1.51 , 143511374.62 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     2 , 4177 , 50 , {   2560       1       1} , {  256     1     1} ,     10133.35411 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  61541514.60 ,   2633251.77 ,   5024695.65 ,         0.00 ,         0.00 ,        11.38 ,        23.37 ,        36.26 ,  78697805.38 ,         0.00 ,         1.90 ,         0.01 ,         2.73 , 142107275.12 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     3 , 4177 , 56 , {   2560       1       1} , {  256     1     1} ,     10191.10222 ,           0 ,    46 , NA ,    18 ,     5 ,     46346.00 ,     53117.05 ,      2273.69 ,      4337.68 ,         0.00 ,         0.00 ,        11.37 ,        23.36 ,        36.27 ,  78709290.44 ,         0.00 ,         1.90 ,         0.01 ,         1.84 , 142775293.12 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     4 , 4177 , 62 , {   2560       1       1} , {  256     1     1} ,     10137.70055 ,           0 ,    46 , NA ,    18 ,     5 ,     34338.00 ,     71693.14 ,      3068.45 ,      5854.22 ,         0.00 ,         0.00 ,        11.42 ,        23.36 ,        36.27 ,  78706366.00 ,         0.01 ,         1.90 ,         0.00 ,         1.85 , 142506350.12 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     5 , 4177 , 68 , {   2560       1       1} , {  256     1     1} ,     10163.67856 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  63152698.80 ,   2680766.70 ,   5191818.20 ,         0.00 ,         0.00 ,        11.69 ,        23.56 ,        36.30 ,  78815057.94 ,         0.00 ,         1.91 ,         0.00 ,         2.08 , 141978299.88 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     6 , 4177 , 74 , {   2560       1       1} , {  256     1     1} ,     10180.62056 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  64763883.00 ,   2728281.62 ,   5358940.75 ,         0.00 ,         0.00 ,        11.97 ,        23.74 ,        36.34 ,  78932310.50 ,         0.00 ,         1.91 ,         0.00 ,         3.22 , 140864913.00 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     7 , 4177 , 80 , {   2560       1       1} , {  256     1     1} ,     10193.48256 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  64763883.00 ,   2728281.62 ,   5358940.75 ,         0.00 ,         0.00 ,        12.00 ,        23.74 ,        36.34 ,  78932310.50 ,         0.00 ,         1.92 ,         0.00 ,         2.54 , 140891224.38 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     8 , 4177 , 86 , {   2560       1       1} , {  256     1     1} ,     10228.44867 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  66375067.20 ,   2775796.55 ,   5526063.30 ,         0.00 ,         0.00 ,        12.23 ,        23.91 ,        36.38 ,  79049563.06 ,         0.00 ,         1.91 ,         0.00 ,         3.08 , 140955757.50 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,     9 , 4177 , 92 , {   2560       1       1} , {  256     1     1} ,     10188.36500 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 ,  66375067.20 ,   2775796.55 ,   5526063.30 ,         0.00 ,         0.00 ,        12.26 ,        23.91 ,        36.38 ,  79049563.06 ,         0.00 ,         1.92 ,         0.00 ,         2.65 , 140508586.38 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,    10 , 4177 , 98 , {   2560       1       1} , {  256     1     1} ,      8757.71067 ,           0 ,    46 , NA ,    18 ,     5 ,     34533.00 ,    116852.39 ,      4632.03 ,      9719.20 ,         0.00 ,         0.00 ,        21.70 ,        25.23 ,        37.15 , 114705031.75 ,         0.00 ,         3.44 ,         0.01 ,         0.37 , 277488663.00 ,         0.00 ,       100.00 ,         0.00
SetCryptKeys__k1_Juniper1 ,    11 , 4177 , 102 , {   2560       1       1} , {  256     1     1} ,      8749.67500 ,           0 ,    46 , NA ,    18 ,     5 ,        40.00 , 100879334.00 ,   3998088.00 ,   8389963.00 ,         0.00 ,         0.00 ,        21.68 ,        25.23 ,        37.15 , 114696490.00 ,         0.00 ,         3.44 ,         0.01 ,         0.37 , 277411409.88 ,         0.00 ,       100.00 ,         0.00
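To compare a run like the one above against the targets mentioned in the thread (ALUPacking above 70, ALUBusy as close to 100 as possible), the per-call figures can be averaged. The following is only a sketch: it assumes the profiler output is saved as text in the layout shown here (comment lines starting with `#`, then a comma-separated header and rows), and the function name `summarize` is hypothetical.

```python
import csv


def summarize(text, columns=("Time", "ALUBusy", "ALUPacking")):
    """Average the chosen numeric columns over all kernel calls in a profile dump."""
    # Drop blank lines and the '#' comment header written by the profiler.
    lines = [l for l in text.splitlines() if l.strip() and not l.startswith("#")]
    reader = csv.reader(lines)
    header = [name.strip() for name in next(reader)]
    sums = {c: 0.0 for c in columns}
    count = 0
    for row in reader:
        record = dict(zip(header, (field.strip() for field in row)))
        for c in columns:
            sums[c] += float(record[c])
        count += 1
    if count == 0:
        raise ValueError("no data rows found")
    return {c: sums[c] / count for c in columns}
```

It could be called as `summarize(open("profile.csv").read())`, where `profile.csv` is a placeholder file name; the returned dictionary maps each column name to its mean over all rows.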

