Date: Mon, 19 Mar 2012 08:34:16 -0300
From: Claudio André <claudioandre.br@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: New patch for OpenCL SHA-512

Hi, here is what I did:
- avoided register spilling (use only fast memory: __private or __local)
- unrolled the important loops
- avoided branches where possible (an if becomes a ternary operator, ?:)
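
To illustrate the last point, here is a minimal sketch in plain C, not
the actual kernel code from the patch. The names padded_byte_branchy,
padded_byte_select and BLOCK_BYTES are hypothetical, and whether a
ternary really compiles to a select instead of a jump depends on the
device compiler.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 16  /* illustrative buffer size, not the SHA-512 block size */

/* Byte i of a padded message, written with branches.
 * Assumes i < BLOCK_BYTES and len < BLOCK_BYTES. */
static uint8_t padded_byte_branchy(const uint8_t key[BLOCK_BYTES],
                                   size_t len, size_t i)
{
    if (i < len)
        return key[i];   /* inside the password */
    if (i == len)
        return 0x80;     /* SHA-2 padding marker right after the data */
    return 0x00;         /* zero fill */
}

/* Same result using only ternary selects, so every work-item can
 * follow the same instruction stream regardless of len. */
static uint8_t padded_byte_select(const uint8_t key[BLOCK_BYTES],
                                  size_t len, size_t i)
{
    uint8_t pad = (i == len) ? 0x80 : 0x00;
    return (i < len) ? key[i] : pad;
}
```

The load of key[i] is always in bounds because the buffer has a fixed
size, which is what makes the unconditional form safe here.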

On OpenCL the CPU usage pattern seems to be 5 cores working and one
coordinating, so I expect some loss compared to the full 6 cores under
OpenMP.

I have a US$ 110.00 GPU [1] at 400 MHz (versus a 3 GHz 6-core AMD CPU
under OpenMP), and only 128 bits. Not that good. I do not have NVIDIA or
better hardware.

Anyway, I will:
- check the profiling information to see what I am missing
- I know that passwords should be grouped by size (to avoid branches).
In my tests I noticed this was not happening (two-character and
three-character candidates were put together). When a branch (an if or
a for based on password size) happens, some cores stall (serialization)
and performance goes down. *John itself has to solve this*.
- if there is a SHA expert around, we might change the algorithm itself.
Compared to the CUDA and OpenSSL source code it seems OK to me, and
right now I have no more ideas for improving it.
- try to use some vectorized data.
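
The grouping by size mentioned in the list above could look roughly
like the following host-side sketch in plain C. It is an assumption
about one possible approach, not how John hands out candidates today;
group_by_length and MAX_LEN are hypothetical names. It is a stable
counting sort that yields candidate indices ordered by length, so that
neighbouring work-items would see the same password size.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEN 32  /* assumed upper bound on candidate length */

/* Fill order[0..n-1] with the indices of cand[] sorted by string length.
 * Candidates of equal length keep their original relative order.
 * Lengths above MAX_LEN are clamped into the last bucket. */
static void group_by_length(const char *const *cand, size_t n, size_t *order)
{
    size_t start[MAX_LEN + 2] = {0};

    /* Count how many candidates have each length. */
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(cand[i]);
        if (len > MAX_LEN)
            len = MAX_LEN;
        start[len + 1]++;
    }

    /* Prefix sum: start[l] becomes the first output slot for length l. */
    for (size_t l = 1; l <= MAX_LEN + 1; l++)
        start[l] += start[l - 1];

    /* Place each candidate index into its length bucket. */
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(cand[i]);
        if (len > MAX_LEN)
            len = MAX_LEN;
        order[start[len]++] = i;
    }
}
```

With the result in order[], keys could be copied to the device buffer
bucket by bucket, so a work-group would rarely mix sizes.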
---
  
I have a real problem to solve, and I'll do my best to solve it with
the best performance I can get. But I'm afraid the big gains have
already happened.

By the way, why does "c/s virtual" improve so much? What does it mean?

[1]
http://www.amazon.com/Sapphire-Radeon-DisplayPort-PCI-Express-100328L/dp/B004ZCHWBY/ref=dp_cp_ob_e_title_2


On 18-03-2012 18:48, Solar Designer wrote:
> Hi Claudio,
>
> On Sun, Mar 18, 2012 at 08:11:54AM -0300, Claudio André wrote:
>> It uses john.conf for LWS and KPC (if available), fixes the format and
>> algorithm name, etc.
>> It also uses fast memory to keep temporary buffers (improves performance).
> Thank you!  magnum is the one to merge this into the tree.
>
>> Numbers here:
>> => CPU
>> Benchmarking: crypt SHA-512 (rounds=5000) [OpenSSL 64/64]... DONE
>> Raw:    440 c/s real, 440 c/s virtual
>>
>> => OMP
>> Benchmarking: crypt SHA-512 (rounds=5000) [OpenSSL 64/64]... (6xOMP) DONE
>> Raw:    2254 c/s real, 378 c/s virtual
>>
>> => OpenCL CPU
>> Benchmarking: crypt SHA-512 (rounds=5000) [OpenCL]... DONE
>> Raw:    1422 c/s real, 237 c/s virtual
>>
>> => OpenCL GPU
>> Local work size (LWS) 64, Keys per crypt (KPC) 65536
>> Benchmarking: crypt SHA-512 (rounds=5000) [OpenCL]... DONE
>> Raw:    1228 c/s real, 936228 c/s virtual
> What CPU and GPU are you testing this on?
>
> How do these OpenCL speed numbers compare to those for Lukas' OpenCL
> code?  And to those for Lukas' CUDA code (if you use an Nvidia GPU)?
>
> The OpenCL GPU speed is still quite poor (although this works for a
> development milestone) - do you have specific plans on improving it?
>
> Thanks again,
>
> Alexander
