Date: Mon, 19 Mar 2012 08:34:16 -0300
From: Claudio André <claudioandre.br@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: New patch for OpenCL SHA-512

Hi,

What I did:
- avoid register spilling (use only fast memory: __private or __local)
- unroll the important loops
- avoid branches where possible (an if becomes the ternary operator ?:)

With OpenCL on the CPU, the usage pattern seems to be five cores working and one coordinating. So I expect some loss compared to all six cores under OpenMP.

I have a $110 GPU at 400 MHz (versus a 3 GHz 6-core AMD CPU under OpenMP), and only 128-bit. Not that good. I do not have an NVidia card or better hardware.

Anyway, I will:
- check the profiling information to see what I missed
- I know that passwords should be grouped by length (to avoid branching). In my tests I noticed this was not happening (length-2 and length-3 candidates were put together). When a branch happens (an if or for that depends on password length), some cores stall (serialization) and performance goes down. *John itself has to solve this*.
- if there is an SHA expert around, we might change the algorithm itself. Compared to the CUDA and OpenSSL source code it seems OK to me, and right now I have no further ideas for improving it.
- try to use vectorized data types.

---
I have a real problem to solve, and I'll do my best to get the best performance for it. But I'm afraid the big gains have already happened.

By the way, why does "c/s virtual" improve so much? What does it mean?

http://www.amazon.com/Sapphire-Radeon-DisplayPort-PCI-Express-100328L/dp/B004ZCHWBY/ref=dp_cp_ob_e_title_2

On 18-03-2012 18:48, Solar Designer wrote:
> Hi Claudio,
>
> On Sun, Mar 18, 2012 at 08:11:54AM -0300, Claudio André wrote:
>> It uses john.conf for LWS and KPC (if available), fix the format and
>> algorithm name, etc.
>> It also uses fast memory to keep temporary buffers (improve performance).
> Thank you! magnum is the one to merge this into the tree.
>
>> Numbers here:
>> => CPU
>> Benchmarking: crypt SHA-512 (rounds=5000) [OpenSSL 64/64]... DONE
>> Raw: 440 c/s real, 440 c/s virtual
>>
>> => OMP
>> Benchmarking: crypt SHA-512 (rounds=5000) [OpenSSL 64/64]... (6xOMP) DONE
>> Raw: 2254 c/s real, 378 c/s virtual
>>
>> => OpenCL CPU
>> Benchmarking: crypt SHA-512 (rounds=5000) [OpenCL]... DONE
>> Raw: 1422 c/s real, 237 c/s virtual
>>
>> => OpenCL GPU
>> Local work size (LWS) 64, Keys per crypt (KPC) 65536
>> Benchmarking: crypt SHA-512 (rounds=5000) [OpenCL]... DONE
>> Raw: 1228 c/s real, 936228 c/s virtual
> What CPU and GPU are you testing this on?
>
> How do these OpenCL speed numbers compare to those for Lukas' OpenCL
> code? And to those for Lukas' CUDA code (if you use an Nvidia GPU)?
>
> The OpenCL GPU speed is still quite poor (although this works for a
> development milestone) - do you have specific plans on improving it?
>
> Thanks again,
>
> Alexander
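
The "if becomes ternary operator" item in the message above can be illustrated with a small sketch in plain C. It is not taken from the patch: the function names and the idea of zeroing buffer words past a length limit are hypothetical, chosen only to show a length-dependent branch rewritten as a ternary and as a bit mask. Whether the compiler actually emits branch-free code for the ternary form depends on the compiler and target, so this is an assumption to verify with a profiler rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Branching form: work-items whose candidates differ in length
 * take different paths here, which can serialize execution on a GPU. */
uint64_t keep_word_branch(uint64_t word, size_t index, size_t len_words)
{
    if (index < len_words)
        return word;
    return 0;
}

/* Ternary form: same result, written as a single expression that a
 * compiler can often lower to a conditional select instead of a jump. */
uint64_t keep_word_ternary(uint64_t word, size_t index, size_t len_words)
{
    return (index < len_words) ? word : 0;
}

/* Mask form: the comparison yields 0 or 1; subtracting it from zero
 * gives all-zero or all-one bits, so no control flow is involved. */
uint64_t keep_word_mask(uint64_t word, size_t index, size_t len_words)
{
    uint64_t mask = (uint64_t)0 - (uint64_t)(index < len_words);
    return word & mask;
}

/* Hypothetical self-check: all three forms must agree on every input tried. */
void self_check(void)
{
    const uint64_t word = 0x0123456789abcdefULL;
    for (size_t len = 0; len <= 4; len++) {
        for (size_t i = 0; i < 6; i++) {
            uint64_t expected = keep_word_branch(word, i, len);
            assert(keep_word_ternary(word, i, len) == expected);
            assert(keep_word_mask(word, i, len) == expected);
        }
    }
}
```

Calling self_check() from a test program confirms that the three forms agree. In OpenCL C the built-in select() is one more way to express the same selection.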