Date: Tue, 3 Apr 2012 18:05:19 +0800
From: myrice <qqlddg@...il.com>
To: john-dev@...ts.openwall.com
Subject: Re: fast hashes on GPU

On Sat, Mar 31, 2012 at 3:08 PM, Solar Designer <solar@...nwall.com> wrote:
>
> I just took a look. You haven't yet implemented the keys_changed trick
> that I had proposed - you're sending the entire set of keys to GPU on
> every crypt_all() call, which you don't have to do. Please implement
> this one trick and re-benchmark the thing _before_ you possibly proceed
> with the salts optimization (which is a lot more complicated). We need
> to know which of the optimizations made what performance difference.
>

I have now implemented the keys_changed trick. When no key has changed,
the keys stay on the GPU and cudaMemcpy is not invoked. As the next step,
I will implement the salts optimization (the lengthy one).

Here are the benchmarks (I will put this on my GitHub later):

---------Before keys_changed trick-----------------------
Benchmarking: Mac OS X 10.7+ salted SHA-512 CUDA [64/64]... DONE
Many salts:     1080K c/s real, 1086K c/s virtual
Only one salt:  1056K c/s real, 1059K c/s virtual

---------After keys_changed trick--------------------------
Benchmarking: Mac OS X 10.7+ salted SHA-512 CUDA [64/64]... DONE
Many salts:     1134K c/s real, 1134K c/s virtual
Only one salt:  1092K c/s real, 1092K c/s virtual

As I expected, this doesn't give much of a performance gain. The CUDA
profiler also shows that cudaMemcpy takes only a small share of the time
during cracking.

P.S. I fixed the bug you mentioned. I also added #pragma unroll 64 and
modified PLAINTEXT_LENGTH. However, on my G9600M GS card, this doesn't
give me much of a performance gain. :(

Thanks!
Dongdong Li
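
For readers unfamiliar with the keys_changed trick discussed above, here is a minimal host-side sketch in plain C. It is not the code from this patch: the buffer sizes, the upload_keys() helper and the transfers counter are hypothetical, the set_key()/crypt_all() signatures are simplified stand-ins, and a plain memcpy stands in for the cudaMemcpy host-to-device copy so the logic can be followed without a GPU.

```c
#include <assert.h>
#include <string.h>

#define MAX_KEYS 4            /* illustrative, not taken from the post */
#define PLAINTEXT_LENGTH 15   /* illustrative, not taken from the post */

static char host_keys[MAX_KEYS][PLAINTEXT_LENGTH + 1]; /* host-side key buffer */
static char dev_keys[MAX_KEYS][PLAINTEXT_LENGTH + 1];  /* stand-in for GPU memory */
static int keys_changed = 0;  /* set whenever a key is written on the host */
static int transfers = 0;     /* counts simulated host-to-device copies */

/* Store a candidate key on the host and mark the key buffer as dirty. */
static void set_key(const char *key, int index)
{
    assert(index >= 0 && index < MAX_KEYS);
    strncpy(host_keys[index], key, PLAINTEXT_LENGTH);
    host_keys[index][PLAINTEXT_LENGTH] = '\0';
    keys_changed = 1;
}

/* Hypothetical helper: in a CUDA build this would be a cudaMemcpy
 * with cudaMemcpyHostToDevice; here it is simulated with memcpy. */
static void upload_keys(void)
{
    memcpy(dev_keys, host_keys, sizeof(host_keys));
    transfers++;
}

/* Upload the keys only if they changed since the previous call. */
static void crypt_all(int count)
{
    (void)count;              /* unused in this sketch */
    if (keys_changed) {
        upload_keys();
        keys_changed = 0;
    }
    /* The kernel launch that hashes the keys already on the GPU
     * would follow here. */
}
```

In this sketch, calling crypt_all() twice in a row after one set_key() performs a single upload; the second call reuses the keys already in the "device" buffer, which is the case that arises when the same keys are tried against several salts.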