Date: Tue, 3 Apr 2012 18:05:19 +0800
From: myrice <>
Subject: Re: fast hashes on GPU

On Sat, Mar 31, 2012 at 3:08 PM, Solar Designer <> wrote:

> I just took a look.  You haven't yet implemented the keys_changed trick
> that I had proposed - you're sending the entire set of keys to GPU on
> every crypt_all() call, which you don't have to do.  Please implement
> this one trick and re-benchmark the thing _before_ you possibly proceed
> with the salts optimization (which is a lot more complicated).  We need
> to know which of the optimizations made what performance difference.
I have now implemented the keys_changed trick.  When no key has changed,
the keys remain on the GPU and cudaMemcpy is not called.  As the next step,
I will implement the salts optimization (the lengthy one).
Here are the benchmarks (I will put this on my GitHub later):
---------Before keys_changed trick-----------------------
Benchmarking: Mac OS X 10.7+ salted SHA-512 CUDA [64/64]... DONE
Many salts: 1080K c/s real, 1086K c/s virtual
Only one salt: 1056K c/s real, 1059K c/s virtual
---------After keys_changed trick--------------------------
Benchmarking: Mac OS X 10.7+ salted SHA-512 CUDA [64/64]... DONE
Many salts: 1134K c/s real, 1134K c/s virtual
Only one salt: 1092K c/s real, 1092K c/s virtual

As I expected, this doesn't give much of a performance gain: about 5% with
many salts and about 3% with only one salt.  The CUDA profiler also shows
that cudaMemcpy takes only a small share of the time during cracking.
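
For reference, the keys_changed idea can be sketched in plain host-side C as
below.  This is only an illustration, not the code from my tree: the buffer
sizes are made up, dev_keys is an ordinary array standing in for GPU memory,
and upload_keys() is a hypothetical helper that stands in for the
cudaMemcpy host-to-device call.  The names set_key() and crypt_all() follow
the functions mentioned in this thread, but their bodies here are assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes, chosen only for this sketch. */
#define MAX_KEYS 4
#define PLAINTEXT_LENGTH 15

static char host_keys[MAX_KEYS][PLAINTEXT_LENGTH + 1]; /* host-side key buffer */
static char dev_keys[MAX_KEYS][PLAINTEXT_LENGTH + 1];  /* stand-in for GPU memory */
static int keys_changed = 0;  /* set when host_keys may differ from dev_keys */
static int upload_count = 0;  /* counts simulated host-to-device copies */

/* Stand-in for a cudaMemcpy(..., cudaMemcpyHostToDevice) of the key buffer. */
static void upload_keys(void)
{
    memcpy(dev_keys, host_keys, sizeof(host_keys));
    upload_count++;
}

/* Stores one candidate key on the host and marks the buffer as dirty. */
static void set_key(const char *key, int index)
{
    strncpy(host_keys[index], key, PLAINTEXT_LENGTH);
    host_keys[index][PLAINTEXT_LENGTH] = '\0';
    keys_changed = 1;
}

/* Called once per salt: copies the keys only if they changed since last time. */
static void crypt_all(int count)
{
    (void)count; /* unused in this sketch */
    if (keys_changed) {
        upload_keys();
        keys_changed = 0;
    }
    /* A real format would launch the hashing kernel here. */
}
```

With many salts, crypt_all() runs several times for the same set of keys, so
only the first call after set_key() pays for the copy.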

P.S. I fixed the bug you mentioned.  I also added #pragma unroll 64 and
modified PLAINTEXT_LENGTH.  However, on my G9600M GS card, this doesn't give
me much of a performance gain. :(

Dongdong Li

