john-dev - Re: xsha512-cuda & xsha512-opencl testing



Message-ID: <20120716093510.GA21271@openwall.com>
Date: Mon, 16 Jul 2012 13:35:10 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: xsha512-cuda & xsha512-opencl testing

myrice -

On Mon, Jul 16, 2012 at 04:10:01PM +0800, myrice wrote:
> Unfortunately, after lukas's work on bull, I cannot run my cuda format on it...

It's weird: mscash2-cuda worked, but xsha512-cuda did not.  I've just
rebooted bull, and xsha512-cuda works now.

BTW, xsha512-cuda produces a nasty sound at maybe 5 kHz or so - is this
the frequency of PCIe transfers or global memory accesses or something
like that?

> This is result under xsha512-opencl with incremental mode.

Which incremental mode, exactly?  This matters.  If the incremental mode
is not locked to a specific password length (e.g., just length 8), then
there's some overhead early on to switch between lengths.  For quick
runs (like a few minutes), this overhead is significant.  So you should
be using -i=all8 (locked to length 8 only).  Is this what you used?

> Incremental mode on xsha512-opencl with 7970:
> HashNum_SaltNum
> 1_1
> guesses: 1  time: 0:00:00:06 DONE (Mon Jul 16 10:54:39 2012)  c/s: 12838K

6 seconds is too short a run, but otherwise this is reasonable.

> 100_100
> guesses: 6  time: 0:00:06:45 0.00%  c/s: 48827K
> 
> 10K_10K
> guesses: 89  time: 0:00:03:00 0.00%  c/s: 49944K

OK.

> 10K_100
> guesses: 279  time: 0:00:05:40 0.00%  c/s: 2871M

That's about 29M c/s raw hashing speed (2871M divided by 100 hashes per
salt).  The slowdown from 50M to 29M with 100 hashes/salt is not too
bad.  I was afraid it'd be worse.  Yet there should be lots of room for
improvement here.
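
The arithmetic behind the 29M figure can be sketched in Python.  This is
only an illustration: parse_cps and raw_speed are hypothetical helper
names, not part of John the Ripper.  It assumes that K/M/G are decimal
multipliers and that the reported c/s counts candidate/hash
combinations, so that dividing by hashes per salt gives the raw rate.

```python
# Assumption: the K/M/G suffixes in the status lines are decimal multipliers.
SUFFIX = {"K": 1e3, "M": 1e6, "G": 1e9}


def parse_cps(text):
    """Turn a status-line figure such as '2871M' into a plain number."""
    text = text.strip()
    if text and text[-1] in SUFFIX:
        return float(text[:-1]) * SUFFIX[text[-1]]
    return float(text)


def raw_speed(cps_text, hashes, salts):
    """Assumed model: raw rate = reported c/s / (hashes per salt)."""
    return parse_cps(cps_text) / (hashes / salts)


# 10K hashes over 100 salts, reported as 2871M c/s
print(round(raw_speed("2871M", 10_000, 100) / 1e6, 1))  # 28.7 (millions)
```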

> 10K_1
> guesses: 5351  time: 0:00:03:43 0.00%  c/s: 72953M

Too many got cracked (over 50%).

> 1M_1
> guesses: 47731  time: 0:00:03:56 0.00%  c/s: 2707G

I guess we'd achieve a similar speed on CPU (2.7M passwords/second).

> 1M_1K
> guesses: 4196  time: 0:00:04:41 0.00%  c/s: 21220M

21M, quite reasonable and better than current CPU code.

I guess the bottleneck here is the transfer of hashes from GPU to CPU
for get_hash*()?

> 1M_1M
> guesses: 50  time: 0:00:04:02 0.00%  c/s: 51453K

OK.

Overall, the scaling with many hashes per salt is better than what I had
expected for your code (since it was not subjected to such
testing/tuning before), but it's not perfect.
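
To look at the scaling across all of the quoted multi-hash runs at once,
here is a rough Python sketch that divides each reported c/s figure by
the number of hashes per salt.  The figures are copied from the quoted
status lines; scaling_table is a hypothetical helper name, and reading
K/M/G as decimal multipliers is an assumption.

```python
# (label, hashes, salts, reported c/s), copied from the quoted status lines.
# Assumption: K/M/G are decimal multipliers (e.g. 2871M == 2871e6).
RUNS = [
    ("100_100", 100, 100, 48827e3),
    ("10K_10K", 10_000, 10_000, 49944e3),
    ("10K_100", 10_000, 100, 2871e6),
    ("10K_1", 10_000, 1, 72953e6),
    ("1M_1", 1_000_000, 1, 2707e9),
    ("1M_1K", 1_000_000, 1_000, 21220e6),
    ("1M_1M", 1_000_000, 1_000_000, 51453e3),
]


def scaling_table(runs):
    """Return (label, hashes per salt, raw c/s) for each run."""
    rows = []
    for label, hashes, salts, cps in runs:
        per_salt = hashes // salts
        rows.append((label, per_salt, cps / per_salt))
    return rows


for label, per_salt, raw in scaling_table(RUNS):
    print(f"{label:>8}  {per_salt:>8} hashes/salt  {raw / 1e6:6.1f}M raw c/s")
```

On these numbers the raw rate stays near 50M c/s at one hash per salt
and falls to roughly 29M, 21M, 7.3M and 2.7M as the hashes per salt grow
to 100, 1K, 10K and 1M.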

Thanks,

Alexander
