Date: Tue, 19 Mar 2013 21:27:24 +0100
From: magnum <john.magnum@...hmail.com>
To: john-dev@...ts.openwall.com
Subject: Re: Idea to increase plaintext length for GPU based hashes

On 19 Mar, 2013, at 16:16 , Claudio André <claudioandre.br@...il.com> wrote:
> On 18-03-2013 22:21, magnum wrote:
>> Another approach (not necessarily mutex to yours) would be to split the transfer. Let's say we have a work size of 1M. At, say, the 256K'th call to set_key(), it could initiate a transfer of this first fourth of keys to GPU. This transfer will not stall the host side, it will take place while we continue with the next 256K keys. And so on. If we can balance this properly we should get rid of much of the transfer delay. Maybe we should split it in 8 or 16, maybe less.
> 
> Well, it is easy to (somehow) implement your idea. Good gain.

> Raw:    13653K c/s real, 47953K c/s virtual

> Raw:    17096K c/s real, 36408K c/s virtual

I hoped for a lot more :-(

> But this is not enough to do the trick. Below, I measure only the GPU part in auto-tune. I expect to be at 113M, not 17M.
> Is it only set_key() to blame? Btw: auto-tune on unstable goes at 19M.

I tried the same on ntlmv2-opencl with various split sizes but there was no gain at all.

Regarding set_key() bottleneck:

magnum@...l:src [bleeding-jumbo]$ ../run/john -t -fo:lm 
Benchmarking: LM DES [128/128 BS XOP-16]... DONE
Raw:    60458K c/s real, 60458K c/s virtual

...if we can get 60M c/s using one CPU core, we should beat that figure for any raw GPU format if we can hide transfer latency. We need some real profiling.

magnum
