john-users - Re: opencl sha1 jtr and others some experiments


Message-ID: <AANLkTinDqVhCTfVC82ro1Z0XF0wJwzMU_XCQUm0Wi8V_@mail.gmail.com>
Date: Fri, 14 Jan 2011 14:14:07 +0200
From: Milen Rangelov <gat3way@...il.com>
To: john-users@...ts.openwall.com
Subject: Re: opencl sha1 jtr and others some experiments

On Fri, Jan 14, 2011 at 11:42 AM, Simon <simon@...quise.net> wrote:

> But even candidate password generation is a problem if you want to do it
> on CPU. I toyed for a while with an alternate "markov" mode, where you
> only generate the start of passwords on CPU and then you brute force
> their end on the GPU.
>
> This can be done, and that is allowed by the design of the multiple
> comparison functions.
>

In the code I'm working on, candidate generation is done partly on the CPU,
partly on the GPU for markov and bruteforce attacks. Three or four bytes of
the plaintext are generated on the GPU depending on a lookup table that is
precalculated on the host and transferred to the GPU only once at the
beginning. This works fast enough (I currently get about 980M/s on a Radeon
HD6870, which is comparable with other crackers on that hardware); however,
this also means that the NDRange (and crack speed) depends on the charset
size. This is a bad design decision which requires lots of hacks, yet it
works well at the moment; it just requires writing separate kernels for
smaller charsets.
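
To illustrate the split, here is a minimal host-side simulation in Python,
not the actual OpenCL code. It assumes the lookup table simply lists every
three-character suffix over the charset and that each work-item picks its
suffix by global id; the names build_suffix_table and candidate are
hypothetical. It shows why the NDRange is tied to the charset size.

```python
from itertools import product

def build_suffix_table(charset, suffix_len=3):
    # Host side: precompute every suffix once. This stands in for the
    # lookup table that would be uploaded to the GPU a single time.
    return ["".join(p) for p in product(charset, repeat=suffix_len)]

def candidate(prefix, table, gid):
    # Device side (simulated): work-item `gid` appends its own suffix
    # to the prefix generated on the CPU.
    return prefix + table[gid]

charset = "abc"                      # toy charset, assumed for illustration
table = build_suffix_table(charset, 3)
ndrange = len(table)                 # len(charset) ** 3 work-items per prefix
candidates = [candidate("pw", table, gid) for gid in range(ndrange)]
```

With a three-character charset the NDRange is only 27 work-items per
prefix, which is why small charsets would need a differently shaped kernel
to keep the GPU busy.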

> But this wouldn't work well with JtR architecture. Or perhaps you could
> store the comparison results in a bitmap and only transfer this ?

That's a pity.

I use a somewhat complicated scheme for that: bitmap checks in the kernel,
a single int value which indicates that we have a successfully cracked hash
with this kernel invocation, and an array of (found/not found) items. The
index into that array indicates at which offset of the output hashes buffer
we do a partial read (clEnqueueReadBuffer allows reading an arbitrary amount
of memory at an arbitrary offset in the buffer).
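
A rough Python simulation of that flow, as a sketch only: the bitmap size,
the way a digest is mapped to a bit, and the function names are all
assumptions made for illustration, and the "kernel" is an ordinary loop
rather than OpenCL code. The host looks at the single flag first and only
touches the output buffer at offsets marked as hits.

```python
import hashlib

BITMAP_BITS = 1 << 16  # assumed bitmap size, chosen only for the example

def bitmap_index(digest):
    # Assumed mapping: first 4 bytes of the digest select one bit.
    return int.from_bytes(digest[:4], "little") % BITMAP_BITS

def build_bitmap(targets):
    bitmap = bytearray(BITMAP_BITS // 8)
    for d in targets:
        i = bitmap_index(d)
        bitmap[i >> 3] |= 1 << (i & 7)
    return bitmap

def run_kernel(candidates, bitmap):
    # Simulated kernel invocation: one loop iteration per work-item.
    out_hashes = [b""] * len(candidates)   # output hashes buffer
    hit_marks = [0] * len(candidates)      # found / not found per offset
    found_flag = 0                         # the single int the host checks
    for gid, cand in enumerate(candidates):
        d = hashlib.sha1(cand).digest()
        i = bitmap_index(d)
        if bitmap[i >> 3] & (1 << (i & 7)):
            out_hashes[gid] = d
            hit_marks[gid] = 1
            found_flag = 1
    return found_flag, hit_marks, out_hashes

def host_collect(found_flag, hit_marks, out_hashes, targets):
    if not found_flag:
        return []  # common case: nothing beyond the flag is read back
    # Partial read: only marked offsets, confirmed against the real list
    # because a bitmap hit can be a false positive.
    return [g for g, m in enumerate(hit_marks) if m and out_hashes[g] in targets]

targets = {hashlib.sha1(b"secret").digest()}
bitmap = build_bitmap(targets)
cracked = host_collect(*run_kernel([b"a", b"secret", b"b"], bitmap), targets)
missed = host_collect(*run_kernel([b"x", b"y"], bitmap), targets)
```

In real host code the partial read would be a clEnqueueReadBuffer call with
a nonzero offset and a small size, issued only when the flag is set.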

It looks rather ugly but it's effective: in the case of single-hash
cracking, I only transfer a couple of kilobytes/sec on the PCIe bus. With
multihash attacks, transferred data increases with the hashlist, but with
proper bitmap sizing you can easily support hundreds of thousands of them in
the hashlist and keep the PCIe transfers low at the same time.
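
As a back-of-the-envelope check on bitmap sizing, the sketch below assumes
a plain one-bit-per-hash bitmap with uniformly distributed indices, which
may not match the scheme actually used; the sizes are example values, not
measurements. Under that assumption, the chance that a wrong candidate
passes the bitmap check is 1 - (1 - 1/m)^n for n hashes and m bits.

```python
def false_positive_rate(num_hashes, bitmap_bits):
    # Probability that a random candidate's bit is already set, assuming
    # each loaded hash sets one uniformly random bit.
    return 1.0 - (1.0 - 1.0 / bitmap_bits) ** num_hashes

n = 200_000              # example hashlist size (assumed)
m = 1 << 24              # example bitmap: 2**24 bits = 2 MiB (assumed)
rate = false_positive_rate(n, m)
bitmap_bytes = m // 8
```

With these example numbers only about 1.2% of candidates would pass the
bitmap check by accident, so the extra readback stays small even for a
large hashlist.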
