john-dev - fast hashes on GPU (was: Working on DES format on CUDA)

Message-ID: <20120327233707.GA19375@openwall.com>
Date: Wed, 28 Mar 2012 03:37:07 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: fast hashes on GPU (was: Working on DES format on CUDA)

On Wed, Mar 28, 2012 at 05:21:35AM +0800, myrice wrote:
> For fast hash, I see there is no md5 implement on cuda. Do you think it is
> good for me to start with?

You mean raw MD5, the "fast" hash?  Yes, I think we have it in OpenCL
only.  However, you won't be able to demonstrate whether your CUDA
implementation is efficient or not - it'd just bump into the current
bottleneck for fast hashes.  For example, on CPUs we're currently
getting speeds of around 30M c/s per core, and the OpenCL implementation
on GPU is doing around 50M c/s total, which is less than the cumulative
speed of multiple CPU cores.

You may be somewhat better able to demonstrate the quality of your
implementation if you pick a slightly slower "fast" hash - such as raw
SHA-512.  I think we currently only have raw SHA-256 in CUDA, but not
SHA-512.  In OpenCL, we don't have either.  However, we have both kinds
of SHA-crypt in CUDA and crypt-SHA-512 in OpenCL, so I presume that's
where you'd rip code from. ;-)  (And for raw MD5 in CUDA, you'd rip from
phpass.)  That's perfectly fine, with the drawback being that we'd have
limited data to evaluate your skills.

Other options include the Mac OS X hashes known as XSHA512 and XSHA in
JtR.  These are fast, but salted - so the candidate passwords only
need to be transferred to the GPU once for all salts (then John just
switches salts until all are tested, and then it moves to the next bunch
of candidate passwords, which need to be transferred).  So you'd be able
to show better efficiency for the "many salts" benchmarks for these.
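
To illustrate why the "many salts" case looks better, here is a rough
back-of-the-envelope model in Python.  It is only a sketch: it assumes
transfer and hashing do not overlap, and the function name and the two
rates used below (50M candidates/s transferred, 500M hashes/s computed)
are made-up numbers for illustration, not measurements.

```python
def effective_rate(n_salts, transfer_rate, hash_rate):
    """Candidate/salt combinations tested per second.

    Each candidate is transferred to the device once and then hashed
    once per salt.  Assumes (for simplicity) that transfer and compute
    are serialized rather than overlapped.
    """
    # Time spent on one candidate: one transfer plus n_salts hash computations.
    time_per_candidate = 1.0 / transfer_rate + n_salts / hash_rate
    return n_salts / time_per_candidate


# Hypothetical rates, chosen only to show the trend.
TRANSFER = 50e6   # candidates/s the host can feed to the device
HASHING = 500e6   # hashes/s the device can compute

one_salt = effective_rate(1, TRANSFER, HASHING)
many_salts = effective_rate(100, TRANSFER, HASHING)
```

With one salt the result stays below the transfer rate, while with many
salts the transfer cost is amortized and the result approaches the raw
hashing rate.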

We also have SSHA (NSLDAP) in OpenCL only; you could add CUDA.  It is
similar to XSHA in its properties (fast, but salted).  XSHA512 is a lot
better in terms of achievable efficiency, though.

Finally, you may actually try to remove the bottleneck for fast
hashes - have JtR generate some candidate passwords on GPU e.g. in some
new cracking mode and/or have it compare computed hashes on GPU as well.
This is not easy to do well, and we are unlikely to directly use the
results of your work on this, as I have my own thoughts on how to
approach the task during the summer, but you might demonstrate that
you're well-suited for the job and your approach to the task might
influence our actual implementation.
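
As a very rough illustration of the general idea (not of any planned
JtR design), the Python sketch below runs on the CPU and merely stands
in for a device kernel.  The host passes only a starting index and a
count; each candidate is derived from its index, hashed, and compared
against the loaded hashes "on device", so that only matches need to
travel back.  The charset, the function names, and the use of raw MD5
via hashlib are all assumptions made for this example.

```python
import hashlib

CHARSET = "abcdefghijklmnopqrstuvwxyz"  # hypothetical keyspace for the example


def candidate_from_index(index, length, charset=CHARSET):
    """Decode an integer into a fixed-length candidate (mixed-radix digits).

    The first character is the least significant digit.
    """
    chars = []
    for _ in range(length):
        index, digit = divmod(index, len(charset))
        chars.append(charset[digit])
    return "".join(chars)


def crack_range(start, count, length, target_hashes):
    """Stand-in for a kernel launch over indices [start, start + count).

    Generates, hashes, and compares each candidate locally; returns only
    the (index, candidate) pairs whose raw MD5 is in target_hashes.
    """
    found = []
    for i in range(start, start + count):
        cand = candidate_from_index(i, length)
        digest = hashlib.md5(cand.encode()).hexdigest()
        if digest in target_hashes:
            found.append((i, cand))
    return found


# Usage: search all 3-letter lowercase candidates for one loaded hash.
targets = {hashlib.md5(b"cab").hexdigest()}
matches = crack_range(0, len(CHARSET) ** 3, 3, targets)
```

The point of the index-based scheme is that the per-candidate transfer
disappears entirely; a real implementation would also need an efficient
on-device lookup for many loaded hashes, which this sketch glosses over
with a Python set.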

Alexander
