john-dev - Re: GSoC Project proposal:JtR GPU for Slow hashes

Message-ID: <20120403121610.GE15294@openwall.com>
Date: Tue, 3 Apr 2012 16:16:10 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: GSoC Project proposal:JtR GPU for Slow hashes

Hi Sayantan,

On Tue, Apr 03, 2012 at 01:15:21PM +0530, SAYANTAN DATTA wrote:
> 1.None of the openCL kernel is compatible with radeon 4000 series or lower
> gpus.All of them have to be modified as the 4000 series radeon don't
> support cl_addressable_byte_storage.

Fixing this is something you could work on, yes.  If there's any
performance impact from such fixes (for newer GPUs that don't need the
fixes), then both versions of the code should be present (but without
duplication of source code shared between the versions).
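
As an illustration only, one way to keep a single copy of the shared code
is to hide byte stores behind a macro selected at build time.  The sketch
below is plain C standing in for kernel code.  PUT_BYTE, HAVE_BYTE_STORE,
and store_key() are hypothetical names, not existing JtR identifiers.  On
a real device the switch would presumably follow whether the
cl_khr_byte_addressable_store extension is available.

```c
#include <stddef.h>
#include <stdint.h>

/* HAVE_BYTE_STORE is a hypothetical build-time switch (not a JtR macro).
 * Define it for devices that can write single bytes directly. */
#ifdef HAVE_BYTE_STORE
/* Direct byte write into the buffer viewed as bytes. */
#define PUT_BYTE(buf, idx, val) \
    (((uint8_t *)(buf))[(idx)] = (uint8_t)(val))
#else
/* Emulation using 32-bit accesses only: clear the target byte lane,
 * then OR in the new value (byte 0 is the least significant lane). */
#define PUT_BYTE(buf, idx, val) \
    ((buf)[(idx) >> 2] = \
        ((buf)[(idx) >> 2] & ~(0xffU << (((idx) & 3) << 3))) | \
        ((uint32_t)(uint8_t)(val) << (((idx) & 3) << 3)))
#endif

/* Shared code, written once and used by both variants:
 * copy a key into a buffer of 32-bit words. */
static void store_key(uint32_t *buf, const char *key, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        PUT_BYTE(buf, i, key[i]);
}
```

Only the macro differs between the two builds, so newer GPUs keep the
direct store while the 32-bit path costs a read-modify-write per byte.
The two variants produce the same memory layout on little-endian targets.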

> 2.OSTIMER problem ,which I have alrady discussed with magnum.

We'll deal with this one shortly.

> 3.Memory allocation for storing the source in kernel_read() function needs
> to be dynamic instead of static.

Yes, or we should move to pre-compiling OpenCL code (like we do for
CUDA), which may be preferable anyway.  I am not familiar with this,
though.  I'd like to have this discussed on its own thread (with
magnum, Milen, and likely others).
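
For the dynamic allocation option, a minimal sketch in C might look like
the following.  read_source() is a hypothetical helper shown for
illustration, not the existing kernel_read() code, and it assumes the
kernel source is an ordinary seekable file.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file into a malloc()ed,
 * NUL-terminated buffer sized to fit.  Returns NULL on any error.
 * The caller is responsible for free()ing the result. */
static char *read_source(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    char *buf = NULL;
    long size;

    if (!f)
        return NULL;

    /* Find the file size, rewind, then allocate size + 1 bytes. */
    if (fseek(f, 0, SEEK_END) == 0 && (size = ftell(f)) >= 0 &&
        fseek(f, 0, SEEK_SET) == 0 &&
        (buf = malloc((size_t)size + 1)) != NULL) {
        if (fread(buf, 1, (size_t)size, f) == (size_t)size) {
            buf[size] = '\0';
            if (size_out)
                *size_out = (size_t)size;
        } else {
            /* Short read: do not hand back a partial source. */
            free(buf);
            buf = NULL;
        }
    }

    fclose(f);
    return buf;
}
```

The returned pointer and length can then be passed to
clCreateProgramWithSource(), and the buffer freed once the program
object has been created.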

In general, while I posted some quick comments above, these topics
deserve separate threads with proper Subjects (one thread per topic).

> Also I would like to know what are the other algorithms that needs to be
> implemented.As the GSoC ideas page gives very little idea of what needs to
> be done ,it would be very helpful if you could elaborate a little more.

Pretty much all that have been mentioned in here since the beginning of
March are within consideration.  Several of them build upon PBKDF2 with
SHA-1 (and for Mac OS X 10.8 apparently we'll need PBKDF2 with SHA-512,
but I haven't looked into that yet), so that's one thing to have and to
optimize really well.  Then we need the JtR formats built upon PBKDF2
with SHA-1 on GPU, adding the necessary wrapper code on CPU for specific
uses.  Then there are many non-hashes.  Some of these overlap with what
I've just mentioned (PBKDF2 stuff), some are different (Office 2007),
thus needing their own on-GPU code.  Then there are DES-based hashes,
some of which are slow or semi-slow (and we'll want to reuse the DES
implementation for the corresponding fast hashes as well).  Finally,
there's bcrypt (Blowfish-based), for which reasonable performance on GPU
is difficult (likely even unrealistic) to achieve, although we won't
know for sure until we've tried hard.  I'm sure I've missed some here;
please re-read archived postings since March 1st and make suggestions:

http://www.openwall.com/lists/john-dev/

Besides stuff to implement from scratch, we'll need OpenCL code for
stuff that we currently have as CUDA-only, and vice versa.  You've
already started on this with your work on MSCash2 (thank you!).

Similarly, the existing OpenCL and/or CUDA code needs to be optimized
further.  Lukas said that this is what he intends to apply for under
GSoC, but that does not prevent you from offering to work on it as well.

In general, even if another person already works on something, you may
offer to work on it too.

> I am
> also considering to implement the above bug fixes as part of my project if
> you think it is necessary.

Sounds good.  OS_TIMER is trivial and will be dealt with now, but the
rest might be yours.

> Also I'm considering to get the latest generation
> Radeon GPUs so that I can code more efficiently for the newer GPUs.This
> would also enable me to implement multi GPU cracking in JtR.In the first
> phase I will code for the radeon 4000 series and then modify it for the
> newer GPUs.The reverse could also be done but I'll follow your suggestions
> on this.

Thank you for suggesting this.  I think it'd be best if you test your
code on multiple GPUs as you develop - not postpone porting to another
GPU family as a separate task.  It'd be nice if you could get more than
two GPUs - not just 4000 series vs. newer, but also Nvidia.  I realize
that buying lots of GPUs is costly, though.

Multi-GPU support within one instance of JtR would be very nice to have.
There are two major approaches to this, I think: high-level (similar to
what we have with MPI now) vs. per-format (similar to our current OpenMP
stuff).  It might be best to support both (they have their pros and cons).
This deserves its own discussion thread, though.
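
As a rough sketch of the per-format idea, a format could divide each
batch of candidate keys between devices much as OpenMP divides loop
iterations between threads.  split_range() below is a hypothetical
helper, not JtR code, and it assumes all devices are equally fast, which
real mixed-GPU setups would not be.

```c
#include <stddef.h>

/* Hypothetical helper: give device `dev` (0 <= dev < ndev) a contiguous
 * slice of `total` candidates.  Slices differ in size by at most one;
 * the first (total % ndev) devices receive the extra candidate.
 * ndev must be non-zero. */
static void split_range(size_t total, unsigned ndev, unsigned dev,
                        size_t *start, size_t *count)
{
    size_t base = total / ndev;
    size_t extra = total % ndev;

    *count = base + (dev < extra ? 1 : 0);
    *start = (size_t)dev * base + (dev < extra ? dev : extra);
}
```

The high-level approach would instead split the candidate stream itself
between devices, outside of any one format.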

Alexander

P.S. Somehow you tend to miss spaces between sentences (e.g., in "...
modify it for the newer GPUs.The reverse ..."), which makes your
messages a bit harder to read.  Something similar is even seen in your
source code (e.g., "#include<stdlib.h>" and similar in opencl_MSCASH2_fmt.c).