Date: Tue, 3 Apr 2012 16:16:10 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: GSoC Project proposal:JtR GPU for Slow hashes

Hi Sayantan,

On Tue, Apr 03, 2012 at 01:15:21PM +0530, SAYANTAN DATTA wrote:
> 1.None of the openCL kernel is compatible with radeon 4000 series or lower
> gpus.All of them have to be modified as the 4000 series radeon don't
> support cl_addressable_byte_storage.

Fixing this is something you could work on, yes. If there's any performance impact from such fixes (for newer GPUs that don't need the fixes), then both versions of the code should be present (but without duplication of source code shared between the versions).

> 2.OSTIMER problem ,which I have alrady discussed with magnum.

We'll deal with this one shortly.

> 3.Memory allocation for storing the source in kernel_read() function needs
> to be dynamic instead of static.

Yes, or we should move to pre-compiling OpenCL code (like we do for CUDA), which may be preferable anyway. I am not familiar with this, though. I'd like to have this discussed on its own thread (with magnum, Milen, and likely others).

In general, while I posted some quick comments above, these topics deserve separate threads with proper Subjects (one thread per topic).

> Also I would like to know what are the other algorithms that needs to be
> implemented.As the GSoC ideas page gives very little idea of what needs to
> be done ,it would be very helpful if you could elaborate a little more.

Pretty much all that have been mentioned in here since the beginning of March are within consideration. Several of them build upon PBKDF2 with SHA-1 (and for Mac OS X 10.8 apparently we'll need PBKDF2 with SHA-512, but I haven't looked into that yet), so that's one thing to have and to optimize really well.

Then we need the JtR formats built upon PBKDF2 with SHA-1 on GPU, adding the necessary wrapper code on CPU for specific uses.

Then there are many non-hashes. Some of these overlap with what I've just mentioned (PBKDF2 stuff), some are different (Office 2007), thus needing their own on-GPU code.

Then there are DES-based hashes, some of which are slow or semi-slow (and we'll want to reuse the DES implementation for the corresponding fast hashes as well).

Finally, there's bcrypt (Blowfish-based), which is difficult (likely even unrealistic) to achieve reasonable performance for (but we don't know reliably until we've tried hard).

I'm sure I've missed some here; please re-read archived postings since March 1st and make suggestions:

http://www.openwall.com/lists/john-dev/

Besides stuff to implement from scratch, we'll need OpenCL code for stuff that we currently have as CUDA-only, and vice versa. You've already started with this with your work on MSCash2 (thank you!)

Similarly, the existing OpenCL and/or CUDA code needs to be optimized further. Lukas said that this is what he intends to apply for under GSoC, but that does not prevent you from offering to work on it as well. In general, even if another person already works on something, you may offer to work on it too.

> I am
> also considering to implement the above bug fixes as part of my project if
> you think it is necessary.

Sounds good. OS_TIMER is trivial and will be dealt with now, but the rest might be yours.

> Also I'm considering to get the latest generation
> Radeon GPUs so that I can code more efficiently for the newer GPUs.This
> would also enable me to implement multi GPU cracking in JtR.In the first
> phase I will code for the radeon 4000 series and then modify it for the
> newer GPUs.The reverse could also be done but I'll follow your suggestions
> on this.

Thank you for suggesting this. I think it'd be best if you test your code on multiple GPUs as you develop - not postpone porting to another GPU family as a separate task. It'd be nice if you could get more than two GPUs - not just 4000 series vs. newer, but also Nvidia. I realize that buying lots of GPUs is costly, though.

Multi-GPU support within one instance of JtR would be very nice to have. There are two major approaches to this, I think: high-level (similar to what we have with MPI now) vs. per-format (similar to our current OpenMP stuff). It might be best to support both (they have their pros and cons). This deserves its own discussion thread, though.

Alexander

P.S. Somehow you tend to miss spaces between sentences (e.g., in "... modify it for the newer GPUs.The reverse ..."), which makes your messages a bit harder to read. Something similar is even seen in your source code (e.g., "#include<stdlib.h>" and similar in opencl_MSCASH2_fmt.c).
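
Illustrative note on item 3 above (dynamic rather than static allocation for the OpenCL kernel source): the following is only a minimal C sketch of the idea, sizing the buffer from the file length at run time. The name read_kernel_source() and its signature are hypothetical and chosen for this example; this is not the actual kernel_read() code from JtR.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not JtR's kernel_read()): load a whole OpenCL
 * source file into a heap buffer sized to fit the file.
 * Returns a NUL-terminated string that the caller must free(),
 * or NULL on any error. If len_out is non-NULL, it receives the
 * number of bytes read (excluding the terminating NUL).
 */
char *read_kernel_source(const char *path, size_t *len_out)
{
    FILE *fp = fopen(path, "rb");
    char *buf = NULL;
    long size;

    if (!fp)
        return NULL;

    /* Find the file size by seeking to the end, then rewind. */
    if (fseek(fp, 0, SEEK_END) != 0)
        goto fail;
    size = ftell(fp);
    if (size < 0)
        goto fail;
    if (fseek(fp, 0, SEEK_SET) != 0)
        goto fail;

    /* Allocate exactly what is needed, plus one byte for the NUL. */
    buf = malloc((size_t)size + 1);
    if (!buf)
        goto fail;

    if (fread(buf, 1, (size_t)size, fp) != (size_t)size)
        goto fail;

    buf[size] = '\0';
    fclose(fp);

    if (len_out)
        *len_out = (size_t)size;
    return buf;

fail:
    free(buf); /* free(NULL) is a no-op */
    fclose(fp);
    return NULL;
}
```

The returned pointer and length could then be handed to clCreateProgramWithSource() and freed once the program object has been created. Pre-compiling the kernels, as suggested above, would avoid reading the source at run time altogether.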