Date: Sun, 25 Mar 2012 05:56:13 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: Hello, interested on slow hash and fast hash on GPU

On Thu, Mar 22, 2012 at 10:13:03PM +0800, myrice wrote:
> I have talked with Lukas and got very valuable information. Thanks him!
> Since I am familiar with CUDA and hundreds lines of OpenCL, in this year of
> GSoC, I would like to complete slow and fast hashes on page:
> http://openwall.info/wiki/john/GPU. And from previous mailing list (
> http://www.openwall.com/lists/john-users/2011/03/19/3), I see there are
> about 50 hashing functions to be implemented. The page only list 13.
> What's the remain?

Practically all hashes/ciphers supported in -jumbo and more (those that
we might support later) are also candidates for GPU implementation.
There are almost 100 of them currently in -jumbo (judging by the number
of *_fmt*.c files in the magnum-jumbo git tree), or even almost 200 if
we count the different dynamic sub-types separately (things like
MD5-of-MD5, SHA-1-of-MD5, etc.), which we probably should (as it relates
to producing efficient GPU code).

For some of these, you may opt to implement the common crypto code on
GPU and then wrap it into several JtR formats on CPU - e.g., you may
have just one (or 2 or 3) DES implementations on GPU, but use it from 10
or so JtR formats. The resulting performance for "fast" hashes would be
far from optimal, though - but maybe a few times better than it is with
the current CPU code (also non-optimal for many of the formats in
jumbo).

To implement all of this stuff on GPU in a nearly optimal fashion, we're
talking megabytes of new source code to be written. This brings it
beyond scope of a GSoC student's summer project, so we have to focus on
especially desirable targets first - like Lukas did last year. I and
others have already mentioned some of the remaining desirable targets in
recent postings to this list.
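
As a rough illustration of the "one shared implementation, several
formats" idea mentioned above, here is a minimal C sketch. Everything in
it is hypothetical: core_hash() merely stands in for the expensive
primitive that would live on the GPU (FNV-1a is used only because it is
short, and it is not a cryptographic hash), and struct toy_format is an
invented descriptor, not JtR's actual format interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the shared primitive (DES, MD5, ... in a real port).
 * Assumption: FNV-1a is used purely for brevity; it is NOT cryptographic. */
static uint32_t core_hash(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t h = 2166136261u;          /* FNV-1a 32-bit offset basis */

    while (len--) {
        h ^= *p++;
        h *= 16777619u;                /* FNV-1a 32-bit prime */
    }
    return h;
}

/* Hypothetical format descriptor, invented for this sketch. */
struct toy_format {
    const char *label;
    uint32_t (*compute)(const char *key);
};

/* "Format" 1: a single application of the shared core. */
static uint32_t raw_compute(const char *key)
{
    return core_hash(key, strlen(key));
}

/* "Format" 2: the core applied to its own output, loosely analogous
 * to sub-types such as MD5-of-MD5. */
static uint32_t double_compute(const char *key)
{
    uint32_t inner = core_hash(key, strlen(key));

    return core_hash(&inner, sizeof(inner));
}

/* Both wrappers reuse the same core; only the wrapping differs. */
static const struct toy_format toy_formats[] = {
    { "toy-raw",    raw_compute    },
    { "toy-double", double_compute },
};
```

A caller would loop over toy_formats and invoke compute() for each
candidate key. In a real port only the equivalent of core_hash() would
move to the device, which is why this arrangement is simple to write but
leaves performance on the table for "fast" hashes.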

An idea that was not considered before and may need more consideration
by people already involved with our project (magnum and others):

A curious approach could be to use some intermediate language, like the
high-level assembler I mentioned a paper on recently, and to rewrite the
crypto code for many/most/all formats in it. Then we could possibly
have shared source code for multiple CPU and GPU architectures, yet
achieve very good efficiency.

In a sense, OpenCL is already an intermediate language like this, but
being more high-level it might also be more limiting in what performance
we may achieve. We could instead consider automatically translating
from a high-level assembly language to OpenCL for architectures that we
don't support more directly, and to the various architectures' native
assembly languages for those that we do support in this way (these would
be currently common CPUs and GPUs).

> I also want to know how much of crypto knowledge I should know. I am now
> learning online cryptography courses of Stanford. However, I think it is
> too slow for the project. Any other learning materials?

This is primarily about algorithms and code optimization, not crypto.

> In addition, from Lukas and your GSoC page, I see optimization is needed.
> Can you provide me an example for me to start with?

All of the CUDA- and especially OpenCL-enabled JtR formats need
optimization. Only phpass in CUDA gets somewhat close to optimal speed
so far (similar to hashcat's speed); the rest are way behind the
performance of competing tools (where applicable) or are otherwise
believed to be far from optimal. (Even for phpass there must be room
for improvement because certain specific micro-optimizations to MD5's
basic functions currently don't show an overall speedup, indicating that
there's some bottleneck that you could identify and try to remove.)

Thanks,

Alexander
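
The message above does not say which micro-optimizations to MD5's basic
functions were tried. As one well-known example of the kind, the round
functions F and G from RFC 1321 can each be rewritten with three bitwise
operations instead of four. The sketch below is in plain C rather than
CUDA or OpenCL, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Round functions F and G as written in RFC 1321 (four operations each). */
static uint32_t f_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & y) | (~x & z);
}

static uint32_t g_ref(uint32_t x, uint32_t y, uint32_t z)
{
    return (x & z) | (y & ~z);
}

/* Equivalent forms using three operations each. Both are bitwise
 * selects: F picks y where x is 1 and z elsewhere; G picks x where
 * z is 1 and y elsewhere. */
static uint32_t f_opt(uint32_t x, uint32_t y, uint32_t z)
{
    return z ^ (x & (y ^ z));
}

static uint32_t g_opt(uint32_t x, uint32_t y, uint32_t z)
{
    return y ^ (z & (x ^ y));
}

/* Returns 1 if the rewritten forms match the reference forms.
 * The functions act on each bit position independently, and these
 * three constants place every one of the 8 possible (x, y, z) bit
 * combinations in some position, so one evaluation covers all inputs. */
int md5_variants_agree(void)
{
    const uint32_t x = 0xF0F0F0F0u;
    const uint32_t y = 0xCCCCCCCCu;
    const uint32_t z = 0xAAAAAAAAu;

    return f_ref(x, y, z) == f_opt(x, y, z) &&
           g_ref(x, y, z) == g_opt(x, y, z);
}
```

Whether saving one operation per step shows up as a measurable speedup
depends on where the real bottleneck is, which is the point made above
about phpass.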