Date: Sun, 25 Mar 2012 05:56:13 +0400
From: Solar Designer <>
Subject: Re: Hello, interested on slow hash and fast hash on GPU

On Thu, Mar 22, 2012 at 10:13:03PM +0800, myrice wrote:
> I have talked with Lukas and got very valuable information. Thanks to him!
> Since I am familiar with CUDA and hundreds of lines of OpenCL, for this
> year's GSoC I would like to complete slow and fast hashes on page[1].
> And from a previous mailing list post (
> ), I see there are
> about 50 hashing functions to be implemented. The page[1] only lists 13.
> What about the rest?

Practically all hashes/ciphers supported in -jumbo and more (those that
we might support later) are also candidates for GPU implementation.
There are almost 100 of them currently in -jumbo (judging by the number
of *_fmt*.c files in the magnum-jumbo git tree), or even almost 200 if
we count the different dynamic sub-types separately (things like
MD5-of-MD5, SHA-1-of-MD5, etc.), which we probably should (as it relates
to producing efficient GPU code).

For some of these, you may opt to implement the common crypto code on
GPU and then wrap it into several JtR formats on CPU - e.g., you may
have just one (or 2 or 3) DES implementations on GPU, but use it from 10
or so JtR formats.  The resulting performance for "fast" hashes would be
far from optimal, though - maybe a few times better than with the current
CPU code (which is also non-optimal for many of the formats in jumbo).

To implement all of this stuff on GPU in a nearly optimal fashion, we're
talking megabytes of new source code to be written.  This brings it
beyond the scope of a GSoC student's summer project, so we have to focus
on
especially desirable targets first - like Lukas did last year.  I and
others have already mentioned some of the remaining desirable targets in
recent postings to this list.

An idea that was not considered before and may need more consideration
by people already involved with our project (magnum and others):
A curious approach could be to use some intermediate language, like the
high-level assembler I recently mentioned a paper about, and to rewrite the
crypto code for many/most/all formats in it.  Then we could possibly
have shared source code for multiple CPU and GPU architectures, yet
achieve very good efficiency.  In a sense, OpenCL is already an
intermediate language like this, but being more high-level it might also
be more limiting in what performance we may achieve.  We could instead
consider automatically translating from a high-level assembly language
to OpenCL for architectures that we don't support more directly, and to
the various architectures' native assembly languages for those that we
do support in this way (these would be currently common CPUs and GPUs).

> I also want to know how much of crypto knowledge I should know. I am now
> learning online cryptography courses of Stanford. However, I think it is
> too slow for the project. Any other learning materials?

This is primarily about algorithms and code optimization, not crypto.

> In addition, from Lukas and your GSoC page, I see optimization is needed.
> Can you provide me an example for me to start with?

All of the CUDA- and especially OpenCL-enabled JtR formats need
optimization.  Only phpass in CUDA gets somewhat close to optimal speed
so far (similar to hashcat's speed); the rest are way behind the
performance of competing tools (where applicable) or are otherwise
believed to be far from optimal.  (Even for phpass there must be room
for improvement because certain specific micro-optimizations to MD5's
basic functions currently don't show an overall speedup, indicating that
there's some bottleneck that you could identify and try to remove.)


