john-dev - Re: sha256 format patches

Message-ID: <20110411231656.GA12283@openwall.com>
Date: Tue, 12 Apr 2011 03:16:56 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: sha256 format patches

On Tue, Apr 12, 2011 at 12:11:14AM +0200, Łukasz Odzioba wrote:
> On my GPU it is in fact 5%, but it is easy to buy a 10x faster GPU, so I
> am aware that PCI-E transfer matters.

Right, but your current code will likely bump into another bottleneck
first.  I think it already does on my system.

> > Now how about implementing SHA-crypt?  You'll also need to implement
> > SHA-512 for that, which is trickier (64-bit integers).

> I'll try and see what I can do. CUDA offers 32-bit and 24-bit integers.
> The trick is that 24-bit operations are almost 8 times faster (but
> NVIDIA claims this might change in the future)

I think that you're confused.  I did a web search on this, and only
found it mentioned in the context of integer multiplication, which we
don't use in these hashes.

> so maybe it is worth implementing 64-bit operations on both types
> and comparing their efficiency.

I guess there are two ways to implement SHA-512 on GPUs that only
support up to 32-bit integers:

1. Use pairs of 32-bit integers and handle carry on addition manually
(or are there special instructions for that?)

2. Use a bitslice implementation.

But I haven't really looked into this.  I am leaving it for you.
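
For reference, option 1 above can be sketched in plain C as follows.
This is only a minimal illustration, not code from any patch: the
u64pair type and the add64/rotr64 names are made up for this example.
It shows 64-bit addition with a manually propagated carry, plus the
64-bit rotate that SHA-512 also needs.  (PTX does document add.cc and
addc instructions for carry propagation; whether using them directly
beats what the compiler generates is something to measure.)

```c
#include <stdint.h>

/* Hypothetical helper type: one 64-bit value kept as two 32-bit halves. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} u64pair;

/* 64-bit addition modulo 2^64 using only 32-bit operations. */
static u64pair add64(u64pair a, u64pair b)
{
    u64pair r;
    r.lo = a.lo + b.lo;                 /* wraps modulo 2^32 */
    r.hi = a.hi + b.hi + (r.lo < a.lo); /* carry is 1 iff the low sum wrapped */
    return r;
}

/* 64-bit rotate right by n bits, valid for 0 <= n < 64. */
static u64pair rotr64(u64pair x, unsigned n)
{
    u64pair r;
    if (n >= 32) {                      /* rotating by 32 just swaps the halves */
        uint32_t t = x.hi;
        x.hi = x.lo;
        x.lo = t;
        n -= 32;
    }
    if (n == 0)                         /* avoid an undefined shift by 32 below */
        return x;
    r.lo = (x.lo >> n) | (x.hi << (32 - n));
    r.hi = (x.hi >> n) | (x.lo << (32 - n));
    return r;
}
```

The rotate counts in SHA-512 are compile-time constants, so in a real
kernel the n >= 32 branch would be resolved at compile time rather
than executed per call.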

> I do not understand what a partial hash is and how it affects speed.
> Could you please give me more details on how to do it?

I've attached a patch (hack) demonstrating this.
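
The general idea, independent of the attached patch (which is not
reproduced here), can be sketched as follows.  This is an assumption
about what "partial hash" means in this context: compare only the first
32-bit word of each computed digest against the loaded hash, and do the
full comparison only on the rare candidates that pass.  The name
find_match and the flat array layout are invented for this example.

```c
#include <stdint.h>
#include <string.h>

#define DIGEST_WORDS 8  /* a SHA-256 digest is 8 x 32-bit words */

/*
 * computed: count digests stored back to back (count * DIGEST_WORDS words).
 * target:   the digest loaded for cracking.
 * Returns the index of the first candidate whose full digest matches,
 * or -1 if there is none.
 */
static int find_match(const uint32_t *computed, int count,
                      const uint32_t *target)
{
    int i;
    for (i = 0; i < count; i++) {
        const uint32_t *d = computed + (size_t)i * DIGEST_WORDS;
        if (d[0] != target[0])  /* partial check: rejects nearly all candidates */
            continue;
        if (!memcmp(d, target, DIGEST_WORDS * sizeof(uint32_t)))
            return i;           /* full check, only for partial matches */
    }
    return -1;
}
```

With a check like this, only the first word of each digest is needed in
the common case, so far less data has to be produced and moved per
candidate than with full digests.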

> Thanks for the benchmark.  There is still work to do in optimizing this
> code. I think that the results can be improved by finding optimal
> threads/blocks/registers settings. As I mentioned, it is important to
> develop a self-configuration script to get maximum occupancy on
> every card, which I will do soon.

I think there must be many other changes to make.  I've just experimented
with hashcat tools on the same machine, and here are some numbers
(checks per second with one hash loaded for cracking):

oclHashcat-lite-0.02:
SHA-256: 52M
MD5: 556M
oclHashcat-0.25:
MD5: 405M

As you can see, much higher speeds are possible, although
oclHashcat-lite is limited to cracking just one hash at a time, which
enables it to partially reverse the hash loaded for cracking (not an
optimization we should consider now).

Anyway, I'd like to see your patch for a slow hash, and benchmark that.

Thanks,

Alexander

View attachment "john-1.7.6-sha256cuda-1mod.diff" of type "text/plain" (2336 bytes)