Date: Tue, 12 Apr 2011 03:16:56 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: Re: sha256 format patches

On Tue, Apr 12, 2011 at 12:11:14AM +0200, Łukasz Odzioba wrote:
> On my GPU it is in fact 5%, but it is easy to buy a 10x faster GPU, so I
> am aware that PCIe transfer matters.

Right, but your current code will likely bump into another bottleneck
first.  I think it already does on my system.

> >Now how about implementing SHA-crypt? You'll also need to implement SHA-512 for that, which is trickier (64-bit integers).
> I'll try and see what I can do. CUDA offers 32-bit and 24-bit integers.
> The trick is that 24-bit operations are almost 8 times faster (but
> NVIDIA claims it might change in the future)

I think that you're confused.  I did a web search on this, and only
found it mentioned in integer multiplication context, which we don't use
in these hashes.

> so maybe it is worth it to
> implement 64-bit operations on both types and compare efficiency.

I guess there are two ways to implement SHA-512 on GPUs that only
support up to 32-bit integers:

1. Use pairs of 32-bit integers and handle carry on addition manually
(or are there special instructions for that?)

2. Use a bitslice implementation.

But I haven't really looked into this.  I am leaving it for you.

> I do not understand what a partial hash is and how it affects speed.
> Could you please give me more details on how to do it?

I've attached a patch (hack) demonstrating this.

> Thanks for the benchmark. There is still work to do in optimizing this
> code. I think that results can be improved by finding optimal
> threads/blocks/registers settings. As I mentioned, it is important to
> develop some self-configuration script to get maximum occupancy on
> every card, which I will do soon.

I think there must be many other changes to make.
I've just experimented with hashcat tools on the same machine, and here
are some numbers (checks per second with one hash loaded for cracking):

oclHashcat-lite-0.02:
SHA-256: 52M
MD5: 556M

oclHashcat-0.25:
MD5: 405M

As you can see, much higher speeds are possible, although
oclHashcat-lite is limited to cracking just one hash at a time, which
enables it to partially reverse the hash loaded for cracking (not an
optimization we should consider now).

Anyway, I'd like to see your patch for a slow hash, and benchmark that.

Thanks,

Alexander

View attachment "john-1.7.6-sha256cuda-1mod.diff" of type "text/plain" (2336 bytes)
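
For option 1 in the SHA-512 discussion above (pairs of 32-bit integers
with manual carry), a minimal sketch in plain C might look like the
following.  It is not taken from the attached patch or from any John the
Ripper source; the type u64pair and the helpers add64, rotr64, from_u64
and to_u64 are hypothetical names used only for illustration.  The native
uint64_t conversions are there purely to check results on a host that
does have 64-bit integers.

```c
#include <stdint.h>

/* Hypothetical type: one 64-bit word stored as two 32-bit halves. */
typedef struct {
    uint32_t hi;
    uint32_t lo;
} u64pair;

/* Helpers for checking against native 64-bit arithmetic on the host. */
static inline u64pair from_u64(uint64_t v)
{
    u64pair r;
    r.hi = (uint32_t)(v >> 32);
    r.lo = (uint32_t)v;
    return r;
}

static inline uint64_t to_u64(u64pair p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* 64-bit addition modulo 2^64 using only 32-bit operations. */
static inline u64pair add64(u64pair a, u64pair b)
{
    u64pair r;
    r.lo = a.lo + b.lo;                 /* wraps modulo 2^32 */
    /* If the low sum wrapped, it is smaller than either operand,
     * so (r.lo < a.lo) is exactly the carry into the high half. */
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
}

/* 64-bit rotate right by n (0 <= n < 64) using only 32-bit operations. */
static inline u64pair rotr64(u64pair x, unsigned n)
{
    u64pair r;
    if (n >= 32) {                      /* rotating by 32 swaps the halves */
        uint32_t t = x.hi;
        x.hi = x.lo;
        x.lo = t;
        n -= 32;
    }
    if (n == 0)
        return x;                       /* avoid an undefined shift by 32 */
    r.lo = (x.lo >> n) | (x.hi << (32 - n));
    r.hi = (x.hi >> n) | (x.lo << (32 - n));
    return r;
}
```

SHA-512 uses rotation amounts both below and above 32 (for example 28,
34 and 39), which is why the rotate has to handle crossing the boundary
between the halves.  Whether a compiler or GPU toolchain turns the carry
comparison into a dedicated add-with-carry instruction is something to
check on the target rather than assume.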