Date: Mon, 25 Jun 2012 03:49:48 +0400
From: Solar Designer <solar@...nwall.com>
To: john-dev@...ts.openwall.com
Subject: get_hash*() in GPU formats (was: Jumbo candidate vs Test Suite)

Lukas, myrice -

On Mon, Jun 25, 2012 at 01:00:50AM +0200, Lukas Odzioba wrote:
> 2012/6/25 Solar Designer <solar@...nwall.com>:
> > Why are you dropping the get_hash*() functions?  This would be a
> > performance hit when there are more than a few hashes per salt.
> >
> > Normally, this shouldn't be the case for md5crypt and phpass hashes,
> > but things may be weird in the real world - and even more so in
> > contests as we have recently seen. ;-)
>
> 1) "crack checking" was moved to gpu code so now we copy back just 1
> byte per hash not BINARY_SIZE bytes.

Why not make it 1 byte per crypt_all() in the typical case (when nothing
got cracked with that one call)?

> 2) code looks cleaner

Yes.

> 3) those are slow formats and I was hoping that it won't be a problem.

Understood, but by that logic your #1 reason doesn't matter. ;-)

Anyway, what I think you could do is have partial hashes sufficient for
get_hash*() to work transferred from GPU the first time a get_hash*()
function is called (if one is called).  That is, have a global variable
("static" inside the format file) that you'd reset on crypt_all() to
indicate that you do not have the hashes on CPU side yet.  Have a
function that transfers the partial hashes from GPU and sets the
variable.  Call this function from all get_hash*() functions when the
variable is zero.

There's already similar code in cuda_xsha512_fmt.c and cuda/xsha512.cu,
except that the variable is only checked inside cuda_xsha512_cpy_hash().
I think those checks should be moved right into get_hash*() to avoid the
function call in the typical case.  And the variable itself should be
moved from the .cu to the .c file.

BTW, you won't notice this in --test benchmarks.
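[A rough sketch of the lazy-transfer pattern described above, for illustration only - the names (copy_hashes_from_gpu, partial_hashes, device_partial_hashes) are hypothetical, and a host-side array plus memcpy() stand in for GPU memory and the actual CUDA transfer call:]

```c
#include <stdint.h>
#include <string.h>

#define MAX_KEYS 1024

/* Stand-in for GPU memory; a real format would cudaMemcpy() from device. */
static uint32_t device_partial_hashes[MAX_KEYS];
static uint32_t partial_hashes[MAX_KEYS]; /* CPU-side copy */
static int have_hashes_on_cpu;            /* reset on every crypt_all() */

static void crypt_all(int count)
{
	/* ... launch GPU kernel, copy back 1 "cracked?" byte ... */
	(void)count;
	have_hashes_on_cpu = 0; /* any CPU-side copy is now stale */
}

static void copy_hashes_from_gpu(void)
{
	/* stand-in for the real device-to-host transfer */
	memcpy(partial_hashes, device_partial_hashes, sizeof(partial_hashes));
	have_hashes_on_cpu = 1;
}

/* The flag check is inlined into each get_hash*() so the typical case
 * (nothing cracked, get_hash*() never called) costs no function call
 * and no transfer at all. */
static int get_hash_0(int index)
{
	if (!have_hashes_on_cpu)
		copy_hashes_from_gpu();
	return partial_hashes[index] & 0xF;
}

static int get_hash_1(int index)
{
	if (!have_hashes_on_cpu)
		copy_hashes_from_gpu();
	return partial_hashes[index] & 0xFF;
}
```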
You need to actually simulate different hashes-per-salt ratios in sample
password hash files to see the effect of changes in this area.

A next step could be to have this data transfer from GPU overlap with
computation on GPU.  You could achieve this e.g. by predicting that
hashes will be requested (due to past requests for this same salt) and
starting to transfer the first half of hashes while the second half is
still being computed, then starting the second transfer right before
leaving crypt_all().  get_hash*() calls are made in order of increasing
index, so this may help.  This is probably overkill for slow and salted
hashes, but e.g. for raw SHA-512 it may be done.

An alternative to this is to completely disable the use of CPU-side
bitmaps and hash tables for formats that support offload of hash
comparisons onto GPU - and to implement similar bitmaps and hash tables
on GPU side (along with caching of loaded hashes).  A drawback is that
we might need to have fallback code for the case of loaded hashes not
fitting in GPU memory, though.

I think a mix of both approaches may work best: have a higher threshold
for on-CPU bitmaps and hash tables, but do support them - with the
optimizations I described above.

Alexander
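[The per-salt prediction heuristic sketched above could be as simple as the following - all names are illustrative, not John's actual code; the point is only the bookkeeping that decides, on entry to crypt_all(), whether to start the overlapped half-transfer:]

```c
/* Hypothetical per-salt bookkeeping: if get_hash*() was called for this
 * salt after the previous crypt_all(), predict it will be called again
 * and overlap the transfer of the first half of the hashes with the
 * computation of the second half. */

struct salt_state {
	int hashes_were_requested; /* set whenever get_hash*() runs */
};

enum prefetch_plan {
	PREFETCH_NONE, /* lazy transfer only if get_hash*() is called */
	PREFETCH_SPLIT /* copy 1st half while 2nd half is computed */
};

/* Decide, on entry to crypt_all(), whether to overlap the transfer;
 * the flag is cleared so the prediction is re-learned each call. */
static enum prefetch_plan plan_transfer(struct salt_state *salt)
{
	if (salt->hashes_were_requested) {
		salt->hashes_were_requested = 0;
		return PREFETCH_SPLIT;
	}
	return PREFETCH_NONE;
}

/* Called from the get_hash*() path to record a request for this salt. */
static void note_hash_request(struct salt_state *salt)
{
	salt->hashes_were_requested = 1;
}
```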