Date: Sat, 31 Mar 2012 10:46:56 +0400 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: fast hashes on GPU On Sat, Mar 31, 2012 at 01:23:24PM +0800, myrice wrote: > On Mar 31, 2012 5:18 AM, "magnum" <john.magnum@...hmail.com> wrote: > > > Are you saying you want to change the salt interface so you can loop > > over all loaded salts without leaving GPU? > > Yes, but I am not sure how much performance it will give. You can do it without changing the formats interface and benchmark code. Here's how: The first time set_salt() and crypt_all() calls are made for a given salt, you process that salt separately (do the actual hashing), but you also record the salt value. For your purpose, you'd want to record it from your GPU code, so that it can access the salt directly next time. You also record a copy of it from your CPU code. The second time the set_salt() and crypt_all() calls are made (for previously seen salts since no new hashes and thus no new salts may be loaded during cracking), you do the bulk hashing (for all previously seen salts) on the very first call to crypt_all(). Yes, you must do it as soon as you see the same salt value again - including for salts that have not yet been seen for a second time. You can't delay processing of the current salt because cmp_*() and get_hash*() must return correct values right after the crypt_all() call, and you don't want to delay processing of further salts because you want the potential speedup from bulk hashing. Further calls to set_salt() and crypt_all() until a salt is seen for a third time do not need to invoke any GPU code. Instead, set_salt() should set a global variable (on CPU) that will tell further cmp_*() and get_hash*() calls to use the right set of result buffers, and it may mark salts (also on CPU) as actually seen again. Some salts might be no longer seen if all of their corresponding hashes have been cracked. When a salt is finally seen for a third time and thus it's time for you to invoke the GPU code again, you may also use this opportunity to notify your GPU code of the now-missing salts (so it will learn of that with a delay of one full set of candidate passwords, but that's OK). Then you process the third full set of candidate passwords and on in a similar fashion. A similar trick may work for recording of hashes to compare against in cmp_all() and then moving the comparisons onto the GPU starting with the second set of candidates. Then you only need to transfer one bit (literally) per GPU code invocation back to the CPU most of the time (that bit will mean "no matches" or "there might be matches, please request detail"). Once this is implemented, we might even want to adjust the thresholds for use of hash tables such that those are not used (as that would result in get_hash*() calls being made instead of cmp_all(), thereby not letting you offload the comparisons onto the GPU). That's a separate task, though. As you can see, with some tricks a lot can be done for speeding up fast yet salted hashes even within the current formats interface. To my knowledge, these have not been attempted yet. I did not suggest them until now. (For fast and saltless hashes - or with very few salts - set_key() will be a bottleneck, and this is not addressed with these tricks. A formats interface change will actually be needed to deal with that.) For benchmarking, we might have to increase the benchmark duration (e.g., use --test=10 or more) in order to have many sets of candidate passwords processed (not just the first set, which has to be treated specially and is presumably slower to process). With very large numbers of loaded hashes (such as millions), you may have to use a different approach or maybe lower max_keys_per_crypt. Otherwise the programs may become too non-interactive (and you might even be hitting timeouts imposed by drivers or by CUDA or OpenCL libraries), or the salts or hashes (if you do the cmp_all() recording trick as well) might not even fit in GPU memory. You don't need to bother about that for your initial implementation, though. It'd be very nice if one or more of the GSoC candidates attempt this and demonstrate some results now. Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.