Date: Tue, 21 May 2013 22:51:14 +0400 From: Solar Designer <solar@...nwall.com> To: john-dev@...ts.openwall.com Subject: Re: cmp_all on gpu for descrypt-opencl On Tue, May 21, 2013 at 10:55:04PM +0530, Sayantan Datta wrote: > I have been able to run cmp_all() on GPU but the results aren't that good. > In fact we have little performance penalty with cmp_all() on gpu. I guess you mean "a little", not "little". > I think the overhead involved is too much and the kernel has lot of > branching(that depends on user data). Yes, in cmp_all() the branching in fact depends on user data, unlike in the current crypt_all(). To reduce the impact of this, the up to 64 iterations loop may be split in two, as I mentioned in a comment here: http://www.reddit.com/r/crypto/comments/162ufx/research_project_opencl_bitslice_des_bruteforce/ In my testing, the optimal numbers were 20 iterations in the first loop and 44 in the second. This was not in our descrypt-opencl code, but it was in another bitslice DES implementation on GPU, which is similar in this respect. I think the same or similar numbers will be optimal for us as well. From the reddit comment: "[...] split this loop in two, only proceeding with the second half if the first indicates there still might be a match. These specific speed numbers are for 20 iterations in the first loop (and 44 in the second). The first loop's iteration count needs to be high enough that a branch will actually be made most of the time, letting all SIMDs in a CU fully skip the second loop." Also relevant is the speed number obtained at the time - ~6840 Mkeys/s when computing all 16 rounds of DES and checking the result as above. This gives us a speed of at least 6840 / 25 = 273M c/s at descrypt, and more than that if we consider that we'd only do the comparisons once per 25 DES encryptions rather than after each DES encryption. This is assuming that the kernel is specialized for a given salt value, but that's what you're already doing. Thus, there's lots of room for optimization in your current descrypt-opencl, but of course these speeds will require generating candidate passwords on GPU and comparing the hashes on GPU. > Also the amount of work is very less > compared to overhead. One possible way to improve performance might be to > load all hashes instead of just one per cmp_all() call. But that doesn't > seems possible with current format. It is possible with the current (new) format: you implement the comparisons in crypt_all() instead of in cmp_all(). This is why I added the "struct db_salt *salt" argument to crypt_all(). That way, you also avoid a second kernel invocation. I posted an example of this, for LM hashes on CPU, to john-dev last summer: http://www.openwall.com/lists/john-dev/2012/07/21/1 You need to look at the changes for DES_bs_crypt_LM() - you may search that posting for "This integer division may be slow - need to revise" to find the right place. (BTW, this comment gives the reason why I haven't merged this portion of the changes into the core tree... but I think magnum did merge it into bleeding anyway.) Alexander
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.