Date: Wed, 2 May 2012 20:34:43 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: New JtR functionality, re-build lost salts

From: magnum [mailto:john.magnum@...hmail.com]
>> This modification to JtR will allow these (missing salts) to be found
>> (albeit, pretty slowly).
>
>This is a curious patch, I haven't had time to try it out but I will
>later.
>
>A thing that hits me is this is a task a fast GPU format like raw-md5
>could do very well, without the bandwidth problems it has otherwise...
>we just supply a fairly small buffer of words (perhaps just one word)
>and the GPU code generates all salts itself. But I guess it would need
>some support from the format interface.
>
>I suppose this patch as-is could be used with a slightly modified GPU
>format with less work, but then we'd have to transfer salts from CPU
>side. That is much lighter than transferring millions of keys though.

The way things work is that, for each key, you run 'almost' the same
crypt code X times, where X is the size of the universe of salts. So
for OSC, there are 95**2 runs for every candidate PW. The 'generation'
of the salts is highly trivial. For some formats (md5-6 and md5-9), the
candidate needs to have the MD5 code run 1 time, and then ALL salts use
the result of that.

I am not sure how easily that would 'scale' to GPU code. If each GPU
could encode the 3 bytes of salt, and then all of them could encode the
32 bytes of the '1' common buffer holding that pre-computed MD5 value,
and do it at the same time, then that would make the GPU code very
fast. It would totally eliminate moving that md5 hash buffer X times.

It 'is' a curious patch, and it does things a little differently than
the 'normal' JtR way of doing things. However, with your reply (and my
reply to yours), it probably could be made quite a bit faster even on
the CPU side, by doing some creative SSE coding. Only the first 4 bytes
(per interleaved SSE buffer) would need to be set independently (for
the md5($s.md5($p)) or md5(md5($p).$s) formats, which are
PHPS/MediaWiki), and a single 64 byte buffer would be 'shared' between
all of the simultaneous SSE's. I think this would greatly reduce the
memory movement, and should speed things up a bit, since 'almost' all
of each of the buffers is the exact same data.

Jim.
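
The salt search described above (run the inner MD5 once per candidate, then
try every salt against that one result) can be sketched roughly as below.
This is only an illustration in Python, not the patch's actual code: the
function name recover_salt, the salt alphabet (the 95 printable ASCII
characters) and the md5(md5($p).$s) layout with a 3-byte salt are assumptions
taken from the figures mentioned in the message. The copy() call stands in
for the 'shared' common buffer idea; it avoids re-feeding the 32-byte prefix,
but it is not a model of the SSE or GPU buffer layout.

```python
import hashlib
from itertools import product

# Assumed salt alphabet: the 95 printable ASCII characters (0x20..0x7e).
SALT_CHARS = [bytes([c]) for c in range(0x20, 0x7F)]


def recover_salt(target_hex, password, salt_len=3):
    """Try every salt for md5(md5($p).$s); return the salt bytes or None.

    Hypothetical helper for illustration only.
    """
    # The inner MD5 is computed once per candidate password.
    inner_hex = hashlib.md5(password).hexdigest().encode("ascii")
    # Feed the common 32-byte hex prefix once, then clone that state per salt.
    prefix = hashlib.md5(inner_hex)
    for combo in product(SALT_CHARS, repeat=salt_len):
        salt = b"".join(combo)
        h = prefix.copy()
        h.update(salt)
        if h.hexdigest() == target_hex:
            return salt
    return None
```

With a 3-byte salt this is 95**3 = 857375 outer MD5 calls per candidate,
which is why the message calls the search 'pretty slow' and why sharing
the common prefix matters.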