Date: Wed, 2 May 2012 20:34:43 -0500
From: "jfoug" <jfoug@....net>
To: <john-dev@...ts.openwall.com>
Subject: RE: New JtR functionality, re-build lost salts

From: magnum [mailto:john.magnum@...hmail.com]
>> This modification to JtR will allow these (missing salts) to be found
>> (albeit, pretty slowly).
>
>This is a curious patch, I haven't had time to try it out but I will
>later.
>
>A thing that hits me is that this is a task a fast GPU format like raw-md5
>could do very well, without the bandwidth problems it has otherwise...
>we just supply a fairly small buffer of words (perhaps just one word)
>and the GPU code generates all salts itself. But I guess it would need
>some support from the format interface.
>
>I suppose this patch as-is could be used with a slightly modified GPU
>format with less work, but then we'd have to transfer salts from the CPU
>side. That is much lighter than transferring millions of keys, though.

The way things work is that, for each key, you run 'almost' the same crypt
code X times, where X is the universe of salts.  So for OSC (osCommerce),
that means 95**2 runs for every candidate PW.  The 'generation' of the salts
is trivial.
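
Just to illustrate the shape of that loop (this is not the patch's code, and
the OpenSSL MD5() call is used purely for brevity; link with -lcrypto): a
self-contained sketch that, for one candidate password, tries all 95**2
printable 2-character salts of an OSC-style md5($salt.$pass) hash.  The
main() simply builds a known target and then recovers the salt from it.

#include <stdio.h>
#include <string.h>
#include <openssl/md5.h>

/* Try all 95*95 printable 2-char salts for one candidate password.
 * Returns 1 and fills salt_out (3 bytes) if a salt reproduces target_hex. */
static int find_osc_salt(const char *cand, const char *target_hex,
                         char *salt_out)
{
    unsigned char dgst[MD5_DIGEST_LENGTH];
    char buf[256], hex[2 * MD5_DIGEST_LENGTH + 1];
    int s0, s1, i;

    for (s0 = 0x20; s0 <= 0x7e; s0++)
    for (s1 = 0x20; s1 <= 0x7e; s1++) {
        int len = snprintf(buf, sizeof(buf), "%c%c%s", s0, s1, cand);
        MD5((const unsigned char *)buf, len, dgst);
        for (i = 0; i < MD5_DIGEST_LENGTH; i++)
            sprintf(hex + 2 * i, "%02x", dgst[i]);
        if (!strcmp(hex, target_hex)) {
            salt_out[0] = s0; salt_out[1] = s1; salt_out[2] = 0;
            return 1;
        }
    }
    return 0;
}

int main(void)
{
    unsigned char d[MD5_DIGEST_LENGTH];
    char target[2 * MD5_DIGEST_LENGTH + 1], salt[3];
    int i;

    /* build a known target, md5("q7" . "password"), then recover its salt */
    MD5((const unsigned char *)"q7password", 10, d);
    for (i = 0; i < MD5_DIGEST_LENGTH; i++)
        sprintf(target + 2 * i, "%02x", d[i]);

    if (find_osc_salt("password", target, salt))
        printf("recovered salt: '%s'\n", salt);
    return 0;
}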

For some formats (md5-6 and md5-9), MD5 is run over the candidate only 1
time, and then ALL salts use the result of that.  I am not sure how easily
that would 'scale' to GPU code.  If each GPU thread could encode its 3 bytes
of salt, and then all of them could encode, at the same time, the 32 bytes
of the '1' common buffer holding that pre-computed MD5 value, then that
would make the GPU code very fast.  It would totally eliminate moving that
md5 hash buffer X separate times.
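
On the CPU side, that 'hash the candidate once' idea looks roughly like the
fragment below.  Again, this is only a sketch (OpenSSL MD5(), the
md5($s.md5($p)) ordering, a 3-byte printable salt), not the actual format
code: the inner md5($p) and the memcpy() of its hex form happen exactly once
per candidate, and the 95**3 outer crypts then touch only 3 bytes of input
each.

#include <stdio.h>
#include <string.h>
#include <openssl/md5.h>

static void hexify(const unsigned char *d, char *out)  /* 16 bytes -> 32 hex */
{
    int i;
    for (i = 0; i < MD5_DIGEST_LENGTH; i++)
        sprintf(out + 2 * i, "%02x", d[i]);
}

/* md5($s.md5($p)) with a 3-byte salt: md5($p) is computed exactly once per
 * candidate; every salt reuses that 32-byte hex result, and only the 3 salt
 * bytes change between crypts.  salt_out must hold 4 bytes. */
static int find_3byte_salt(const char *cand, const char *target_hex,
                           char *salt_out)
{
    unsigned char dgst[MD5_DIGEST_LENGTH];
    char inner[33], buf[3 + 32], hex[33];
    int a, b, c;

    MD5((const unsigned char *)cand, strlen(cand), dgst);
    hexify(dgst, inner);
    memcpy(buf + 3, inner, 32);             /* the shared part, written once */

    for (a = 0x20; a <= 0x7e; a++)
    for (b = 0x20; b <= 0x7e; b++)
    for (c = 0x20; c <= 0x7e; c++) {
        buf[0] = a; buf[1] = b; buf[2] = c; /* the only per-salt bytes */
        MD5((const unsigned char *)buf, sizeof(buf), dgst);
        hexify(dgst, hex);
        if (!strcmp(hex, target_hex)) {
            salt_out[0] = a; salt_out[1] = b; salt_out[2] = c;
            salt_out[3] = 0;
            return 1;
        }
    }
    return 0;
}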

It 'is' a curious patch, and it does things a little differently than the
'normal' JtR way of doing things.  However, with your reply (and my reply to
yours), it could probably be made quite a bit faster even on the CPU side,
by doing some creative SSE coding.  Only the first 4 bytes (per interleaved
SSE buffer) would need to be set independently (for the md5($s.md5($p)) or
md5(md5($p).$s) formats, which are PHPS/MediaWiki), and a single 64-byte
buffer would be 'shared' between all of the simultaneous SSE lanes.  I think
this would greatly reduce the memory movement, and should speed things up a
bit, since 'almost' all of each buffer is exactly the same data.
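
To show what I mean by the buffer layout, here is a rough sketch.  The names
(SIMD_COEF, set_shared_block, set_salts) are only illustrative, not JtR's
actual code.  With 4 keys per SSE2 vector, word i of lane j sits at
ibuf[i*4 + j], so the shared 64-byte block gets broadcast into the lanes
once per candidate, and only word 0 of each lane (the salt) is rewritten
between crypts.

#include <stdint.h>

#define SIMD_COEF 4     /* four 32-bit lanes per SSE2 vector */

/* Interleaved layout: word i of lane j lives at ibuf[i * SIMD_COEF + j]. */

/* Called once per candidate: broadcast the common block (md5($p) hex,
 * padding, length) into every lane.  After this, the bulk of the buffer
 * never has to be touched again. */
static void set_shared_block(uint32_t ibuf[16 * SIMD_COEF],
                             const uint32_t shared[16])
{
    int i, j;
    for (i = 1; i < 16; i++)
        for (j = 0; j < SIMD_COEF; j++)
            ibuf[i * SIMD_COEF + j] = shared[i];
}

/* Called once per group of SIMD_COEF salts: only word 0 of each lane
 * (the 4 salt bytes) is rewritten between crypts. */
static void set_salts(uint32_t ibuf[16 * SIMD_COEF],
                      const uint32_t salt[SIMD_COEF])
{
    int j;
    for (j = 0; j < SIMD_COEF; j++)
        ibuf[j] = salt[j];
}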

Jim.
