john-users - Re: Re: beta MPI john patch against 1.7.2.

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20060601234313.GA11394@openwall.com>
Date: Fri, 2 Jun 2006 03:43:13 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: Re: beta MPI john patch against 1.7.2.

I'll respond to several messages at once:

On Thu, Jun 01, 2006 at 06:57:56PM +0200, Michal Luczaj wrote:
> (let's blame mail system! ;)

Yes, the mail system is to blame.  I had this list configured with some
MIME type filters and those were too strict.  I've shortened the list of
disallowed MIME types now.

More importantly, with the appropriate permissions from the authors, I
have placed all four John/MPI hacks into the contrib/ directory:

	ftp://ftp.openwall.com/pub/projects/john/contrib/mpi/

On Thu, Jun 01, 2006 at 05:34:45AM +0200, Otheus (aka Timothy J. Shelling) wrote:
> Please let me know of any rejects while performing the patch. (Use -l to
> ignore whitespace to avoid problems with bench.c.)

Yes, there were rejects on the Makefile, even with "patch -l".  I had to
fix those manually and re-generate the patch before placing it for
download.  Please use the following command for generating patches for
subsequent revisions of your code:

	TZ=UTC diff -urp john-1.7.2 john-1.7.2-mpi > john-1.7.2-mpi-tjs-2.diff

The "-2" will be the patch revision number - you'll increment it each
time you generate a new patch for the same base version of John.  If you
add any new files, you'll need to add the "-N" option to diff as well.

Also, your john.conf has lots of unrelated changes.  It is a good idea
to revert those.

The same suggestions apply to any other patches I place in contrib/ -
which is why I am posting them to the mailing list.

> Known bugs:
>  o   john.pot gets littered with extraneous output, usually low integers
> like "4" or "1" and a newline. Very strange. Still trying to track it down.

My previous guess (from the private e-mail) proves correct - you do in
fact close stdout and stderr in your patch.  While you try to re-open
stderr, the way you do it is not guaranteed to work right.  And I don't
see you re-opening stdout at all.  So those numbers are some debugging
output being printed to either stdout or stderr - perhaps they are
MPI_rank's?

>  o  internalized computing the checksum for ext_word. 15% increase in speed
> (at least)

>  o   use of external filter with internal checksum causes a slight (1-2%)
> slowdown

You can't reasonably speak of speedups/slowdowns by a certain percentage
unless you mention the hash type and the number of different salts for
which you did your benchmarks.

On Wed, May 31, 2006 at 10:53:07PM +0200, Otheus (aka Timothy J. Shelling) wrote:
> It occured to me, just before I looked at the "Pippin" MPI project, that in
> both Ryan's implementation and in mine, once a particular hash is cracked by
> a particular task, the other tasks are *still* comparing against the cracked
> passwords. They are also generating hashes from salts that may be no longer
> needed.
[...]
> Question: Where/how should the other tasks handle this event?  I'm somewhat
> lost on this, but my current guess is that for each cracked ciphertext
> received via MPI, the task must find the right salt list and pw entry can
> call crk_process_guess(). If there's a cleaner way, I'd love to know.

Yes, crk_process_guess() is what you need.  However, you'll need to
modify it such that it accepts a boolean flag indicating whether to call
log_guess().  The existing calls to crk_process_guess() would continue
to log, whereas your added call, invoked on request from another task,
would merely remove the cracked hash/salt.

> Question: Where/how should the cracking task broadcast this event? It seems
> pretty clear that this should be in crk_process_guess(). Of course, if my
> guess above is true, the crk_process_guess must be called for every cracked
> password received from the other tasks, then this would be split into two
> functions right after the log() call.

Actually, the added "if" statement would be right before the log_guess()
call, as I have suggested above.  You would need to broadcast the event
right after the log_guess() - but still within the "if" block.

> Alternatively, I could add a boolean
> parameter to the function which would be use to skip re-broadcasting and
> re-logging the cracked password.

Exactly.

On Wed, May 31, 2006 at 11:30:14PM +0200, Otheus (aka Timothy J. Shelling) wrote:
> On synchronizing crypts/salts between all tasks, another thought occurred to
> me.  After every checkpoint, simply load in the john.pot file (assuming it's
> not corrupted ;) ). Would that be a simpler, (though slightly less optimal),
> approach?

Yes, that would work - as long as you have that file shared between the
tasks anyway.

On Tue, May 30, 2006 at 02:28:08AM +0200, Otheus (aka Timothy J. Shelling) wrote:
> >>   o  Recovery: Each task runs with its own session; the whole thing can
> >> be re-started in recovery mode
> >
> >OK, that can work - but does it mean that there's a .rec file for each
> >task?
>
> Yes.  With more work, I could set it up so the root task keeps track of one
> record file, but I'd need to understand your restore format/logic very very
> well.

Well, it'd be really tricky to have a single .rec file given the way you
split the keyspace - unless that file would essentially combine the
information from the current .rec files of each task - making the file
fairly large.  Unfortunately, John currently rewrites .rec files
in-place, so if the file becomes larger than a filesystem block it is
more likely to get corrupted.

> Well, cluster programs generally assume equal-performing nodes.. It's rare
> that they don't. At any rate, if you have a 10 cpu cluster, and half the
> cpus are, let's say, only 66% the speed of the other half, then you would
> assign particular tasks to particular node names (using a carefully
> constructed machines files) and modify the external filter so that  the
> faster cpu/tasks get 33% more keys than the other. The filter logic would be
> a tad bit more complex, but would only amount to a few cycles per key per
> task.

That's too tricky to setup for an end user - even for an experienced
one.  Besides, the performance of John might differ between tasks even
on the same hardware due to different placement in caches and different
physical page allocations.  I've been trying to reduce cache aliasing
conflicts in John by placing all frequently accessed data in sequential
structs - and this works well most of the time - but there are still
cases when performance is not stable.  This mostly affects CPUs with
direct-mapped caches such as Alphas and fast hashes such as LM ones.

> > What happens if a node fails?
>
> If MPI is working properly, the whole job fails, and you would restart the
> job in recovery mode.

So that job would be somewhat behind the others - resulting in candidate
passwords being tried in a non-optimal order.

> If MPI implementation is buggy (like most are), you
> could implement some kind of MPI_Barrier in the sig-alarm handler so that if
> a node fails, everything stops.

That's not great.

> > What if you want to add/remove nodes?
>
> Removing nodes: I'm not sure ... I'd want to make sure that the removed
> nodes weren't working on keys that no longer get generated after a restore.

That's not really compatible with the way you split the keyspace and let
each node proceed further into the keyspace on its own.

> Adding nodes:  I'd have to look at your restore file layout/logic to see if
> restore files could be cloned as dummys for the new nodes. A restore, but
> with more nodes, would work fine, since the external filter use the current
> MPI job's Rank and Size.

I don't think that hacking the .rec files is the right way to do this,
and I also don't think that you can do it optimally.  Yes, you can split
a node's remaining keyspace in two and assign a half of it to the added
node.  But that would complicate the filter() and result in a portion of
the keyspace being searched twice quicker than the rest of it - so the
ordering of candidate passwords once again becomes non-optimal.

Basically, you should be splitting the keyspace differently in order to
implement this functionality.

Speaking of the overhead:

> >The impact is minor with slow hashes and multiple salts.  The impact is
> >major with fast saltless hashes.

> What's an example of a fast, saltless hash? I'll benchmark it.

LM hashes are good for this.

-- 
Alexander Peslyak <solar at openwall.com>
GPG key ID: B35D3598  fp: 6429 0D7E F130 C13E C929  6447 73C3 A290 B35D 3598
http://www.openwall.com - bringing security into open computing environments

Was I helpful?  Please give your feedback here: http://rate.affero.net/solar
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.