john-users - Re: --fork using different OpenCL devices

Message-ID: <20130808033011.GA18841@openwall.com>
Date: Thu, 8 Aug 2013 07:30:11 +0400
From: Solar Designer <solar@...nwall.com>
To: john-users@...ts.openwall.com
Subject: Re: --fork using different OpenCL devices

magnum, Claudio, all -

On Wed, Aug 07, 2013 at 09:32:49PM +0200, magnum wrote:
> Claudio had an idea a while ago that I think still hasn't been discussed on list so here goes:
> 
> The idea is to have -fork pick a different device (starting from 0 or picking from a given list) for each child. Picture having two 7990 cards for a total of four devices. Using "-fork=4" with an OpenCL format would pick device 0 for the mother process, device 1 for first child and so on.

This would provide poor man's multi-GPU support.  Unfortunately, in the
current implementation of --fork there's some use of signals - such as
to get the status line printed by all children on a keypress - and this
appears incompatible with AMD's SDK.
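
For illustration only, the device selection described in the quoted
proposal could be sketched as below in Python.  This is not Claudio's
patch and not JtR code: device_for_process() and spawn() are
hypothetical names, the device list stands in for whatever the user
would specify, and wrapping around when there are more processes than
devices is an assumption.  It needs a POSIX system for os.fork().

```python
import os


def device_for_process(proc_index, devices):
    """Pick a device for process number proc_index (0 is the parent).

    Assumption: if there are more processes than devices, wrap around.
    """
    if not devices:
        raise ValueError("need at least one device")
    return devices[proc_index % len(devices)]


def spawn(nproc, devices, work):
    """Run work(index, device) in the parent and in nproc-1 forked children."""
    children = []
    for index in range(1, nproc):
        pid = os.fork()
        if pid == 0:
            # Child: do the work on its own device, then exit without
            # returning into the parent's code path.
            status = 1
            try:
                work(index, device_for_process(index, devices))
                status = 0
            finally:
                os._exit(status)
        children.append(pid)

    # Parent is process 0 and works too, like the mother process in --fork.
    work(0, device_for_process(0, devices))
    for pid in children:
        os.waitpid(pid, 0)
```

With nproc=4 and devices [0, 1, 2, 3], the parent gets device 0 and the
children get devices 1 to 3, as in the two-7990 example.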

> Only very fast formats [where set_key() is a bottleneck] would benefit.

This is confused/confusing.  What I think was meant here is that if we
_don't_ direct the different fork'ed processes to different GPUs (let
them all use one GPU), then we'll hide the latency of key setup and key
transfers.  This is similar to how I sometimes invoke Sayantan's
descrypt-opencl on one GPU multiple times to achieve much better
cumulative speed than is possible with one invocation.  Yes, --fork
would help here (already the current implementation of it, with no
changes), except that there's the issue with AMD's SDK that I mentioned
above.  On NVIDIA GPUs, this just works.
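
A toy model of why multiple invocations on one GPU can help, under
idealized assumptions that are not measurements: each process
alternates a host-side phase (key setup and transfer) of duration
host_time with a GPU compute phase of duration gpu_time, the host-side
phases of different processes overlap freely, and the GPU serves one
process at a time.  The function name and parameters are hypothetical.

```python
def gpu_utilization(host_time, gpu_time, nproc):
    """Fraction of time the GPU is busy in the idealized model.

    One process keeps the GPU busy gpu_time out of every
    host_time + gpu_time; nproc processes scale that up until the GPU
    is saturated at 1.0.
    """
    if host_time < 0 or gpu_time <= 0 or nproc < 1:
        raise ValueError("expected host_time >= 0, gpu_time > 0, nproc >= 1")
    return min(1.0, nproc * gpu_time / (host_time + gpu_time))


if __name__ == "__main__":
    # Hypothetical ratio: host phase three times as long as the GPU phase.
    for n in (1, 2, 4, 8):
        print(n, gpu_utilization(3.0, 1.0, n))
```

In this model, with a host phase three times as long as the GPU phase,
one process keeps the GPU busy a quarter of the time and four processes
saturate it.  Real speedups depend on the format, the driver, and how
well the phases actually overlap.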

> I think it's a cool idea and Claudio has a trivial PoC patch. Should we do this? It will hopefully be obsoleted by mask mode and other planned things. OTOH I would not mind at all applying it.

I don't see mask mode as obsoleting it.

I don't recall what exactly Claudio's patch did, though.  Like I said,
for hiding the latencies for key setup/transfers with fast hashes, no
patch is needed (but there's an issue with AMD SDK, which we have no
patch for).  However, for poor man's multi-GPU a patch would in fact be
needed (but it will similarly be problematic with AMD SDK).

Maybe we should revise --fork so that it does not use signals at all and
relies solely on other IPC mechanisms.  Or maybe AMD will fix their SDK
soon (wishful thinking).
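
As a rough sketch of what signal-free status reporting could look like
(an assumption about one possible design, not how --fork works today),
each child can poll a control pipe between batches of work and answer
on a status pipe.  Everything here is hypothetical: the one-byte "s"
and "q" commands, the function names, and the parent acting only as a
coordinator.  It is Python for brevity and needs a POSIX system.

```python
import os
import select
import time


def worker(index, ctl_r, status_w):
    """Child loop: do a batch, then poll the control pipe without blocking."""
    batches = 0
    while True:
        time.sleep(0.001)  # stand-in for one batch of candidate passwords
        batches += 1
        readable, _, _ = select.select([ctl_r], [], [], 0)
        if not readable:
            continue
        cmd = os.read(ctl_r, 1)
        if cmd == b"s":
            # Status request: report progress, no signal handler involved.
            os.write(status_w, f"{index} {batches}\n".encode())
        else:
            return  # b"q" (or end-of-file): stop working


def start_workers(n):
    """Fork n children, each with its own control and status pipe."""
    workers = []
    for index in range(n):
        ctl_r, ctl_w = os.pipe()
        status_r, status_w = os.pipe()
        pid = os.fork()
        if pid == 0:
            os.close(ctl_w)
            os.close(status_r)
            try:
                worker(index, ctl_r, status_w)
            finally:
                os._exit(0)
        os.close(ctl_r)
        os.close(status_w)
        workers.append((pid, ctl_w, status_r))
    return workers


def request_status(workers):
    """Ask every child for status and collect one reply line from each."""
    for _, ctl_w, _ in workers:
        os.write(ctl_w, b"s")
    return [os.read(status_r, 64).decode().strip()
            for _, _, status_r in workers]


def stop_workers(workers):
    """Tell every child to quit and reap it."""
    for pid, ctl_w, status_r in workers:
        os.write(ctl_w, b"q")
        os.waitpid(pid, 0)
        os.close(ctl_w)
        os.close(status_r)
```

The parent would call request_status() on a keypress where it sends a
signal today.  The cost is that a child only notices the request
between batches, so the reply is delayed by up to one batch.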

Alexander
