Date: Thu, 23 Nov 2017 19:42:59 +0100
From: magnum <john.magnum@...hmail.com>
To: john-users@...ts.openwall.com
Subject: Re: OpenMPI and .rec files?

On 2017-11-23 19:09, Jeroen wrote:
> magnum wrote:
>> If you find out anything useful for others, please report here!
> 
> First finding that narrows things down:
> 
> - On the cluster, jobs seem to work fine with normal tasks (#.rec files = #jobs). It caps at 100 when --wordlist=<dict> is used.
> - I cannot reproduce this on the very basic lab setup (#.rec files = #jobs). However, the output is slightly different when a wordlist is used:
> 
> ---
> bofh@...ncher:/opt/JohnTheRipper/run$ rm *rec; mpirun -np 120 ./john --format=raw-md4 /tmp/hashes
> <SNAP>
> Loaded 1200 password hashes with no different salts (Raw-MD4 [MD4 128/128 SSE4.1 4x3])
> Using default input encoding: UTF-8
> Node numbers 1-120 of 120 (MPI)
> Send SIGUSR1 to mpirun for status
> ---
> 
> If the job uses a wordlist, "Send SIGUSR1 to mpirun for status" is missing.
> 
>> Oh BTW it's very strange you can resume such session with (seemingly) no
>> problems. If you didn't already, you should look very closely in the log
>> file and see if there are any clues there. Perhaps there is some error seen
>> there that we should handle (or report) better.
> 
> On the console everything looks normal:
> 
> Remaining X password hashes with X different salts
> Cost 1 (iteration count) is X for all loaded hashes
> MPI in use, disabling OMP (see doc/README.mpi)
> Node numbers 1-640 of 640 (MPI)
> <session status messages>
> 
> However, this message pops up in john.log for a number of workers: "Terminating on error, recovery.c:165".

In that case we also print an error to stderr:

    fprintf(stderr, "Node %d@%s: Crash recovery file is"
            " locked: %s\n", mpi_id + 1, mpi_name,
            path_expand(rec_name));
    error();

Line 165 is the "error()".

So now we have two questions:
  1) Why was it already locked? Some half-dead process still running?
  2) Why do you not see the error printed to stderr? Something with your
     OpenMPI wrapper script?
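
For question 1, here is a minimal sketch of how a "file is locked" condition can arise. It assumes a POSIX system and uses flock() with LOCK_NB purely for illustration. The actual locking call in recovery.c may differ, and try_lock_rec() is a hypothetical helper, not a John function. The point is that a non-blocking exclusive lock fails immediately for as long as any other open descriptor still holds the lock, for example one belonging to a half-dead process from an earlier run.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of John): try to take an exclusive,
 * non-blocking advisory lock on "path".
 *
 * Returns the open file descriptor on success (keep it open to hold the
 * lock), -1 if someone else already holds the lock, -2 on any other error.
 */
int try_lock_rec(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);

    if (fd < 0)
        return -2;

    if (flock(fd, LOCK_EX | LOCK_NB) != 0) {
        /* Save the verdict before close() can overwrite errno */
        int busy = (errno == EWOULDBLOCK);

        close(fd);
        return busy ? -1 : -2;
    }

    return fd;
}
```

Calling try_lock_rec() twice on the same path without closing the first descriptor makes the second call return -1. Once the first descriptor is closed (or its owning process exits), the lock is released and the next attempt succeeds. So if workers hit the error at recovery.c:165, it is worth checking whether old john processes are still alive on those nodes.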

magnum
