Date: Thu, 18 Aug 2011 10:57:49 +0200
From: magnum <>
Subject: Re: MPI / OpenMPI

On 2011-08-18 10:32, Solar Designer wrote:
> On Thu, Aug 18, 2011 at 08:00:28AM +0000, Donovan wrote:
>> Here's another example, from 20 minutes ago, just to illustrate what I
>> said above.
> [...]
>> breepart         (?)
>> ^Cmpirun: killing job...
>> mpirun noticed that job rank 0 with PID 1290 on node new-host-xxxxxx exited on
>> signal 15 (Terminated).
>> 3 additional processes aborted (not shown)
>> xxxxxxx:run xxxx$ ./john --format=oracle11 --show or.txt
>> 0 password hashes cracked, 3895 left
> This looks like a bug, then - assuming that you were in fact cracking
> oracle11 hashes found in or.txt and not something else.  "ls -l john.pot"
> before and after the cracking run would be more convincing.  Also, you
> could try lowering the "Save" setting in john.conf to something like 10
> seconds.

Not really a bug; it's more of a missing workaround. It's a known
problem, and I believe it's documented too, along with the Save
workaround. For some reason, mpirun terminates its child processes very
brutally, and I haven't managed to work around that. I have no idea how
to fix it other than lowering the Save parameter. This happens with both
MPICH2 and OpenMPI.
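For reference, the workaround is a one-line change in john.conf (the
exact default varies by version; 600 seconds is what older trees
shipped with, and 10 is just an aggressive example value):

[Options]
# Crash recovery file saving delay in seconds
Save = 10

A lower value means john checkpoints to the session/.pot files more
often, so less work is lost when mpirun kills the children without
warning, at the cost of slightly more disk I/O.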

> Well, I guess we can live with the fact that MPI support is an unreliable
> feature in -jumbo, although I don't mind accepting patches for it.  Like
> I said, personally I don't use it and don't recommend it.

Once you get used to a couple of oddities like this (don't kill a job
without first sending a SIGUSR1 to the parent mpirun), it's pretty
reliable. But I too would love to see it replaced with something better.
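The "signal first, kill later" pattern above can be sketched as a small
shell demo. This is only an illustration: a background subshell stands
in for the mpirun/john processes, and the USR1 trap stands in for john
writing its crash-recovery files (the file name checkpoint.tmp is made
up for the demo):

```shell
#!/bin/sh
# Stand-in for mpirun/john: on SIGUSR1 it writes a "checkpoint" file,
# the way john saves its session state when asked nicely.
( trap 'echo saved > checkpoint.tmp' USR1; while :; do sleep 1; done ) &
pid=$!

sleep 1                  # let the trap get installed
kill -USR1 "$pid"        # step 1: ask for a checkpoint first
sleep 2                  # step 2: give it time to finish writing
kill -TERM "$pid"        # step 3: only now terminate the job
wait "$pid" 2>/dev/null
```

With the real thing, the equivalent would be `kill -USR1 <mpirun-pid>`,
a short pause, then `kill -TERM <mpirun-pid>` (or Ctrl-C), so the
children have saved before mpirun tears them down.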

