john-dev - Re: descrypt speed

Message-ID: <CA+E3k91W3wmN+Obh6ZKtGYM_updgPM6mzQJjy5Bz=2zZ+td6dg@mail.gmail.com>
Date: Thu, 19 Feb 2015 22:02:04 -0900
From: Royce Williams <royce@...ho.org>
To: john-dev <john-dev@...ts.openwall.com>
Subject: Re: descrypt speed

On Thu, Feb 19, 2015 at 9:58 AM, magnum <john.magnum@...hmail.com> wrote:
> On 2015-02-19 08:30, Royce Williams wrote:
>> ... and fork=8 (more processes starved for CPU, but more aggregate throughput):
>>
>> 5 0g 0:00:02:20 0.00% 3/3 (ETA: 2016-08-08 08:33) 0g/s 18030Kp/s
>> 18030Kc/s 18030KC/s GPU:39°C fan:45% 2d2inl1n..2d2ottrd
>> 1 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-29 00:52) 0g/s 28015Kp/s
>> 28015Kc/s 28015KC/s GPU:46°C fan:45% 03-9be32..03alus42
>> 4 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-03-02 19:39) 0g/s 25572Kp/s
>> 25572Kc/s 25572KC/s GPU:32°C fan:45% plzzgm1...plp2b3sk
>> 3 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-23 10:53) 0g/s 28654Kp/s
>> 28654Kc/s 28654KC/s GPU:33°C fan:45% 8c9gt7i..8cci13k
>> 6 0g 0:00:02:20 0.00% 3/3 (ETA: 2016-09-10 02:55) 0g/s 16992Kp/s
>> 16992Kc/s 16992KC/s GPU:39°C fan:45% kmk14en8..kmher2a3
>> 7 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-26 16:56) 0g/s 28266Kp/s
>> 28266Kc/s 28266KC/s GPU:46°C fan:45% lhgeh730..l0nn0wow
>> 8 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-02-13 21:08) 0g/s 26841Kp/s
>> 26841Kc/s 26841KC/s GPU:41°C fan:45% cl1kiylu..clrh2bl1
>> 2 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-03-02 11:01) 0g/s 25565Kp/s
>> 25565Kc/s 25565KC/s GPU:41°C fan:45% do_7af3..di7z7h8
>>
>> (Aggregate: 197935Kp/s)
>
> Unless I misunderstand, this does not make sense. If you overbook six
> GPU's using --fork=8, the two "extra" processes will each be pegged to a
> GPU, just like the first six. So it will end up in 2 GPUs running 2
> processes each, and 4 GPUs running one each. In case of CPU it would
> have worked fine (no affinity).
>
> Bottom line is you probably want to use either --fork=6 or --fork=12.

Agreed ... given equal PCI bandwidth. :-/  I jumped to the wrong
conclusion, due to forgotten PCI bandwidth differences on this system.
Four of my cards are connected to x16 slots with riser cables, and the
other two cards are connected to x1 slots.  (I hadn't thought about
this for a while because I've been mostly using hashcat for the past
couple of months.)

After your message, I was curious, so I ran new single-hash tests for
fork=6 through fork=15, each for at least ten minutes, and then added
up the per-process speeds to get an aggregate for each run:

6: 168712 Kp/s
7: 181884 Kp/s
8: 195109 Kp/s
9: 205072 Kp/s
10: 210777 Kp/s
11: 188349 Kp/s
12: 176034 Kp/s
13: 197104 Kp/s
14: 198222 Kp/s
15: 187335 Kp/s
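
For anyone repeating this, adding up the per-process speeds by hand gets
tedious. Below is a minimal sketch of how the summing could be scripted.
The function name aggregate_kps is hypothetical, and the code assumes
every status line reports its speed as "<number>Kp/s"; other unit
prefixes are not handled.

```python
import re

def aggregate_kps(status_text: str) -> int:
    """Sum every '<number>Kp/s' figure found in the status output.

    Assumption: speeds are always reported with a K prefix, as in the
    status lines quoted earlier in this thread.
    """
    return sum(int(value) for value in re.findall(r"(\d+)Kp/s", status_text))

# Two status lines copied from the fork=8 run quoted above.
sample = """\
5 0g 0:00:02:20 0.00% 3/3 (ETA: 2016-08-08 08:33) 0g/s 18030Kp/s
1 0g 0:00:02:30 0.00% 3/3 (ETA: 2016-01-29 00:52) 0g/s 28015Kp/s
"""

print(aggregate_kps(sample))  # 18030 + 28015 = 46045
```

Note that "0g/s" does not match the pattern, so only the p/s figure on
each line is counted.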

I had previously stopped at 8 because, for reasons I never tracked
down, 9 showed no appreciable improvement at the time.  I naively
presumed a simple threshold and did not double-check.

So fork=10 is the sweet spot on my system - two processes on each of
the four x16 cards, and one process on each of the two x1 cards
(4 x 2 + 2 x 1 = 10).  Beyond that, aggregate throughput never gets
back to the fork=10 level.  Thanks for helping me to clarify.
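
As a quick sanity check on that arithmetic, here is a small sketch that
picks the best fork value from the numbers above and compares it with
the process count the slot layout predicts. The helper names best_fork
and processes_needed are hypothetical, and the "two per x16 card, one
per x1 card" split is my reading of these results, not something JtR
reports.

```python
# Aggregate speeds in Kp/s, keyed by --fork value (from the list above).
results = {
    6: 168712, 7: 181884, 8: 195109, 9: 205072, 10: 210777,
    11: 188349, 12: 176034, 13: 197104, 14: 198222, 15: 187335,
}

def best_fork(speeds):
    """Return the fork value with the highest aggregate speed."""
    return max(speeds, key=speeds.get)

def processes_needed(x16_cards, x1_cards, per_x16=2, per_x1=1):
    """Total process count for an assumed per-card process layout."""
    return x16_cards * per_x16 + x1_cards * per_x1

best = best_fork(results)
gain = results[best] / results[6] - 1  # relative gain over fork=6

print(f"best fork={best} at {results[best]} Kp/s ({gain:.1%} over fork=6)")
print("layout predicts fork =", processes_needed(x16_cards=4, x1_cards=2))
```

Both the measured peak and the layout estimate come out at 10.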

Side question: Can I either tell JtR to automatically exit after X
minutes, or invoke --test against all GPUs?  Either would make tests
like these simpler.

Royce