john-users - Re: dmg-opencl low performance/ low gpu utilisation


Message-ID: <297f80d4-3471-f97e-ffda-97787ca0ca79@web.de>
Date: Fri, 4 Dec 2020 13:13:57 +0100
From: r.wiesbach@....de
To: john-users@...ts.openwall.com, Solar Designer <solar@...nwall.com>
Subject: Re: dmg-opencl low performance/ low gpu utilisation


>> I use dmg-opencl on a system with two Radeon RX 580 cards.
>>
>> However, dmg-opencl has very low GPU utilisation
> How low?  And how do you measure it?
I read it from the GPU utilization graph in Windows 10 Task Manager. GPU1
averages about 3% utilization, GPU2 about 0.5%.
>> and a speed of only about 2500 pw/s.
> This may be a fine speed.  It depends on performance of the system the
> dmg file or sparsebundle was created on - the faster that system was,
> the slower the file or sparsebundle will be to crack.  This is because
> recent versions of macOS tune the time needed to generate the encryption
> key from the password to be roughly the same - whatever the developers
> thought the user would not be too comfortable with.

Interesting, so that is why the iteration count differs. Among my samples,
the highest iteration count is about 370,000 according to the dmg2john
output; the lowest is about 100,000.
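
As a rough sanity check of what such iteration counts mean for speed, here is a minimal sketch. It assumes the cost per candidate grows linearly with the PBKDF2 iteration count and ignores any fixed overhead; the function name `scale_speed` is only illustrative. The 875976 c/s at 1000 iterations is the Vega 64 benchmark figure quoted further down in this message, not a measurement from the RX 580 cards.

```python
def scale_speed(bench_cps: float, bench_iterations: int, target_iterations: int) -> float:
    """Estimate c/s at target_iterations from a benchmark run at bench_iterations.

    Assumption: time per candidate is proportional to the iteration count.
    """
    if bench_iterations <= 0 or target_iterations <= 0:
        raise ValueError("iteration counts must be positive")
    return bench_cps * bench_iterations / target_iterations


if __name__ == "__main__":
    # Iteration counts reported by dmg2john for the samples in this thread.
    for iterations in (100_000, 370_000):
        estimate = scale_speed(875_976, 1000, iterations)
        print(f"{iterations} iterations: about {estimate:.0f} c/s")
```

Under that assumption, a benchmark at 1000 iterations overstates the real-world speed by a factor of 100 to 370 for these files, so the two figures are not directly comparable.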

> That said, "46500 c/s real" does sound low, and "3300 c/s virtual"
> weird (the virtual is generally the same or higher than real for this
> test, because only one CPU thread is run and the virtual time runs
> slower than real).  Here's what I am getting for a Vega 64 under Linux:
>
> Device 1: gfx900 [Radeon RX Vega]
> Benchmarking: dmg-opencl, Apple DMG [PBKDF2-SHA1 3DES/AES OpenCL]... LWS=64 GWS=32768 (512 blocks) DONE
> Speed for cost 1 (iteration count) of 1000, cost 2 (version) of 2 and 1
> Raw:	875976 c/s real, 11796K c/s virtual
>
"LWS=64 GWS=32768 (512 blocks)" or similar is not shown in the output on my system:
Benchmarking: dmg-opencl, Apple DMG [PBKDF2-SHA1 3DES/AES OpenCL]... DONE

and I do not see a verbose parameter in the manual.

Disclaimer: I currently use the 1.9.0-jumbo-1 release. I did not try a win64 dev build yet.

> When you run with two devices, there should be two status lines printed
I noticed that independently this morning. Sorry, stupid me.
> We have no OpenCL kernels that would perform better with rules.  We do
> have some that will perform better with mask, but those are for
> so-called "fast hashes".  dmg-opencl is (more than) slow enough not to
> need this (and thus doesn't include this unneeded optimization).
Interesting. I thought wordlist rules were a specific form/subclass of
masks.
>
> It is quite possible there's an issue - maybe a Windows-specific one,
> maybe e.g. with auto-tuning of OpenCL work sizes - just guessing here.
> Let's see what you actually have (some lines where JtR reports on the
> loaded hashes, their tunable costs, the tuned LWS and GWS figures, the
> resulting speeds after it's been running for a while) - then determine
> if anything is wrong with that, what exactly, and how it can be fixed.
The iteration counts are 100K-370K; the version is 2 for all hashes.
The 5-hash run has been running overnight now; the speeds are
513p 2568c 2568C
630p 3155c 3155C
(No LWS and GWS figures are shown, as mentioned above.)
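
For reading those figures: in the JtR status line, p/s is candidate passwords per second and c/s is crypts (candidate and salt combinations) per second. If the 5 loaded hashes all have different salts (an assumption here), c/s should be about 5 times p/s. A small sketch of that check, with a made-up helper name:

```python
def crypts_per_candidate(p_per_s: float, c_per_s: float) -> float:
    """Ratio of crypts to candidate passwords, i.e. the implied number of salts."""
    if p_per_s <= 0:
        raise ValueError("p/s must be positive")
    return c_per_s / p_per_s


if __name__ == "__main__":
    # (p/s, c/s) pairs from the two status lines above, one per device.
    for p, c in ((513, 2568), (630, 3155)):
        print(f"{p}p {c}c -> about {crypts_per_candidate(p, c):.2f} crypts per candidate")
```

Both lines give a ratio close to 5, which matches the 5 loaded hashes.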

Again disclaimer: I currently use the 1.9.0-jumbo-1 release. I did not
try a win64 dev build yet.
