Message-ID: <a1e54853882801836392b76324f5434d@smtp.hushmail.com>
Date: Thu, 8 May 2025 00:04:55 +0200
From: magnum <magnumripper@...hmail.com>
To: john-users@...ts.openwall.com
Subject: Re: sudden shutdown by overheat

On 2025-05-07 14:47, SHINODA, Daisuke wrote:
> As you correctly pointed out, I suspected that the GPU fan speed might 
> be insufficient. Therefore, I installed the rocm-smi application and set 
> the GPU fan speed to 100%. As a result, JtR no longer shuts down. 
> However, the fan noise at 100% was extremely loud, so I reduced the 
> speed to 80%. Despite the noise still being quite loud, I noticed that 
> the system shuts down after approximately three hours at this setting.

I can't see why the firmware & driver fail to automagically adjust the 
fan as needed, so that it is quiet when not under high load and runs at 
full speed when required. But they never did that properly, and it's the 
same with nvidia. Recent nvidias seem to throttle the clock rather than 
increase fan speed. It's somewhat configurable though, and at least they 
don't overheat.

> Regarding the ADL (AMD Display Library) you mentioned, I extracted 
> libatiadlxx.so from an older AMD Radeon driver, added it to my Linux 
> system, and ran the ldconfig command. I then recompiled JtR, but 
> unfortunately, this resulted in no noticeable changes. After checking 
> the GPUOpen website, it seems that ADL and ADLX are now only available 
> for Windows, with no support for Linux. Even if libatiadlxx.so were 
> functional, it may only be retrieving the GPU Die-Edge temperature. The 
> temperature approaching 120°C is the GPU junction temperature, while the 
> GPU Die-Edge temperature does not exceed 80°C.

This is interesting information. I will try to delve into it at some 
point. As for die-edge vs. junction, I guess it's mostly a matter of 
picking a die-edge temperature limit that implies a safe junction 
temperature - the main problem is we currently see neither.

> To address this issue, I created a bash script, which monitors the 
> rocm-smi output and temporarily pauses JtR if the GPU junction temperature 
> exceeds 115°C, resuming it when the temperature drops to 80°C. Using 
> this script has successfully prevented shutdowns. I have made this 
> script available at the following URL:
> https://github.com/dailikessushi/john_manager.sh/blob/main/john_manager.sh

I must admit I never realized we could use SIGSTOP and SIGCONT like 
that. FWIW we also have this default in john.conf:

   # While this file exists, john will pause (uncomment to enable)
   #PauseFile = /var/run/john/pause

Uncomment the second line, and perhaps change the path (you can use a 
path like $JOHN/pause to put the file in the directory where e.g. 
john.conf is located, without spelling out the full path). Then a script 
(or some user) can just create an empty file at that path, and within a 
couple of seconds John will pause until the file is deleted. This method 
does not send the john process to the background like SIGSTOP does. Also, 
the user who wants to pause doesn't need to own the john process; write 
access to the pause file directory is enough.
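
As a rough illustration of how a monitoring script could drive the 
pause file instead of signals, here is a minimal Python sketch. The 
115°C and 80°C thresholds are the ones from the script quoted above; the 
function names and the idea of using the file's existence as the 
"paused" state are my own assumptions, and reading the temperature is 
left to the caller.

```python
import os

PAUSE_AT_C = 115.0   # pause when the junction temperature exceeds this
RESUME_AT_C = 80.0   # resume once it has dropped to this


def want_pause(currently_paused, temp_c, high=PAUSE_AT_C, low=RESUME_AT_C):
    """Hysteresis: pause above `high`, stay paused until at or below `low`."""
    if currently_paused:
        return temp_c > low
    return temp_c > high


def apply_pause(pause_file, paused):
    """Create the pause file to pause john, remove it to resume."""
    if paused:
        # An empty file is enough; opening in append mode creates it
        # without truncating anything that is already there.
        with open(pause_file, "a"):
            pass
    elif os.path.exists(pause_file):
        os.remove(pause_file)


def step(pause_file, temp_c):
    """One control step, using the file's existence as the current state."""
    paused = want_pause(os.path.exists(pause_file), temp_c)
    apply_pause(pause_file, paused)
    return paused
```

A script would call step() every few seconds with the path set in 
PauseFile and the current junction temperature. Because the state lives 
in the file itself, restarting the script does not lose track of whether 
john is paused.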

> It appears that recent AMD Radeon graphics cards can monitor 
> temperatures using applications like sensors or rocm-smi, both of which 
> can detect the GPU junction temperature. If the developers of JtR wish 
> to improve compatibility with modern AMD Radeon graphics cards, it may 
> be beneficial to utilize these applications.

This too is good info. My biggest problem with fixing these issues is I 
don't have any AMD device, and Openwall's test rig has an AMD card and 
runtime from 2019 or so.
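
For reference, tools like sensors read these values from the kernel's 
hwmon sysfs interface, where each temp*_input file holds millidegrees 
Celsius and an optional temp*_label file names the channel. The sketch 
below reads them directly in Python. That the amdgpu driver labels its 
channels "edge" and "junction" is my understanding rather than something 
I have tested, and the function names are made up for illustration.

```python
import glob
import os


def read_hwmon_temps(hwmon_dir):
    """Return {label: degrees_C} for every temp*_input in one hwmon dir."""
    temps = {}
    pattern = os.path.join(hwmon_dir, "temp*_input")
    for input_path in sorted(glob.glob(pattern)):
        stem = input_path[: -len("_input")]
        try:
            with open(stem + "_label") as f:
                label = f.read().strip()
        except OSError:
            # No label file: fall back to the channel name, e.g. "temp1".
            label = os.path.basename(stem)
        with open(input_path) as f:
            # hwmon reports temperatures in millidegrees Celsius.
            temps[label] = int(f.read().strip()) / 1000.0
    return temps


def find_amdgpu_hwmon(root="/sys/class/hwmon"):
    """Return the hwmon directories whose `name` file reads "amdgpu"."""
    found = []
    for d in sorted(glob.glob(os.path.join(root, "hwmon*"))):
        try:
            with open(os.path.join(d, "name")) as f:
                if f.read().strip() == "amdgpu":
                    found.append(d)
        except OSError:
            continue
    return found
```

On a machine with such a card, read_hwmon_temps(find_amdgpu_hwmon()[0]) 
would be expected to return something like a dict with "edge" and 
"junction" keys, but I can't confirm that here.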

Thanks,
magnum
