musl - Re: Deadlock in dynamic linker?


Message-ID: <20250527152007.GO1827@brightrain.aerifal.cx>
Date: Tue, 27 May 2025 11:20:07 -0400
From: Rich Felker <dalias@...c.org>
To: Markus Wichmann <nullplan@....net>
Cc: musl@...ts.openwall.com
Subject: Re: Deadlock in dynamic linker?

On Sat, May 24, 2025 at 07:45:45AM +0200, Markus Wichmann wrote:
> Hi all,
> 
> I have a question about the handling of shutting_down in the dynamic
> linker. Namely, I saw that do_init_fini() will go into an infinite wait
> loop if it is set. The idea was probably to park initializing threads
> while the system is shutting down, but can't this lead to a deadlock
> situation?

The idea is to prevent addition of any further ctors once dtors have
already started, since this may (?) make it difficult to ensure the
dtors are executed in reverse order of ctors, and since any added
after the entire list is processed would have their dtors skipped
entirely (see analogous logic in atexit).

> I'm thinking something like this: Thread A initializes liba.so. liba.so
> has initializers and finalizers, so thread A adds liba.so to the fini
> list before calling the initializers. The liba initializer calls
> dlopen("libb.so"). libb.so also has initializers.
> 
> While thread A is not holding the init_fini_lock, thread B calls exit().
> That progresses until __libc_exit_fini() sets shutting_down to 1. Then
> it tries to destroy all the libraries, but the loop stops when it comes
> to liba.
> 
> liba.so has a ctor_visitor, namely thread A, so thread B cannot advance.
> Thread A meanwhile is hanging in the infinite wait loop trying to
> initialize libb.so. The situation cannot change, and the process hangs
> indefinitely.

I see. In particular you're assuming the dlopen of libb happened after
the exit started.
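
To make the reported interleaving concrete, here is a toy model of the two wait conditions as described above. It is only a sketch: the struct, the thread ids, and the function names are illustrative assumptions, not musl's actual ldso code, and locking is left out entirely.

```c
#include <stdbool.h>

/* Toy model only: names are illustrative, not taken from musl's sources. */
struct model_dso {
	int ctor_visitor;   /* id of the thread running this DSO's ctors, 0 if none */
};

struct model_state {
	bool shutting_down;     /* set once exit-time processing starts */
	struct model_dso liba;  /* on the fini list, ctors still in progress */
};

/* Thread A, inside liba's ctor, wants to run libb's ctors.
 * In this model it stays parked for as long as shutting_down is set. */
static bool init_thread_can_advance(const struct model_state *s)
{
	return !s->shutting_down;
}

/* Thread B, walking the fini list, has to wait while some other
 * thread is still inside liba's ctors. */
static bool exit_thread_can_advance(const struct model_state *s, int self)
{
	return s->liba.ctor_visitor == 0 || s->liba.ctor_visitor == self;
}

/* Neither side can make progress: the hang described in the report. */
static bool is_deadlocked(const struct model_state *s, int exit_tid)
{
	return !init_thread_can_advance(s) && !exit_thread_can_advance(s, exit_tid);
}
```

With thread A recorded as liba's ctor_visitor and shutting_down set by thread B, is_deadlocked() is true in this model; clearing either condition lets one side advance.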

> A simple way out of this pickle could be to add liba.so to the fini list
> only after it was initialized. That way, thread B cannot hang on it, or
> more generally, the finalizing thread cannot be halted by an incomplete
> initialization in another thread. This might change the order of nodes
> on the fini list, but only to account for dynamic dependencies. Isn't
> that a good thing?

No, I think it's non-conforming, and also unsafe, as it can result in
failure to run a dtor for something whose ctor started running but did
not finish. This is a worse outcome than a deadlock in a situation
that's arguably undefined to begin with.

What might be acceptable, though, is moving the setting of
shutting_down to take place after the last dtor is peeled off the
list. However, this probably requires splitting shutting_down into two
variables, due to lock order issues. The value is needed under the
global ldso lock in dlopen() to make dlopen return with an error if
exit has already begun (this one should be kept before the dtor loop,
I think), and the value is needed in do_init_fini to block execution
of new ctors (this one should only take effect after all dtors have
been run).
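
A rough sketch of that split, to show where each flag would be set and read. The names exit_started and ctors_blocked, and all the model_ and probe_ helpers, are hypothetical; the sketch ignores locking and lock order and is not a patch against the real code.

```c
#include <stdbool.h>

/* Hypothetical names; an actual change may look quite different. */
static bool exit_started;   /* would be read under the global ldso lock in dlopen */
static bool ctors_blocked;  /* would be read under init_fini_lock in do_init_fini */

/* Model of the dlopen check: fail once exit has begun. */
static int model_dlopen_allowed(void)
{
	return exit_started ? -1 : 0;
}

/* Model of the do_init_fini check: park only after all dtors have run. */
static bool model_ctor_must_wait(void)
{
	return ctors_blocked;
}

/* Model of exit-time processing; run_dtors stands in for the fini-list loop. */
static void model_exit_fini(void (*run_dtors)(void))
{
	exit_started = true;            /* takes effect before the dtor loop */
	if (run_dtors) run_dtors();
	ctors_blocked = true;           /* takes effect only after the last dtor */
}

/* Usage example: record what another thread would observe mid-loop. */
static int probe_dlopen_result;
static bool probe_ctor_waits;

static void probe_dtors(void)
{
	probe_dlopen_result = model_dlopen_allowed();
	probe_ctor_waits = model_ctor_must_wait();
}
```

Calling model_exit_fini(probe_dtors) shows the intended window: while dtors are running, a new dlopen is refused but a ctor already in progress is not parked, so it can finish and release its ctor_visitor.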

Does that sound right?

Rich
