Message-ID: <20250527182614.GP1827@brightrain.aerifal.cx>
Date: Tue, 27 May 2025 14:26:15 -0400
From: Rich Felker <dalias@...c.org>
To: Markus Wichmann <nullplan@....net>
Cc: musl@...ts.openwall.com
Subject: Re: Deadlock in dynamic linker?

On Tue, May 27, 2025 at 06:59:12PM +0200, Markus Wichmann wrote:
> On Tue, May 27, 2025 at 11:20:07AM -0400, Rich Felker wrote:
> > On Sat, May 24, 2025 at 07:45:45AM +0200, Markus Wichmann wrote:
> > > I'm thinking something like this: Thread A initializes liba.so. liba.so
> > > has initializers and finalizers, so thread A adds liba.so to the fini
> > > list before calling the initializers. The liba initializer calls
> > > dlopen("libb.so"). libb.so also has initializers.
> > >
> > > While thread A is not holding the init_fini_lock, thread B calls exit().
> > > That progresses until __libc_exit_fini() sets shutting_down to 1. Then
> > > it tries to destroy all the libraries, but the loop stops when it comes
> > > to liba.
> > >
> > > liba.so has a ctor_visitor, namely thread A, so thread B cannot advance.
> > > Thread A meanwhile is hanging in the infinite wait loop trying to
> > > initialize libb.so. The situation cannot change, and the process hangs
> > > indefinitely.
> >
> > I see. In particular you're assuming the dlopen of libb happened after
> > the exit started.
> >
>
> I had completely neglected to look at the global ldso lock, actually.
> But looking at it again, I am actually assuming that the dlopen() is
> *starting* before the __libc_exit_fini() (so that thread B hangs waiting
> for the lock), but that thread B then overtakes thread A between the
> latter's release of the global lock and the taking of the init_fini_lock.
>
> This does mean that taking the init_fini_lock before releasing the
> global lock would entirely prevent the issue. Not sure if that's
> acceptable, though.
>
> > > A simple way out of this pickle could be to add liba.so to the fini list
> > > only after it was initialized.
> > > That way, thread B cannot hang on it, or
> > > more generally, the finalizing thread cannot be halted by an incomplete
> > > initialization in another thread. This might change the order of nodes
> > > on the fini list, but only to account for dynamic dependencies. Isn't
> > > that a good thing?
> >
> > No, I think it's non-conforming, and also unsafe, as it can result in
> > failure to run a dtor for something whose ctor already ran but did not
> > finish. This is a worse outcome than a deadlock in a situation that's
> > arguably undefined to begin with.
> >
>
> But __libc_exit_fini() refuses to destroy libraries that haven't been
> constructed completely. If p->constructed is zero, a node is skipped
> even if it is on the fini list. And that flag is set in do_init_fini()
> only after all constructors have returned.

p->constructed being zero can only happen and mean "incompletely
constructed" in the case where visitor is self (call to exit from your
own ctor). It's not a condition you can encounter from concurrency,
since in that case you would not get past the condvar wait due to
there being a visitor.

> > What might be acceptable, though, is moving the setting of
> > shutting_down to take place after the last dtor is peeled off the
> > list. However, this probably requires splitting shutting_down into two
> > variables, due to lock order issues. The value is needed under the
> > global ldso lock in dlopen() to make dlopen return with an error if
> > exit has already begun (this one should be kept before the dtor loop,
> > I think), and the value is needed in do_init_fini to block execution
> > of new ctors (this one should only take effect after all dtors have
> > been run).
> >
> > Does that sound right?
>
> After __libc_exit_fini() has run its course, there is no need to record
> anything, because it keeps the init_fini_lock. Anything that could query
> shutting_down at that point would already hang taking that lock.

Indeed, that sounds right.

However I'm beginning to doubt this solution really works. The problem
is that, if A depends on B, A's ctor can be waiting to run pending B's
ctor finishing. But if a concurrent exit calls B's dtor, then A's ctor
is no longer free to run, because it depends on B being constructed.
We really do need to block execution of any ctor whose deps may
already have been destructed.

Rich