Message-ID: <20250904181428.GP1827@brightrain.aerifal.cx>
Date: Thu, 4 Sep 2025 14:14:28 -0400
From: Rich Felker <dalias@...c.org>
To: Markus Wichmann <nullplan@....net>
Cc: musl@...ts.openwall.com, Greg McPherran <gm@...herranweb.com>
Subject: Re: MUSL - malloc - Multithread Suggestion

On Thu, Sep 04, 2025 at 07:29:50PM +0200, Markus Wichmann wrote:
> On Thu, Sep 04, 2025 at 07:33:10AM -0400, Greg McPherran wrote:
> > Hi, perhaps a thread_local (e.g. C23) memory pool would separate
> > malloc cleanly for each thread, with no performance issue, e.g. mutex
> > etc? 
> 
> The idea is well-known, and a library that implements it can be found
> under the name tcmalloc.
> 
> The thread_local keyword, however, cannot be used for a few reasons. One
> is general: You must keep a thread-local arena alive for as long as it
> has active allocations, since in C, one thread can allocate an object
> and give it to another thread. So automatic deallocation when the thread
> ends is out of the question.
> 
> The other is technical: The keyword is basically implemented on Linux
> the same as the __thread extension keyword, which in the end uses ELF
> TLS. But musl cannot at the moment use ELF TLS for itself in dynamic
> linking mode because the dynamic loader is not set up for that. Fixing
> that would require making the stage 2 relocation of the linker skip TLS
> relocations as well, and also not using any code that uses TLS until the
> stage 3 relocation has happened. Putting it in the allocator is
> therefore not an option at all, since we need an allocator to get to the
> stage 3 relocation.

These are all technical details that have no bearing on whether it
could be done, only on how it would be done if it were. The only
reason TLS can't be used inside musl is that we don't have a need for
it, and lack of need has informed lack of implementation. If our
malloc needed thread-local state, it would just be a pointer inside
our struct __pthread. No need to make a meal of this; it's not the
issue at hand.

> I also must correct one thing: Due to the allocations being able to be
> shared, even with thread-local arenas the implementation needs locking,
> since other threads might be concurrently freeing an object in the same
> arena. But there should be less contention, yes.

Yes, there is a fundamental need for synchronization unless free is a
no-op. Beyond that, the amount of synchronization is a tradeoff
between performance and one or both of memory consumption and
hardening. At a very basic level, if thread B doesn't know that
thread A has freed space it could use, thread B has to go get its
own. This badness scales rapidly with the number of threads,
especially when you're allocating moderate to large numbers of slots
of the same size together, which is necessary to avoid bad
fragmentation cases.
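
To make the two points above concrete, here is a minimal sketch of
per-thread arenas, not code from musl or any real allocator. All names
(struct arena, arena_alloc, arena_free, NSLOTS) are made up for
illustration. It assumes each slot records its owning arena, so that
free can find the right arena from any thread; that is exactly why
the lock cannot be dropped. It also shows the memory cost: an empty
arena fails (or would have to fetch more memory) even while another
arena sits on free slots.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum { NSLOTS = 4, PAYLOAD = 32 };   /* hypothetical sizes */

struct arena;

struct slot {
    struct arena *owner;   /* arena this slot came from */
    struct slot *next;     /* free-list link, meaningful only while free */
    union {
        max_align_t align; /* keeps the payload suitably aligned */
        unsigned char bytes[PAYLOAD];
    } payload;
};

struct arena {
    pthread_mutex_t lock;  /* needed even if only one thread allocates here */
    struct slot *free_list;
    struct slot slots[NSLOTS];
};

void arena_init(struct arena *a)
{
    pthread_mutex_init(&a->lock, NULL);
    a->free_list = NULL;
    for (int i = 0; i < NSLOTS; i++) {
        a->slots[i].owner = a;
        a->slots[i].next = a->free_list;
        a->free_list = &a->slots[i];
    }
}

/* Called by the owning thread. Returns NULL when this arena is empty,
 * even if some other arena has plenty of free slots. */
void *arena_alloc(struct arena *a)
{
    pthread_mutex_lock(&a->lock);
    struct slot *s = a->free_list;
    if (s) a->free_list = s->next;
    pthread_mutex_unlock(&a->lock);
    return s ? s->payload.bytes : NULL;
}

/* May be called by any thread, so it must lock the owner's arena. */
void arena_free(void *p)
{
    assert(p != NULL);
    struct slot *s = (struct slot *)((unsigned char *)p
                                     - offsetof(struct slot, payload));
    struct arena *a = s->owner;
    pthread_mutex_lock(&a->lock);
    s->next = a->free_list;
    a->free_list = s;
    pthread_mutex_unlock(&a->lock);
}
```

With two arenas a and b, draining b makes arena_alloc(&b) return NULL
no matter how many slots a still has free. A real allocator would map
more memory at that point, which is the per-thread overhead described
above.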

One way the inefficiency of having entire per-thread arenas can be
mitigated is the glibc fastbins approach -- not actually needing
separate heaps, but just keeping recently-freed memory slots in an
intrusive linked list for the thread to reuse. This is a huge
hardening/security tradeoff, because these list pointers can be
corrupted following most UAF and heap-based overflow bugs to seize
control of execution flow.
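
The hardening problem can be shown with a toy free list whose links
live inside the freed slots themselves. This is only a sketch of the
general technique, not glibc's code; union slot, struct cache,
cache_put and cache_get are hypothetical names. The "stale write" is
simulated inside a static pool so the program stays well-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A free slot stores the link to the next free slot in its own memory. */
union slot {
    union slot *next;           /* valid only while the slot is free */
    unsigned char payload[32];  /* valid only while the slot is allocated */
};

struct cache { union slot *head; };

/* "free": push the slot onto the list, no lock and no metadata check. */
void cache_put(struct cache *c, void *p)
{
    assert(p != NULL);
    union slot *s = p;
    s->next = c->head;
    c->head = s;
}

/* "malloc": pop the head, trusting whatever link the slot contains. */
void *cache_get(struct cache *c)
{
    union slot *s = c->head;
    if (s) c->head = s->next;
    return s;
}

/* Returns 1 if a write through a stale pointer redirected a later
 * allocation to memory that was never part of the pool. */
int stale_write_redirects_alloc(void)
{
    static union slot pool[1];
    static union slot victim;   /* stands in for an attacker-chosen target */
    struct cache c = { NULL };

    void *p = &pool[0];
    cache_put(&c, p);           /* p is now free; its first bytes are the link */

    union slot *forged = &victim;
    memcpy(p, &forged, sizeof forged);  /* simulated use-after-free write */

    void *first = cache_get(&c);   /* returns p and loads the forged link */
    void *second = cache_get(&c);  /* returns &victim */
    return first == p && second == (void *)&victim;
}
```

The second cache_get hands out memory the allocator never owned. An
attacker who controls the forged pointer controls where the next
allocation lands.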

The mallocng allocator was designed to favor very low memory overhead,
low worst-case fragmentation cost, and strong hardening over
performance. This is because it's much easier and safer to opt in to
using a performance-oriented allocator for the few applications that
are doing ridiculous things with malloc to make it a performance
bottleneck than to opt out of trading safety for performance in every
basic system utility that doesn't hammer malloc.

mallocng's design is such that, in theory, it could use thread-local
active groups for each size class, potentially even activating that
behavior only when there's high contention on a given one. This
possibility has not been explored in depth, and it's not clear what
the gains would be with free still being synchronized (via an atomic
operation plus barriers, not a lock).
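
For readers wondering what "an atomic, not a lock" can look like on
the free path, here is a rough sketch using C11 atomics. It is not
mallocng's actual code; struct group, group_alloc, group_free and the
two mask fields are illustrative names, and the allocation side is
assumed to be serialized by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum { GROUP_SLOTS = 32 };

struct group {
    _Atomic uint32_t freed_mask; /* bit i set: slot i was freed, by any thread */
    uint32_t avail_mask;         /* bit i set: slot i may be handed out */
};

void group_init(struct group *g)
{
    atomic_init(&g->freed_mask, 0);
    g->avail_mask = UINT32_MAX;
}

/* Allocation side. Assumed to be serialized by the caller (a lock, or a
 * single owning thread). Returns a slot index, or -1 if none is free. */
int group_alloc(struct group *g)
{
    if (g->avail_mask == 0) {
        /* Take over every slot freed so far in one atomic exchange. */
        g->avail_mask = atomic_exchange_explicit(&g->freed_mask, 0,
                                                 memory_order_acquire);
        if (g->avail_mask == 0) return -1;
    }
    int idx = 0;
    while (!(g->avail_mask >> idx & 1)) idx++;  /* lowest set bit */
    g->avail_mask &= ~((uint32_t)1 << idx);
    return idx;
}

/* Free side. Safe from any thread without a lock: one atomic OR with
 * release ordering publishes the slot back to the group. */
void group_free(struct group *g, int idx)
{
    assert(idx >= 0 && idx < GROUP_SLOTS);
    atomic_fetch_or_explicit(&g->freed_mask, (uint32_t)1 << idx,
                             memory_order_release);
}
```

Even in this sketch, every free still touches a cache line shared with
the allocating side, which is one reason the gain from thread-local
groups is hard to predict without measuring.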

Rich
