musl libc next-gen malloc implementation *** Motivation and Design Goals Desirable qualities of existing allocator: - Very small code size. - Near-zero constant minimal overhead, preventing large numbers of small processes with only minor use of malloc from consuming excess resources. - Low, well-understood internal fragmentation. - Returning of (physical storage for) large contiguous freed memory to the operating system. - Detection of heap-based overflows (with limitations) at realloc/free time via clobbered headers/footers. - Detection of double-free (with inherent limitations) via chunk header/flags. Known flaws in existing allocator: - Race condition (not data race, a distinct concept) in determining the availability of free chunk of desired size while splitting/merging is in progress leads to external fragmentation (unnecessary splitting of larger chunks) and over-expansion of heap (via failure to see large free zone in bin 63). - Reliance on fine-grained per-bin locking to achieve good performance on typical (not even all; it doesn't help when all threads want the same size of allocation) multithreaded loads fundamentally precludes fixing the above race without significant performance regressions. - The overall dlmalloc design (arbitrary split/merge) inherently has usage patterns that can produce catastrophic external fragmentation. - Heuristic for returning dirty pages to kernel (via MADV_DONTNEED) is over-aggressive in some cases (bad performance), under-aggressive in others (failure to return memory), and it's not clear that it can be significantly improved in a dlmalloc-like design. Together, these necessitate not just enhancements to the existing allocator, but a completely different design. Goals for new allocator: Primarily, the new allocator should eliminate the above known flaws while preserving the good properties of the existing allocator to the maximum extent that's possible/practical. In addition, it should: - Harden protections against heap-based overflows and double free not to be susceptible to attacks that blindly replace clobbered headers/footers with valid data. This likely involves out-of-band metadata and random secrets, but may be achieved in whatever way is determined to be most effective and compatible with other goals. - Provide either strong rigorous bounds on external fragmentation, or heuristically mitigate the chance of reasonable usage patterns leading to unbounded external fragmentation. - Expand or restructure the heap only when no free chunks that satisfy an allocation request (without significant internal fragmentation) exist. - Take advantage of realloc as an opportunity to defragment. - Improve performance, both under reasonable contention and under single-thread load (with or without existence of other threads). This likely amounts to short fast-case code paths and designing synchronization around minimizing the number of atomic/lock operations.