Date: Fri, 15 May 2020 23:29:01 -0400
From: Rich Felker <>
Subject: Re: mallocng progress and growth chart

On Fri, May 15, 2020 at 08:29:13PM -0400, Rich Felker wrote:
> The same thing happens at the next doubling for malloc(8192), and the
> same mitigation applies. However with that:
>  9340:   3x10912  3x10912  3x10912  3x10912  3x9344   7x9344   ...
> the coarse size classing is dubious because the size is sufficiently
> large that a 7->3 count reduction can be used, with the same count the
> coarse class would have, but with a 28k rather than 32k mapping.
> Unfortunately the decision here depends on knowing page size, which
> isn't constant at the point where it needs to be made. For integration
> with musl, page size is initially known even if it's variable, so we
> could possibly make a decision not to use coarse sizing based on that
> here, but standalone mallocng lacks that knowledge (page size isn't
> known until after first alloc_meta). This might could be reworked.
> There's a fairly small range of sizes that would benefit (larger ones
> are better off with individual mmap because page size quickly becomes
> "finer than" size classes), but the benefit seems fairly significant
> (not wasting an extra 1.3k each for the first 12 malloc(8192)'s) at
> the sizes where it helps.

It might just make sense to always disable coarse size classing
starting at this size. The absolute amount of over-allocation is
sufficiently high that it's probably not justified. On archs with
larger page size, more pages may be mapped (e.g. 7x9344 can't be
reduced to 3x if page size is over 4k, and can't be reduced to 5x if
page size is over 16k) but having too much memory mapped is generally
an expected consequence of ridiculous page sizes.
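
To make the page-size dependence concrete, here is a rough sketch of
the rounding arithmetic. The 16-byte group header and the helper names
are assumptions for illustration only, and this does not reproduce
mallocng's actual rule for when a count reduction is accepted; it just
shows how much gets mapped and how much of it is slack.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed per-group header size; hypothetical value for illustration. */
#define GROUP_HEADER 16

/* Bytes a group of `count` slots of `slot` bytes needs before rounding. */
static size_t group_need(size_t count, size_t slot)
{
	return GROUP_HEADER + count * slot;
}

/* Bytes actually mapped: the need rounded up to whole pages.
 * pagesize must be a power of two. */
static size_t group_map_size(size_t count, size_t slot, size_t pagesize)
{
	assert(pagesize && !(pagesize & (pagesize - 1)));
	return (group_need(count, slot) + pagesize - 1) & ~(pagesize - 1);
}

/* Mapped-but-unusable tail left over after rounding to pages. */
static size_t group_slack(size_t count, size_t slot, size_t pagesize)
{
	return group_map_size(count, slot, pagesize) - group_need(count, slot);
}
```

Under these assumptions, 3x9344 maps 28k with 4k pages but a full 32k
with 8k pages, and 5x9344 maps 48k up through 16k pages but 64k (the
same as 7x9344) with 32k pages.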

Another possibly horrible idea for dealing with exact page sizes at
low usage: pair groups of short length. Rather than needing to choose
between coarse classing 3x5440 (16k) or a 5x4672 (5 whole slots, 24k)
for a malloc(4095), at low usage we could create a pseudo-2x4672
where the second slot is truncated and contains a nested 1x3264 that's
only freeable together with the outer group. This relation always
works: if size class k is just under a power of two (k == 3 mod 4),
classes k+1 and k-1 add up to just under 2 times class k. (This
follows from n/5 + 2n/7 == (7n+10n)/35 == 17n/35 <= n/2 == 2*n/4,
where n/5, 2n/7, and n/4 are the sizes of class k-1, k+1, and k,
respectively.)
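
The inequality can be checked mechanically. This is only a sketch
using the nominal fractions from the text (n/5, n/4, 2n/7 for a power
of two n); real slot sizes are rounded and differ slightly, and the
function names are made up for illustration.

```c
#include <stddef.h>

/* Nominal class sizes around a power of two n, per the text above.
 * These are idealized fractions, not the actual slot sizes. */
static size_t class_below(size_t n) { return n / 5; }     /* class k-1 */
static size_t class_at(size_t n)    { return n / 4; }     /* class k   */
static size_t class_above(size_t n) { return 2 * n / 7; } /* class k+1 */

/* Nonzero if one class k+1 slot plus one nested class k-1 slot fit
 * in the space of two class k slots. */
static int pair_fits(size_t n)
{
	return class_below(n) + class_above(n) <= 2 * class_at(n);
}
```

For n = 16384 this gives 3276 + 4681 = 7957 <= 8192, which matches the
4672 + 3264 pairing described for malloc(4095).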

This gives a strategy that always works for allocating very-low-count
groups of arbitrary size classes, as long as we're willing to allocate a slot
for the complementary size at the same time. And in some ways it's
nicer than coarse classing -- rather than overallocating the requested
slot in hopes that the grouped slots will be useful for larger
allocations too, it allocates a smaller complementary pair in hopes
that the complementary size will be useful.
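
For the malloc(4095) case, the three options can be compared by how
much memory each maps. This is a sketch assuming 4k pages and a single
16-byte group header (both assumptions, as is the guess that the nested
group needs no extra header of its own); the slot sizes are the ones
quoted above.

```c
#include <stddef.h>

#define PAGE 4096        /* assumed page size */
#define GROUP_HEADER 16  /* assumed header size, hypothetical */

/* Round a group's header plus payload up to whole pages. */
static size_t map_cost(size_t payload)
{
	return (GROUP_HEADER + payload + PAGE - 1) / PAGE * PAGE;
}

/* Coarse classing: three slots of the next-larger class. */
static size_t coarse_cost(void) { return map_cost(3 * 5440); }

/* Exact class with five whole slots. */
static size_t exact_cost(void) { return map_cost(5 * 4672); }

/* Paired: one 4672 slot plus a nested 3264 slot in the truncated
 * second slot. */
static size_t paired_cost(void) { return map_cost(4672 + 3264); }
```

With these numbers the coarse group maps 16k, the exact 5x group maps
24k, and the paired group would fit in 8k.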

