Date: Fri, 15 May 2020 20:29:13 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: mallocng progress and growth chart

On Sun, May 10, 2020 at 02:09:34PM -0400, Rich Felker wrote:
>  4668:   2x5440   2x5440   2x5440   2x5440   2x5440   5x4672   5x4672   5x4672   5x4672   5x4672   5x4672   7x4672   ...

This turns out to be just about the worst edge case we have, and in a
sense one that's fundamental. Sadly there are a number of
applications, including bash, that do a lot of malloc(4096). The ones
that just allocate and don't have any complex malloc/free patterns
will see somewhat higher usage with mallocng, and I don't think
there's any way around that. (Note: oldmalloc also has problems here
under certain patterns of alloc/free, due to bin_index vs bin_index_up
discrepancy!)

I have some changes I'm about to push that help this somewhat. The
2x5440 count-reduction (this is 3x with proper-fit count) is overly
costly at this size, and imposes a 12.5% waste on top of the slack
from coarse size classing and the base slack from mapping 4096 into a
4672 size class. Getting rid of it, and accounting for existing coarse
size class usage when doing the 7->5 reduction, produce:

 4668:   3x5440   3x5440   3x5440   3x5440   5x4672   7x4672   ...

which seems like about the best we can do. The initial allocation of
3x rather than 2x only uses one additional page to get an additional
slot that can be used before needing to mmap again, which is a big win
(essentially that third slot doesn't have any overhead) except in the
case where it's never used, and only a small loss (1 page) even then.
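
As a rough check of the "one additional page" arithmetic above, here is a minimal sketch. It assumes 4 KiB pages and a 16-byte per-group header; both values, and the helper name group_pages, are assumptions made for illustration and are not stated in this message. The page counts come out the same with a zero-byte header.

```c
#include <stddef.h>

/* Assumed values, not taken from the message: 4 KiB pages and a
 * 16-byte header in front of each group's slots. */
#define PAGE_SIZE_ASSUMED 4096u
#define GROUP_HEADER_ASSUMED 16u

/* Hypothetical helper: pages needed to map one group holding
 * `count` slots of `slot_size` bytes each, rounded up. */
static size_t group_pages(size_t count, size_t slot_size)
{
	size_t bytes = count * slot_size + GROUP_HEADER_ASSUMED;
	return (bytes + PAGE_SIZE_ASSUMED - 1) / PAGE_SIZE_ASSUMED;
}
```

Under these assumptions, group_pages(2, 5440) is 3 and group_pages(3, 5440) is 4, so the third slot costs exactly one more page.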

The same thing happens at the next doubling for malloc(8192), and the
same mitigation applies. However with that:

 9340:   3x10912  3x10912  3x10912  3x10912  3x9344   7x9344   ...

the coarse size classing is dubious because the size is sufficiently
large that a 7->3 count reduction can be used, with the same count the
coarse class would have, but with a 28k rather than 32k mapping.
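
The 28k and 32k figures can be reproduced with a small sketch, again assuming 4 KiB pages and a 16-byte per-group header (assumptions for illustration only; group_map_bytes is a hypothetical name, not a mallocng function).

```c
#include <stddef.h>

/* Assumed values, not taken from the message. */
#define MAP_PAGE_SIZE_ASSUMED 4096u
#define MAP_GROUP_HEADER_ASSUMED 16u

/* Hypothetical helper: bytes actually mapped for a group of `count`
 * slots of `slot_size` bytes, rounded up to whole pages. */
static size_t group_map_bytes(size_t count, size_t slot_size)
{
	size_t bytes = count * slot_size + MAP_GROUP_HEADER_ASSUMED;
	size_t pages = (bytes + MAP_PAGE_SIZE_ASSUMED - 1) / MAP_PAGE_SIZE_ASSUMED;
	return pages * MAP_PAGE_SIZE_ASSUMED;
}
```

With these assumptions, group_map_bytes(3, 10912) is 32768 (32k) and group_map_bytes(3, 9344) is 28672 (28k), both holding three slots.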

Unfortunately the decision here depends on knowing page size, which
isn't constant at the point where it needs to be made. For integration
with musl, page size is initially known even if it's variable, so we
could possibly make a decision not to use coarse sizing based on that
here, but standalone mallocng lacks that knowledge (page size isn't
known until after first alloc_meta). This could perhaps be reworked.
There's a fairly small range of sizes that would benefit (larger ones
are better off with individual mmap because page size quickly becomes
"finer than" size classes), but the benefit seems fairly significant
(not wasting an extra 1.3k each for the first 12 malloc(8192)'s) at
the sizes where it helps.
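
To illustrate why the choice hinges on page size, the sketch below computes the extra mapped bytes per slot when a 3x10912 coarse group is used instead of a 3x9344 group, for a given page size. The 16-byte group header and the function names are assumptions for illustration; this is not how mallocng itself makes the decision.

```c
#include <stddef.h>

/* Assumed per-group header size, not taken from the message. */
#define HDR_ASSUMED 16u

/* Hypothetical helper: bytes mapped for `count` slots of `slot_size`
 * bytes, rounded up to a multiple of `page_size`. */
static size_t mapped_bytes(size_t count, size_t slot_size, size_t page_size)
{
	size_t bytes = count * slot_size + HDR_ASSUMED;
	return (bytes + page_size - 1) / page_size * page_size;
}

/* Extra mapped bytes per slot from choosing the coarse 10912 class
 * over the exact 9344 class, at a count of 3 slots per group. */
static size_t coarse_extra_per_slot(size_t page_size)
{
	return (mapped_bytes(3, 10912, page_size)
	      - mapped_bytes(3, 9344, page_size)) / 3;
}
```

Under these assumptions, coarse_extra_per_slot(4096) is 1365 bytes, in line with the roughly 1.3k figure, while with 16 KiB or 64 KiB pages both groups round up to the same mapping and the difference is 0.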
