musl - Re: New malloc - first preview

Message-ID: <20191129135523.GP16318@brightrain.aerifal.cx>
Date: Fri, 29 Nov 2019 08:55:23 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: New malloc - first preview

On Fri, Nov 29, 2019 at 09:37:15AM +0300, Vasya Boytsov wrote:
> Is your malloc implementation any better than
> https://github.com/microsoft/mimalloc?
> AFAIK one can use mimalloc in their projects as it has compatible license.
> the only thing that bothers is that it is done by Microsoft research.

"Better" is not meaningful without specifying criteria for evaluation.
The information on mimalloc is sparse: AFAICT it doesn't cover the
basic design, and mainly describes new properties that mostly serve
its specialized sharding API (not useful/usable for the malloc API).
So without reading into it much more, it's hard to say.

From the summary, their idea of "small" is on a completely different
order of magnitude from musl's. 6kLOC is gigantic, far larger than any
single component of musl. Of course it might be really sparse code
with lots of whitespace and comments, but still doesn't bode well.

Regarding "fast", everyone aiming for "fast" claims fast with their
own benchmarks. I generally don't trust malloc benchmarks after N
times of digging into them and finding they were tuned just
above/below a threshold to make a competitor look bad. That's not to
say the same happened here, but "fast at what?" is a real question
here, and the only meaningful test seems to be measurement under
real-world workloads.

The security measures mentioned seem weaker than the WIP new malloc's
and likely not much better than the current one's. But it's hard to evaluate
security/hardening measures without a model of proposed attacks and
which ones they mitigate. This is one thing I've focused on in the new
malloc, classifying and addressing them such that we can rule out
entire classes. Roughly, unless the attacker already has a complex
capability to follow multiple pointers and clobber the resulting
memory, only the application data stored in the heap, not the
allocator's heap state, is subject to corruption by common methods
(linear overflow, use-after-free, double free).
Not all of this is implemented yet in the draft.

Those things aside, ultimately the answer is that the question is not
what is the best malloc for some arbitrary definition of best, but
what's the best that fits into the constraints of musl, including:

- low absolute space cost at low malloc utilization
- low relative space cost at moderate to high utilization
- compatibility with 32-bit address space
- compatibility with nommu
- compatibility with RLIMIT_AS, no-overcommit configurations

and also provides hardening improvements over the current allocator,
which gets by with only minimal measures.
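
To make the RLIMIT_AS constraint above concrete, here is a minimal
sketch, assuming Linux with POSIX getrlimit/setrlimit and anonymous
mmap. It illustrates the general constraint only and does not describe
any particular allocator: address space reserved with PROT_NONE counts
against RLIMIT_AS even if it is never touched, so a design that relies
on large up-front reservations can fail under a limit. The helper name
reservation_fails_under_limit is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Hypothetical helper, for illustration only.
 * Temporarily lowers the soft RLIMIT_AS to `limit` bytes, then tries to
 * reserve `reserve` bytes of address space with PROT_NONE (never touched).
 * Returns 1 if the reservation fails, 0 if it succeeds, -1 on setup error.
 * The original soft limit is restored before returning. */
int reservation_fails_under_limit(size_t limit, size_t reserve)
{
    struct rlimit old, lim;

    if (getrlimit(RLIMIT_AS, &old) != 0)
        return -1;
    lim = old;
    lim.rlim_cur = limit;
    if (lim.rlim_cur > old.rlim_max)
        return -1;              /* cannot set soft limit above hard limit */
    if (setrlimit(RLIMIT_AS, &lim) != 0)
        return -1;

    /* Reserve address space only; no pages are ever written. */
    void *p = mmap(NULL, reserve, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    int failed = (p == MAP_FAILED);
    if (!failed)
        munmap(p, reserve);

    setrlimit(RLIMIT_AS, &old); /* restore the previous limits */
    return failed;
}
```

For example, calling it with a 256 MiB limit and a 1 GiB reservation
should report failure on a typical 64-bit Linux system, even though no
memory would actually have been used.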

Nobody should be expecting magical performance improvements out of
this. The likely user-facing changes will be:

- elimination of heap size explosion in multithreaded apps (known bug)
- better return of freed memory to system
- greatly reduced exploitability of bugs related to malloc usage

Speed could go up or down. Hopefully it goes up for lots of things. It
will be a "win" in terms of speed if it merely avoids going down by
more than fixing the "heap size explosion" bug would make it go down
with the current dlmalloc-type design -- see:

https://www.openwall.com/lists/musl/2019/04/12/4

since fixing this bug is absolutely necessary; it keeps impacting
users.

Rich



> On 11/29/19, Rich Felker <dalias@...c.org> wrote:
> > On Thu, Nov 28, 2019 at 04:56:42PM -0500, Rich Felker wrote:
> >> Work on the new malloc is well underway, and I have a draft version
> >> now public at:
> >>
> >> https://github.com/richfelker/mallocng-draft
> >>
> >> Some highlights:
> >>
> >> - Smallest size-class now has 12 of 16 bytes usable for storage,
> >>   compared to 8 of 16 (32-bit archs) or 16 of 32 (64-bit archs) with
> >>   the old malloc, plus 1.5 bytes (32-bit) or 2.5 bytes (64-bit) of
> >>   out-of-band metadata. This overhead (5.5 or 6.5 bytes per
> >>   allocation) is uniform across all sizes.
> >
> > Make that 6 or 7 since there's also a 16-byte (counting alignment,
> > which is most of it) group header that imposes 0.5 bytes of overhead
> > per slot for a full-length 32-slot group.
> >
> > Rich
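
The overhead figures quoted above can be checked with a few lines of
arithmetic. This is only a sketch of the calculation, not code from the
draft: the function name is hypothetical, and the 4 bytes of in-band
overhead is derived from "12 of 16 bytes usable" rather than stated
directly.

```c
/* Hypothetical helper: per-allocation overhead in bytes, given
 *   in_band         - bytes of each slot not usable for storage
 *   out_of_band     - bytes of out-of-band metadata per allocation
 *   group_header    - size of the per-group header, including alignment
 *   slots_per_group - number of slots sharing one group header */
double overhead_per_slot(double in_band, double out_of_band,
                         double group_header, int slots_per_group)
{
    return in_band + out_of_band + group_header / slots_per_group;
}
```

With in_band = 16 - 12 = 4, a 16-byte header and a full 32-slot group,
this gives 4 + 1.5 + 0.5 = 6 bytes on 32-bit and 4 + 2.5 + 0.5 = 7 bytes
on 64-bit, matching the corrected numbers.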