Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 30 Jul 2018 21:44:07 -0400
From: Rich Felker <>
Subject: Re: malloc implementation survey: omalloc

On Mon, Jul 30, 2018 at 08:47:28PM -0400, Rich Felker wrote:
> On Sun, Jul 29, 2018 at 09:26:18PM +0200, Markus Wichmann wrote:
> > Hi all,
> > 
> > we discussed rewriting malloc() a while back, because, as I recall, Rich
> > wasn't satisfied with the internal storage the current system is using
> > (i.e. metadata is stored with the returned pointer) as well as some
> > corner cases on lock contention, although fine grained locking is a nice
> > feature in itself.
> > 
> > I therefore had a look at existing malloc() algorithms, the rationale
> > being that I thought malloc() to be a solved problem, so we only have to
> > find the right solution.
> > 
> > As it turns out, it appears Doug Lea was simply too successful: Many
> > allocators follow his pattern in one way or another. But some systems
> > buck the trend.
> > 
> > So today I found omalloc, the allocator OpenBSD uses, in this nice
> > repository:
> > 
> >
> I haven't looked at it in a couple years, but last I did, the OpenBSD
> allocator was not practical. It used individual mmaps for even
> moderately large allocations, and used guard pages with each, which
> with the default Linux VMA limits puts an upper bound of 32768 on the
> number of non-tiny (roughly, larger-than-page IIRC) allocations.
> It does have much stronger hardening against overflows than musl's
> current malloc or any other allocator, but it seemed inferior in all
> other ways.
> I'll read the rest of your description later and see if there's
> anything new that's interesting.

I've now read your description and it seems to agree with what I
remember, except maybe replacing "individual mmaps" by some reuse of
regions, and the guard page thing likely being optional (or no longer
used because it was too inefficent?).

Some ideas like treating different size ranges differently might be
worthwhile (this should make a big impact limiting worst-case
fragmentation). For instance I could see having all allocations below
a certain size have to come from "pages" of 512 bytes, broken into
16-byte runs with a 32-bit atomic bitmap of allocated runs, and
likewise for a few more size ranges. This is the kind of thing I've
been thinking about for a while, but I haven't come up with any good
approaches to estimating the cons.

One thing that would probably be really useful is empirical data on
what range of sizes "need to be fast" for bloated OO stuff that does
rapid malloc/free cycles. Conceptually it should only be small sizes,
assuming the whole allocated space is actually used, since beyond a
certain size, actual use of the allocated memory (loads/stores) will
dominate the time spent allocating and freeing it, but the cutoff
isn't clear.


Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.