Message-ID: <CAHktk4jWZgOyPA+a2R-oCN4ae6ESf9vwM3axmZ=6KDbKb3HD4A@mail.gmail.com>
Date: Mon, 20 Apr 2026 22:20:50 -0700
From: Charles Munger <clm@...gle.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: Request for feedback on WG14 proposal N3849 (alloc_at_least)

On Mon, Apr 20, 2026 at 8:04 PM Rich Felker <dalias@...c.org> wrote:
> On Mon, Apr 20, 2026 at 07:21:13PM -0700, Charles Munger wrote:
> > On Mon, Apr 20, 2026, 4:56 PM Rich Felker <dalias@...c.org> wrote:
> >
> > > On Mon, Apr 20, 2026 at 03:11:38PM -0700, Charles Munger wrote:
> > > > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3849.pdf
> > > >
> > > > On the WG14 mailing list it was raised that musl had objections to this
> > > > proposal and others dealing with exposing allocator sizes. I hope that I
> > > > and my coauthors wrote this in a way that would be unobjectionable to
> > > > allocator/libc maintainers, but if this will cause you problems I would
> > > > like to know, and welcome any and all of your feedback.
> > >
> > > I'm not sure what specific objections you're referring to. I have not
> > > seen concrete evidence that any of this has any practical value. I
> > > raised that objection early in the discussion of this whole topic, and
> > > I don't think anyone ever presented measurements that support the need
> > > for any action whatsoever here. In the absence of that, I do not think
> > > anything like this should be adopted.
> > >
> > I was motivated to propose this after an investigation where I found that
> > pathological allocation sizes were causing big performance problems in some
> > software I was working on (upb). I looked into using malloc_usable_size,
> > but that presented further problems with lookup performance costs, the need
> > to realloc, and more. After this investigation I reached out to the sqlite
> > authors, who were using malloc_usable_size in their internal allocator.
> > They did some profiling and determined that sqlite was faster with the
> > extra memory from malloc_usable_size, but that the cost of
> > malloc_usable_size itself undermined it, especially when there was no
> > additional memory to be gained. I also found that the scudo allocator was
> > intentionally tracking and returning the requested size rather than the
> > actual size to work around this sqlite behavior; so at least running on
> > platforms with the scudo allocator, they were paying the cost for no real
> > benefit.
> >
> > The specific problem case occurred when I needed to allocate something
> > while parsing; unfortunately it was just slightly over half of the
> > nearest size class in the underlying allocator, so nearly half the
> > allocated space was wasted.
>
> This sounds like an unusually memory-wasting allocator if size classes
> are so widely spaced that you can end up wasting nearly half of the
> space. Based on the allocators I've seen, I think it's much more
> typical to have 2-4 or even more classes per doubling of size, than
> just 1.
>
In real benchmarks across a couple of allocators on the platforms this code
was targeting (glibc, scudo, jemalloc, tcmalloc) I was able to reproduce
substantial amounts of wasted memory; even in the "good" cases it was
common to see 10% or more wasted space. The benefit isn't just that we get
an extra 10% or so available in the presence of granular size classes;
using that extra memory often allowed us to avoid allocating another whole
block. Here are the results from one run:

name                                    old peak-mem(Bytes)/op  new peak-mem(Bytes)/op  delta
BM_Parse_Upb_FileDesc<UseArena, Alias>  43.7k ± 0%              30.4k ± 0%              -30.32% (p=0.000 n=60+60)

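To make the arithmetic behind the two cases described in this thread concrete (a request just over half of a size class, and a power-of-2 request plus a small header), here is a minimal sketch. It assumes a hypothetical allocator with exactly one power-of-2 size class per doubling and a 16-byte minimum class; `size_class` and `wasted` are illustrative names, not part of any real allocator. Real allocators typically use finer-grained classes, so this models the worst case rather than the common one.

```c
#include <stddef.h>

/* Hypothetical allocator model: one size class per doubling (powers of two),
 * with an assumed minimum class of 16 bytes. Not any real allocator's table. */
static size_t size_class(size_t n) {
    size_t c = 16;
    while (c < n)
        c *= 2;
    return c;
}

/* Bytes the allocator reserves for the block but that a caller of
 * malloc(n) is never told about. */
static size_t wasted(size_t n) {
    return size_class(n) - n;
}
```

Under this model, a request for 2049 bytes lands in the 4096 class and leaves 2047 bytes unused, and a 4096-byte request with a 16-byte header prepended lands in the 8192 class and leaves 4080 bytes unused.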
> > It happened to me again recently when I found
> > that something was requesting a power-of-2 sized allocation, which an
> > intermediate allocator was adding a small header to, and that header made
> > it just barely larger than the normal size class, resulting in bumping up
> > to the next size and wasting a bunch of memory. In both cases I had a
> > productive use for that memory (first was a bump allocator; second was
> > serializing into a buffer of bytes).
>
> This kind of "I hit this problem that could be solved by adding stuff
> to the core language's library" does not scale. If everybody did that,
> we would have an unmaintainable mess. A reasonable motivation for
> something like this would be "a large class of major software in
> real-world deployment is adversely affected by this issue, and there
> is fundamentally no easy way to mitigate it without an extension". It
> should have real-world data, not anecdotes.
>
I certainly agree that individual frustrations are not a good reason alone
to add to the standard. But they can be motivating cases - the fact that so
many libc implementations expose various tools to address this problem is
some indication that others have hit it too. If SQLite, a very widely
deployed C project, has run into this problem then it can't be all that
unusual for code chasing peak performance and minimum memory footprint. I
think this proposal imposes the least possible burden on libc maintainers,
as it can be entirely implemented using the APIs they already have to
expose, and the semantics are carefully defined to match that of just
calling malloc with that size, so the only interaction is with the
free_sized APIs and only to codify the guidance that was already in the
standard about extensions.

I felt it was worth adding to the language because the lack of an ability
to coordinate between the primary allocator and sub-allocators (defined
loosely as anything that manages memory in application or library code) was
resulting in performance problems that were addressable no other way. If
I'm wrong about the benefit, implementations like musl are free to just
ignore it with a trivial implementation, in full compliance with the
standard.
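
As a rough sketch of what such a trivial implementation could look like, and of how a sub-allocator would consume the reported size: the signature below is an assumption that follows the out-parameter shape suggested earlier in this thread, not the `alloc_result_t` interface from N3849, and `sketch_alloc_at_least`, `struct arena`, and the `arena_*` helpers are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical signature, NOT the one in N3849: returns the block and
 * stores the number of usable bytes through *actual. */
static void *sketch_alloc_at_least(size_t n, size_t *actual) {
    void *p = malloc(n);
    /* Trivial implementation: report exactly what was requested. */
    *actual = p ? n : 0;
    return p;
}

/* Minimal bump allocator that treats the reported size as its capacity.
 * Alignment handling is omitted for brevity. */
struct arena {
    char  *base;
    size_t cap;   /* usable bytes reported by the primary allocator */
    size_t used;  /* bytes handed out so far */
};

static int arena_init(struct arena *a, size_t want) {
    a->base = sketch_alloc_at_least(want, &a->cap);
    a->used = 0;
    return a->base != NULL;
}

static void *arena_alloc(struct arena *a, size_t n) {
    if (n > a->cap - a->used)
        return NULL;  /* a real arena would chain a new block here */
    void *p = a->base + a->used;
    a->used += n;
    return p;
}

static void arena_destroy(struct arena *a) {
    free(a->base);
    a->base = NULL;
    a->cap = a->used = 0;
}
```

With the trivial version nothing changes for the caller, since `cap` equals the requested size. With an allocator that chose to report its real size-class capacity, `cap` could exceed the request and the arena could serve more allocations before needing another block.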
>
> > > If it is adopted, we would probably just have these functions all be
> > > thin wrappers for malloc that report that the size obtained is exactly
> > > what you requested.
> > >
> > That's a valid implementation and the API is designed to permit it, to
> > avoid forcing implementations to expose details they would rather not
> > (unlike malloc_usable_size).
> >
> > >
> > > Aside from that:
> > >
> > > 1. The "alloc_result_t" approach is particularly ugly and
> > > anti-idiomatic for C. Returning a void * pointer and taking a size_t *
> > > argument for a location to store the amount actually obtained would be
> > > a lot more idiomatic and discourage misuse.
> > >
> > > 2. The proposed free_sized and free_aligned_sized functions seem
> > > completely useless. As they have undefined behavior if the size and
> > > alignment you pass to them are not correct, they offer no advantage
> > > over just calling free, but make things much more error-prone.
> > >
> > These were already added to the standard in C23; the proposal just makes
> > some modifications to them to describe their interactions with the new
> > APIs.
>
> Oh, I'd forgotten that. I'll take another look at how your proposal
> interacts with the existing requirements.
>
> > > Overall, I would urge the committee to reject this whole proposal.
> > >
> > I'll relay that feedback to the mailing list, thank you. I take it you
> > would also be opposed to standardizing a version of malloc_usable_size?
> > Someone else proposed that in
> > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3869.pdf
>
> I'm not sure. In order for it to be usable, there needs to be some way
> to make it actually safe (against compiler optimizations that assume
> access past offset n after malloc(n) is undefined). Without compilers
> becoming aware that calling this function invalidates any prior
> assumption they had about the size of the object, malloc implementors
> would be burdened with finding a way to track the exact size requested
> so they could return something the compiler would not break. This does
> not make things more efficient by eliminating the caller's burden to
> track the size; it just adds a burden to track the size in a different
> place, and now it will likely be kept in two places, using more
> resources.
>
> In musl's mallocng we wanted to track that for hardening purposes
> anyway, and already had a clean, low-storage-cost way to do it. But
> for other implementations I think it may be quite burdensome.
>
I'm opposed to standardizing a version of that for a bunch of reasons, even
if my proposal is not chosen. The implementation is expensive for tcmalloc
and other implementations that don't track the requested size in inline
metadata; it's a nuisance for sanitizers; the glibc documentation says you
have to call realloc to actually use the extra space; it interacts poorly
with calloc, realloc, etc. I think it's fine for implementations to expose
implementation details (either the size class, or cheap requested size
tracking) via extensions if they choose, but I think it's a bad idea to
require that in the standard.
>
> Rich
>