musl - Re: Re: realloci(): A realloc() variant that works in-place

Message-ID: <20251105045654.GA1827@brightrain.aerifal.cx>
Date: Tue, 4 Nov 2025 23:56:54 -0500
From: Rich Felker <dalias@...c.org>
To: Demi Marie Obenour <demiobenour@...il.com>
Cc: musl@...ts.openwall.com, The 8472 <the8472.rs@...inite-source.de>,
	Alejandro Colomar <alx@...nel.org>,
	Thiago Macieira <thiago@...ieira.org>,
	Florian Weimer <fw@...eb.enyo.de>, libc-alpha@...rceware.org,
	Arthur O'Dwyer <arthur.j.odwyer@...il.com>,
	Jonathan Wakely <jwakely@...hat.com>
Subject: Re: Re: realloci(): A realloc() variant that works in-place

On Tue, Nov 04, 2025 at 07:37:41PM -0500, Demi Marie Obenour wrote:
> On 11/4/25 16:01, Rich Felker wrote:
> > On Tue, Nov 04, 2025 at 12:51:16AM +0100, The 8472 wrote:
> >> Hello,
> >>
> >> On 03/11/2025 22:28, Rich Felker wrote:
> >>> On Mon, Nov 03, 2025 at 10:36:07AM +0100, Alejandro Colomar wrote:
> >>>> Hi Rich,
> >>>>
> >>>> On Sun, Nov 02, 2025 at 07:28:57PM -0500, Rich Felker wrote:
> >>>>> On Mon, Nov 03, 2025 at 12:58:39AM +0100, Alejandro Colomar wrote:
> >>>>>>> All this will need fine-tuning once implementations exist.
> >>>>>>>
> >>>>>>>> So, why not require the caller to not ask too much?  We could go back to
> >>>>>>>> reporting an error if there's not enough memory.
> >>>>>>>>
> >>>>>>>> Of course, it would still guarantee no errors when shrinking, but
> >>>>>>>> I think we could error out when growing.
> >>>>>>>
> >>>>>>> I'd prefer no errors either way. If there isn't memory to grow the underlying
> >>>>>>> space (a brk() system call returns ENOMEM), then realloci() returns as much as
> >>>>>>> it could get but not more.
> >>>>>>
> >>>>>> The problem is that this is asking the implementation to speculate.
> >>>>>>
> >>>>>> Consider the case that a realloci() implementation knows that the
> >>>>>> requested size fails.  Let's put some arbitrary numbers:
> >>>>>>
> >>>>>> 	old_size = 10000;
> >>>>>> 	requested_size = 30000;
> >>>>>>
> >>>>>> It knows the block can grow to somewhere between 10000 (which it
> >>>>>> currently has) and 30000 (the system reported ENOMEM), but now it has
> >>>>>> the task of allocating as much as it can get.  Should it do a binary
> >>>>>> search of the size?  Try 20000, then if it fails try 15000, etc.?
> >>>>>> That's speculation, and it would make this function too slow.
> >>>>>
> >>>>> I don't see any plausible implementation in which this involved a
> >>>>> binary search. Either you have fixed-size slots in which case you just
> >>>>> look at the size of the slot to see what the max obtainable is, or you
> >>>>> have a dlmalloc-like situation where you check the size of the
> >>>>> adjacent free block (if any) to determine the max obtainable. These
> >>>>> are O(1) operations.
> >>>>
> >>>> I was thinking of mremap(2) without MREMAP_MAYMOVE.
> >>>
> >>> OK, this whole conversation is mixing up unrelated things:
> >>>
> >>> 1. In-place realloc to avoid relatively-expensive memcpy
> >>> 2. In-place realloc to avoid updating pointers
> >>>
> >>> The case where mremap would be used is utterly irrelevant to (1). And
> >>> further, the cost of the mremap operation is so high (syscall
> >>> overhead, page table/TLB synchronization) that any cost of updating
> >>> pointers because the object moved is dwarfed and thereby irrelevant
> >>> too.
> >>>
> >>> So I don't see why anyone should care about this case.
> >>>
> >>> Moreover, I see (2) as entirely misguided. The whole provenance model
> >>> makes it broken to try to rely on pointer values not changing, and no
> >>> code should be trying to do that. A new allocator interface should not
> >>> be pandering to this very fragile, very likely to be broken by
> >>> compiler transformations, utterly backwards practice. Just treat the
> >>> old pointer as invalid and always update like you're supposed to,
> >>> regardless of whether the value is different.
> >>>
> >>> Rich
> >>>
> >>
> >> On the Rust side we have uses for both these scenarios, and more.
> >>
> >> A) A strictly in-place realloc is useful for collections and
> >> arenas that have outstanding borrows (thus cannot move)
> >> but want to try growing in-place before they have to allocate
> >> another chunk.
> > 
> > This "useful" needs to be quantified. Only in very very rare cases
> > will in-place expansion even be possible. The vast majority of the
> > time, you must allocate another discontiguous chunk to meet the above
> > contractual obligation anyway.
> 
> Would it be better to provide an allocation API that returns the amount
> of memory actually allocated?  That would at least allow any padding at
> the end of the allocation to be used instead of being wasted.

No, that was the mistake made long ago with malloc_usable_size:
wrongly equating "amount actually allocated" with "amount we could
enlarge the allocation up to without running into something else".
Equating them is wrong because (1) the compiler will rightly treat
accesses beyond the requested size at allocation time, even if
malloc_usable_size reports more and the allocator implementation lets
you use more, as UB, and (2) allowing the use of "extra space" here
precludes detecting overflows. It really is necessary, if you want to
allow the application to use extra space, to have some interface by
which it's requested. However, I don't think anyone has made a
compelling case yet that doing this is useful enough to be worth the
trouble and possible unforeseen bad consequences -- keep in mind the
bad consequences of malloc_usable_size were entirely unseen at the
time it was sloppily introduced.

Rich
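
[Illustration, not part of the original message.] As a minimal sketch of
the point above: an application that wants spare capacity can request it
explicitly and track the requested size itself, rather than querying
malloc_usable_size() and writing past what it asked for. The struct and
function names below (buf, buf_reserve, buf_append, buf_free) and the
doubling growth policy are hypothetical, chosen only for this example.
The code always adopts the pointer returned by realloc() and never
assumes the block stayed in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; all names are illustrative only. */
struct buf {
	unsigned char *data;
	size_t len;	/* bytes in use */
	size_t cap;	/* bytes explicitly requested from the allocator */
};

/* Make room for `extra` more bytes. Returns 0 on success, -1 on failure. */
static int buf_reserve(struct buf *b, size_t extra)
{
	if (extra > SIZE_MAX - b->len)
		return -1;	/* len + extra would overflow */
	size_t need = b->len + extra;
	if (need <= b->cap)
		return 0;	/* already within the size we asked for */

	/* Assumed policy: double the capacity, starting from 16 bytes. */
	size_t new_cap = b->cap ? b->cap : 16;
	while (new_cap < need) {
		if (new_cap > SIZE_MAX / 2) {
			new_cap = need;
			break;
		}
		new_cap *= 2;
	}

	unsigned char *p = realloc(b->data, new_cap);
	if (!p)
		return -1;	/* the old block is still valid and unchanged */
	b->data = p;		/* always adopt the returned pointer */
	b->cap = new_cap;	/* only ever use what was requested */
	return 0;
}

/* Append n bytes from src. Returns 0 on success, -1 on failure. */
static int buf_append(struct buf *b, const void *src, size_t n)
{
	if (n == 0)
		return 0;
	if (buf_reserve(b, n) != 0)
		return -1;
	memcpy(b->data + b->len, src, n);
	b->len += n;
	assert(b->len <= b->cap);	/* never beyond the requested size */
	return 0;
}

/* Release the buffer and reset it to the empty state. */
static void buf_free(struct buf *b)
{
	free(b->data);
	b->data = NULL;
	b->len = b->cap = 0;
}
```

A caller would start from a zero-initialized struct buf, call
buf_append() as needed, and finish with buf_free(). Because every
access stays within cap, which is exactly the size passed to
realloc(), both the compiler's view of the object and any
overflow-detecting allocator agree with the application about where
the allocation ends.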