Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <6bbd3099-5841-4ef4-9d16-80640c307411@gmail.com>
Date: Tue, 4 Nov 2025 19:37:41 -0500
From: Demi Marie Obenour <demiobenour@...il.com>
To: musl@...ts.openwall.com, Rich Felker <dalias@...c.org>,
 The 8472 <the8472.rs@...inite-source.de>
Cc: Alejandro Colomar <alx@...nel.org>, Thiago Macieira
 <thiago@...ieira.org>, Florian Weimer <fw@...eb.enyo.de>,
 libc-alpha@...rceware.org, Arthur O'Dwyer <arthur.j.odwyer@...il.com>,
 Jonathan Wakely <jwakely@...hat.com>
Subject: Re: Re: realloci(): A realloc() variant that works in-place

On 11/4/25 16:01, Rich Felker wrote:
> On Tue, Nov 04, 2025 at 12:51:16AM +0100, The 8472 wrote:
>> Hello,
>>
>> On 03/11/2025 22:28, Rich Felker wrote:
>>> On Mon, Nov 03, 2025 at 10:36:07AM +0100, Alejandro Colomar wrote:
>>>> Hi Rich,
>>>>
>>>> On Sun, Nov 02, 2025 at 07:28:57PM -0500, Rich Felker wrote:
>>>>> On Mon, Nov 03, 2025 at 12:58:39AM +0100, Alejandro Colomar wrote:
>>>>>>> All this will need fine-tuning once implementations exist.
>>>>>>>
>>>>>>>> So, why not require the caller to not ask too much?  We could go back to
>>>>>>>> reporting an error if there's not enough memory.
>>>>>>>>
>>>>>>>> Of course, it would still guarantee no errors when shrinking, but
>>>>>>>> I think we could error out when growing.
>>>>>>>
>>>>>>> I'd prefer no errors either way. If there isn't memory to grow the underlying
>>>>>>> space (a brk() system call returns ENOMEM), then realloci() returns as much as
>>>>>>> it could get but not more.
>>>>>>
>>>>>> The problem is that this is asking the implementation to speculate.
>>>>>>
>>>>>> Consider the case that a realloci() implementation knows that the
>>>>>> requested size fails.  Let's put some arbitrary numbers:
>>>>>>
>>>>>> 	old_size = 10000;
>>>>>> 	requested_size = 30000;
>>>>>>
>>>>>> It knows the block can grow to somewhere between 10000 (which it
>>>>>> currently has) and 30000 (the system reported ENOMEM), but now it has
>>>>>> the task of allocating as much as it can get.  Should it do a binary
>>>>>> search of the size?  Try 20000, then if it fails try 15000, etc.?
>>>>>> That's speculation, and it would make this function too slow.
>>>>>
>>>>> I don't see any plausible implementation in which this involved a
>>>>> binary search. Either you have fixed-size slots in which case you just
>>>>> look at the size of the slot to see what the max obtainable is, or you
>>>>> have a dlmalloc-like situation where you check the size of the
>>>>> adjacent free block (if any) to determine the max obtainable. These
>>>>> are O(1) operations.
>>>>
>>>> I was thinking of mremap(2) without MREMAP_MAYMOVE.
>>>
>>> OK, this whole conversation is mixing up unrelated things:
>>>
>>> 1. In-place realloc to avoid relatively-expensive memcpy
>>> 2. In-place realloc to avoid updating pointers
>>>
>>> The case where mremap would be used is utterly irrelevant to (1). And
>>> further, the cost of the mremap operation is so high (syscall
>>> overhead, page table/TLB synchronization) that any cost of updating
>>> pointers because the object moved is dwarfed and thereby irrelevant
>>> too.
>>>
>>> So I don't see why anyone should care about this case.
>>>
>>> Moreover, I see (2) as entirely misguided. The whole provenance model
>>> makes it broken to try to rely on pointer values not changing, and no
>>> code should be trying to do that. A new allocator interface should not
>>> be pandering to this very fragile, very likely to be broken by
>>> compiler transformations, utterly backwards practice. Just treat the
>>> old pointer as invalid and always update like you're supposed to,
>>> regardless of whether the value is different.
>>>
>>> Rich
>>>
>>
>> On the Rust side we have uses for both these scenarios, and more.
>>
>> A) A strictly in-place realloc is useful for collections and
>> arenas that have outstanding borrows (thus cannot move)
>> but want to try growing in-place before they have to allocate
>> another chunk.
> 
> This "useful" needs to be quantified. Only in very very rare cases
> will in-place expansion even be possible. The vast majority of the
> time, you must allocate another discontiguous chunk to meet the above
> contractual obligation anyway.

Would it be better to provide an allocation API that returns the amount
of memory actually allocated?  That would at least allow any padding at
the end of the allocation to be used instead of being wasted.

>> For those a metadata-update or mremap without MAYMOVE is fine.
>>
>> B) Collections that want to resize and can change their pointer
>> but need custom data movement, i.e. not a plain memcpy from the old
>> to the new location. A VecDeque that needs to copy its front and
>> tail to different locations after a resize. A Vec wants to copy
>> fewer bytes than its allocated size.
>>
>> In these cases mremap(MREMAP_MAYMOVE) is fine but memcpy should
>> be avoided and we would fallback to malloc + custom copy operations
>> + free.
> 
> malloc + custom copy + free sounds like it's always the right
> solution. Simply ignore the existence of realloc entirely. It's not
> useful.

For large allocations, one can do this via mremap(MREMAP_MAYMOVE),
which uses page table swizzling rather than a copy.  However, there
might need to be some trickery to avoid TLB shootdowns.  In theory,
the TLB shootdown can be delayed until the virtual address that was
moved from is reused.  I don’t know if Linux implements this.
>> C) Alignment-changing reallocations, for example to go from a Box<[u8]>
>> to Box<[f32]>. In those cases mremap and memcpy are both fine but for
>> large allocations the former would be preferred.
> 
> Why is alignment-changing reallocation a thing? Why would the type of
> an object change like this? Even if it does, all allocations are
> always inherently aligned sufficiently for the alignment requirement
> of any non-over-aligned type.
> 
> Rich


-- 
Sincerely,
Demi Marie Obenour (she/her/hers)
Download attachment "OpenPGP_0xB288B55FFF9C22C1.asc" of type "application/pgp-keys" (7141 bytes)

Download attachment "OpenPGP_signature.asc" of type "application/pgp-signature" (834 bytes)

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.