musl - Re: alx-0029r3 - Restore the traditional realloc(3) specification

Message-ID: <lqwifu3d5rcevgbzskyxgqlxceznlougry6tbtote5xbt37u7s@kynjo4ncdocj>
Date: Tue, 24 Jun 2025 16:18:40 -0500
From: Eric Blake <eblake@...hat.com>
To: Alejandro Colomar <alx@...nel.org>
Cc: libc-alpha@...rceware.org, bug-gnulib@....org, musl@...ts.openwall.com, 
	наб <nabijaczleweli@...ijaczleweli.xyz>, Douglas McIlroy <douglas.mcilroy@...tmouth.edu>, 
	Paul Eggert <eggert@...ucla.edu>, Robert Seacord <rcseacord@...il.com>, 
	Elliott Hughes <enh@...gle.com>, Bruno Haible <bruno@...sp.org>, 
	JeanHeyd Meneide <phdofthehouse@...il.com>, Rich Felker <dalias@...c.org>, 
	Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>, Joseph Myers <josmyers@...hat.com>, 
	Florian Weimer <fweimer@...hat.com>, Laurent Bercot <ska-dietlibc@...rnet.org>, 
	Andreas Schwab <schwab@...e.de>, Vincent Lefevre <vincent@...c17.net>, 
	Mark Harris <mark.hsj@...il.com>, Collin Funk <collin.funk1@...il.com>, 
	Wilco Dijkstra <Wilco.Dijkstra@....com>, DJ Delorie <dj@...hat.com>, 
	Cristian Rodríguez <cristian@...riguez.im>, Siddhesh Poyarekar <siddhesh@...plt.org>, 
	Sam James <sam@...too.org>, Mark Wielaard <mark@...mp.org>, 
	"Maciej W. Rozycki" <macro@...hat.com>, Martin Uecker <ma.uecker@...il.com>, 
	Christopher Bazley <chris.bazley.wg14@...il.com>, eskil@...ession.se, 
	Daniel Krügler <daniel.kruegler@...glemail.com>, Kees Cook <keescook@...omium.org>, 
	Valdis Klētnieks <valdis.kletnieks@...edu>
Subject: Re: alx-0029r3 - Restore the traditional realloc(3) specification

On Tue, Jun 24, 2025 at 07:01:50AM +0200, Alejandro Colomar wrote:
> Hi!
> 
> Here's a new revision of the proposal.  I've removed ENOMEM, since it's
> not strictly necessary; it's only necessary that those systems that
> already set it continue setting it (and my proposal for POSIX will
> certainly include ENOMEM).
> 
> I've also added links to real bugs caused by this issue, as some people
> seem to be blind to those.  There have been RCE vulnerabilities caused
> by people having to work around the brain damage of realloc(3) returning
> NULL on success.
> 
> Below goes the proposal.

Some feedback to consider:
...

> 
> Description
> 	The specification of realloc(3) has been problematic since the
> 	very first standards, even before ISO C.  The wording has
> 	changed significantly, trying to forcibly permit implementations
> 	to return a null pointer when the requested size is zero.  This
> 	originated from the intent of banning zero-sized objects from
> 	the language in C89, but that never worked well in
> 	retrospect, as we can see from the fallout.
> 
> 	None of the specifications have been good, and C23 finally gave
> 	up and made it undefined behavior.
> 
> 	The problem is not only theoretical.  Programmers don't know how
> 	to use realloc(3) correctly, and have written weird code in
> 	their attempts.  This has resulted in a lot of non-sensical code
> 	in configure scripts[1], and even bugs in actual programs[2].
> 
> 	[1] <https://codesearch.debian.net/search?q=%5Cbrealloc%5B+%5Ct%5D*%5B%28%5D%5B%5E%2C%5D*%2C%5B+%5Ct%5D0%5B%29%5D&literal=0>
> 	[2] <https://lore.kernel.org/lkml/20220213182443.4037039-1-keescook@chromium.org/>
> 
> 	In some cases, this non-sensical code has resulted in RCEs[3].
> 
> 	[3] <https://gbhackers.com/whatsapp-double-free-vulnerability/>
> 
> 	However, this doesn't need to be like that.  The traditional
> 	implementation of realloc(3), present in Unix V7, inherited by
> 	the BSDs, and currently available in a range of systems, including
> 	musl libc, doesn't have any issues.

It may be worth specifically mentioning that glibc 2.1 and earlier
also had that behavior, even though it was an independent
implementation not derived from Unix V7; and that the _only_ reason
glibc 2.2 and later changed in 2000 to just freeing p instead of
returning a result like malloc(0) was that someone argued that the
C89 wording required that change, even though the new C99 wording
under discussion at the time was trying to remove the warts in the
C89 definition.

> 
> 	Code written for platforms returning a null can be migrated to
> 	platforms returning non-null, without significant issues.
> 
> 	There are two kinds of code that call realloc(p,0).  One
> 	hard-codes the 0, and is used as a replacement of free(p).  This
> 	code ignores the return value, since it's unimportant.  This
> 	code currently produces a leak of 0 bytes plus associated
> 	metadata on platforms such as musl libc, where it returns a
> 	non-null pointer.  However, assuming that there are programs
> 	written with the knowledge that they won't ever be run on such
> 	platforms, we should take care of that, and make sure they don't
> 	leak.  A way of accomplishing this would be to recommend
> 	implementations to issue a diagnostic when realloc(3) is called
> 	with a hardcoded zero.  This is only an informal recommendation
> 	made by this proposal, as this is a matter of QoI, and the
> 	standard shouldn't say anything about it.  This would prevent
> 	this class of minor leaks.
> 
> 	Moreover, in glibc, realloc(p,0) may return non-null, in the
> 	case where p is NULL, so code must already take that into
> 	account, and thus code that simply takes realloc(p,0) as a
> 	synonym of free(p) is already leaky, as free(NULL) is a no-op,
> 	but realloc(NULL,0) allocates 0 bytes.
> 
> 	The other kind of code is in algorithms that realloc(3) an
> 	arbitrary size, which might eventually be zero.  This gets more
> 	complex.
> 
> 	Here's the code that should be written for AIX or glibc:
> 
> 		errno = 0;
> 		new = realloc(old, size);
> 		if (new == NULL) {
> 			if (errno == ENOMEM)
> 				free(old);
> 			goto fail;
> 		}
> 		...
> 		free(new);
> 
> 	Failing to check for ENOMEM in these platforms before freeing
> 	the old pointer would result in a double-free.  If the program
> 	decides to continue using the old pointer instead of freeing it,
> 	it would result in a use-after-free.
> 
> 	In the platforms where realloc(p,0) returns non-null, such as
> 	the BSDs or musl libc, it is simpler to handle it:
> 
> 		new = realloc(old, size);
> 		if (new == NULL) {  // errno is ENOMEM
> 			free(old);
> 			goto fail;
> 		}
> 		...
> 		free(new);
> 
> 	Whenever the result is a null pointer, these platforms are
> 	reporting an ENOMEM error, and thus it is superfluous to check
> 	errno there.
> 
> 	Most code is written in this way, even if run on platforms
> 	returning a null pointer.  This is because most programmers are
> 	just unaware of this problem.

It may also be worth pointing out that any time code behaves one way
for 3, 2, 1, and then suddenly changes behavior at 0, it is much
harder to use that interface correctly than one where each successive
call merely has the effect of asking for one byte less than the
previous, larger call.
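
As a rough sketch of what that discontinuity costs callers today
(resize_buf is a hypothetical helper name of mine, not something from
the proposal), portable code has to route size 0 through the n?n:1
workaround from your Caveats section so that 3, 2, 1 and 0 all take
the same path:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: resize *pp to n bytes in a way that behaves the
 * same on today's implementations for every n, including 0.
 * Returns 0 on success, -1 on failure; on failure *pp is untouched
 * and still owned by the caller.
 */
static int resize_buf(void **pp, size_t n)
{
	void *q;

	assert(pp != NULL);

	/*
	 * On some platforms realloc(p, 0) frees p and returns NULL, which
	 * cannot be told apart from a failure.  Asking for 1 byte instead
	 * keeps the n == 0 case on the ordinary code path.
	 */
	q = realloc(*pp, n ? n : 1);
	if (q == NULL)
		return -1;

	*pp = q;
	return 0;
}
```

If realloc(p,0) were required to return non-null on success, the
"n ? n : 1" could presumably be reduced to plain "n" with no other
change to the function.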

> 
> 	If the realloc(3) specification were changed to require that
> 	realloc(p,0) returns non-null on success, and that realloc(p,0)
> 	only fails when out-of-memory (and assuming the implementations
> 	will continue setting errno to ENOMEM), then code written for
> 	AIX or glibc would continue working just fine, since the errno
> 	check would be redundant with the null check.  Simply, the
> 	conditional (errno == ENOMEM) would always be true when
> 	(new == NULL).
> 
> 	Then, there are non-POSIX platforms that don't set ENOMEM.  In
> 	those platforms, code might do this:
> 
> 		new = realloc(old, size);
> 		if (new == NULL) {
> 			if (size != 0)
> 				free(old);
> 			goto fail;
> 		}
> 		...
> 		free(new);
> 
> 	That code would continue working with this proposal, except for
> 	a very rare corner case, in which it would leak.  In the normal
> 	case, (size != 0) would never be true under (new == NULL),
> 	because a reallocation of 0 bytes would almost always succeed,
> 	and thus not return a null pointer under this proposal.
> 	However, in some cases, the system might not find space even for
> 	the small metadata needed for a 0-byte allocation.  In such
> 	case, the (size != 0) conditional would prevent deallocating
> 	'old', and thus cause a memory leak.  This case is exceptional
> 	enough that it shouldn't stop us from fixing realloc(3).
> 	Anyway, on an out-of-memory case, the program is likely to
> 	terminate rather soon, so the issue is even less likely to have
> 	an impact on any existing programs.  Also, LLVM's address
> 	sanitizer will soon be able to catch such a leak:
> 	<https://github.com/llvm/llvm-project/issues/113065>
> 
> 	This proposal makes handling of realloc(3) as straightforward as
> 	one would expect, with only two states: success or error.  There
> 	are no in-between states.
> 
> 	The resulting wording in the standard is also much simpler, as
> 	it doesn't need to define so many special cases.
> 
> 	For consistency, all the other allocation functions are updated
> 	to both return a null pointer on error, and use consistent
> 	wording.
> 
> Prior art
>     gnulib
> 	gnulib provides the realloc-posix module, which aims to wrap the
> 	system realloc(3) and reallocarray(3) functions so that they
> 	behave in a POSIX-complying manner.
> 
> 	It previously behaved like glibc.  After I reported that it was
> 	non-conforming to POSIX, we discussed the best way forward,
> 	which we agreed was the same direction that this paper is
> 	proposing now for C2y.  The implementation was changed in
> 
> 		gnulib.git d884e6fc4a60 (2024-11-04; "realloc-posix: realloc (..., 0) now returns nonnull")
> 
> 	There have been no regression reports since then, as we
> 	expected.
> 
>     Unix V7
> 	The proposed behavior is the one endorsed by Doug McIlroy, the
> 	author of the original implementation of realloc(3) in Unix V7,
> 	and also present in the BSDs.

Would calling out glibc 2.1 as prior art help?

> 
> Design decisions
> 	This change needs three changes, which can be applied all at
> 	once, or in separate steps.

You document "three" here...

> 
> 	The first step would make realloc(p,s) be consistent with
> 	free(p) and malloc(s), including when p is a null pointer, when
> 	s is zero, and also when both corner cases happen at the same
> 	time.  This change would already turn the implementations where
> 	malloc(0) returns non-null into the end goal we have.
> 
> 	This first step would require changes to (at least) the
> 	following implementations: glibc, Bionic, Windows.

...then mention the first step twice...

> 
> 	The second step would be to require that malloc(0) returns a
> 	non-null pointer.
> 
> 	The second step would require changes to (at least) the
> 	following implementations: AIX.

...and the second step twice...

> 
> 	This proposal has merged all steps into a single proposal.

...and no mention of the third step.  You'll want to clean that up.

> 
> Future directions
> 	This proposal, by specifying realloc(3) as-if by calling
> 	free(3) and malloc(3), makes redundant several mentions of
> 	realloc(3) next to either free(3) or malloc(3) in the standard.
> 	We could remove them in this proposal, or clean up that in a
> 	separate (mostly editorial) proposal.  Let's keep it for a
> 	future proposal for now.
> 
> Caveats
>     n?n:1
> 	Code written today should be careful, in case it can run on
> 	older systems that are not fixed to comply with this stricter
> 	specification.  Thus, code written today should call realloc(3)
> 	similar to this:
> 
> 		realloc(p, n?n:1);
> 
> 	When all existing implementations are fixed to comply with this
> 	stricter specification, that workaround can be removed.
> 
>     ENOMEM
> 	Existing implementations that set errno to ENOMEM must continue
> 	doing so when the input pointer is not freed.  If they didn't,
> 	code that is currently portable to all POSIX systems
> 
> 		errno = 0;
> 		new = realloc(old, size);
> 		if (new == NULL) {
> 			if (errno == ENOMEM)
> 				free(old);
> 			goto fail;
> 		}
> 		...
> 		free(new);
> 
> 	would leak on error.

Would it also be worth mentioning (either here or as a footnote to be
added in the standard) that, while atypical, realloc() is allowed to
fail with ENOMEM even when the new size is smaller than the previous
size of the allocation?  This might be seen as a non-intuitive result,
but as Rich Felker pointed out, there ARE implementations of malloc()
that use alignment properties of the returned pointer itself as part
of the information encoding how large the region it points to is (for
example, whether the allocation comes from mmap or the heap), and the
standard should not preclude these types of implementations.  With
such a mention in place, it may also be worth mentioning that when
new==NULL, it is not necessary to call free(old) immediately, if the
programmer would rather ignore the fact that the system cannot move
the allocation to a more efficient location and that the tail of the
old allocation is now wasted space.
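
To illustrate that last point, here is a minimal sketch (the helper
name shrink_or_keep is hypothetical, and it assumes the caller really
is shrinking, i.e. 0 < n <= the current size) of code that keeps using
the old block when a shrinking realloc() fails:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper, not from the proposal: shrink buf to n bytes,
 * where 0 < n <= the current size of buf.  If realloc() reports
 * failure, the old block is still valid and unchanged, so we keep it
 * and merely waste its tail.  Always returns a usable pointer.
 */
static void *shrink_or_keep(void *buf, size_t n)
{
	void *q;

	assert(buf != NULL && n > 0);

	q = realloc(buf, n);	/* n != 0, so NULL can only mean failure */
	return q != NULL ? q : buf;
}
```

This is only safe for shrinking; a caller that is growing the buffer
obviously cannot pretend the old block is big enough.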

> 
> 	Since it is currently impossible to write code today that is
> 	portable to arbitrary C17 systems, this is not an issue in
> 	ISO C.
> 
> 		-  New code written for C2y will only need to check for
> 		   NULL to detect errors.
> 
> 		-  Code written for specific C17 and older platforms
> 		   that don't set errno will continue to work for those
> 		   specific platforms.
> 
> 		-  Code written for POSIX.1-2024 and older platforms
> 		   will continue working on POSIX C2y platforms,
> 		   assuming that POSIX will continue mandating ENOMEM.
> 
> 		-  Code written for POSIX.1-2024 and older will not be
> 		   able to be run on non-POSIX C2y platforms, but that
> 		   could be expected.
> 
> 	The only important thing is that platforms that did set ENOMEM
> 	should continue setting it, to avoid introducing leaks.
> 
> Proposed wording
> 	Based on N3550.
> 
>     7.25.4.1  Memory management functions :: General
> 	@@ p1
> 	...
> 	 If the size of the space requested is zero,
> 	-the behavior is implementation-defined:
> 	-either
> 	-a null pointer is returned to indicate the error,
> 	-or
> 	 the behavior is as if the size were some nonzero value,
> 	 except that the returned pointer shall not be used
> 	 to access an object.
> 
>     7.25.4.2  The aligned_alloc function
> 	@@ Returns, p3
> 	 The <b>aligned_alloc</b> function returns
> 	-either
> 	-a null pointer
> 	-or
> 	-a pointer to the allocated space.
> 	+a pointer to the allocated space
> 	+on success.
> 	+If
> 	+the space cannot be allocated,
> 	+a null pointer is returned.
> 
>     7.25.4.3  The calloc function
> 	@@ Returns, p3
> 	 The <b>calloc</b> function returns
> 	-either
> 	 a pointer to the allocated space
> 	+on success.
> 	-or a null pointer
> 	-if
> 	+If
> 	 the space cannot be allocated
> 	 or if the product <tt>nmemb * size</tt>
> 	-would wraparound <b>size_t</b>.
> 	+would wraparound <b>size_t</b>,
> 	+a null pointer is returned.

Not part of this paper, but would it make sense for implementations
to return a different errno for the two different classes of failures
here?  ENOMEM when the space can't be allocated, and EINVAL (or
maybe ERANGE or EOVERFLOW) when nmemb*size overflows?

If C2y standardizes reallocarray(), then this becomes important.
Without sane errno values, it is impossible to tell whether
reallocarray() failed due to the inability to allocate the new
pointer, or whether it failed because the parameters overflowed and no
reallocation could be attempted.  In fact, glibc documents that
attempting to malloc() more than PTRDIFF_MAX bytes is treated as a failure, even
if that value is positive and less than the total amount of memory
available in the system, because objects cannot be so large as to
cause problems when computing pointer differences.  But as long as the
interface is documented as leaving the old pointer unchanged on ANY
failures (whether allocation or overflow), then the semantics are
still easy to work with even when not having errno to rely on.
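
A sketch of what I have in mind, purely for illustration:
my_reallocarray is a hypothetical wrapper (not the glibc or BSD
reallocarray()), and the choice of ERANGE for the overflow case is my
assumption here, not something any current implementation or standard
mandates.  Either failure leaves the old pointer untouched:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: resize p to hold nmemb elements of size bytes.
 * On failure returns NULL, leaves p valid and unchanged, and sets errno:
 *   ERANGE  nmemb * size would wrap around size_t (assumed value;
 *           nothing was attempted)
 *   ENOMEM  the allocation itself failed
 */
static void *my_reallocarray(void *p, size_t nmemb, size_t size)
{
	size_t total;
	void *q;

	if (size != 0 && nmemb > SIZE_MAX / size) {
		errno = ERANGE;
		return NULL;
	}

	total = nmemb * size;
	q = realloc(p, total ? total : 1);	/* avoid realloc(p, 0) */
	if (q == NULL)
		errno = ENOMEM;	/* ISO C alone does not guarantee this */
	return q;
}
```

A caller that cares can then distinguish the two cases by errno, and a
caller that doesn't can still just keep using (or free) the old
pointer whenever the result is NULL.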

> 
>     7.25.4.7  The malloc function
> 	@@ Returns, p3
> 	 The <b>malloc</b> function returns
> 	-either
> 	-a null pointer
> 	-or
> 	-a pointer to the allocated space.
> 	+a pointer to the allocated space
> 	+on success.
> 	+If
> 	+the space cannot be allocated,
> 	+a null pointer is returned.
> 
>     7.25.4.8  The realloc function
> 	@@ Description, p2
> 	 The <b>realloc</b> function
> 	 deallocates the old object pointed to by <tt>ptr</tt>
> 	+as if by a call to <b>free</b>,
> 	 and returns a pointer to a new object
> 	-that has the size specified by <tt>size</tt>.
> 	+that has the size specified by <tt>size</tt>
> 	+as if by a call to <b>malloc</b>.
> 	 The contents of the new object
> 	 shall be the same as that of the old object prior to deallocation,
> 	 up to the lesser of the new and old sizes.
> 	 Any bytes in the new object
> 	 beyond the size of the old object
> 	 have unspecified values.
> 
> 	@@ p3
> 	 If <tt>ptr</tt> is a null pointer,
> 	 the <b>realloc</b> function behaves
> 	 like the <b>malloc</b> function for the specified size.
> 	 Otherwise,
> 	 if <tt>ptr</tt> does not match a pointer
> 	 earlier returned by a memory management function,
> 	 or
> 	 if the space has been deallocated
> 	 by a call to the <b>free</b> or <b>realloc</b> function,
> 	## We can probably remove all of the above, because of the
> 	## behavior now being defined as-if by calls to malloc(3) and
> 	## free(3).  But let's do that editorially in a separate change.
> 	-or
> 	-if the size is zero,
> 	## We're defining the behavior.
> 	 the behavior is undefined.
> 	 If
> 	-memory for the new object is not allocated,
> 	+the space cannot be allocated,
> 	## Editorial; for consistency with the wording of the other functions.
> 	 the old object is not deallocated
> 	 and its value is unchanged.
> 
> 	@@ Returns, p4
> 	 The <b>realloc</b> function returns
> 	 a pointer to the new object
> 	 (which can have the same value
> 	-as a pointer to the old object),
> 	+as a pointer to the old object)
> 	+on success.
> 	-or
> 	+If
> 	+space cannot be allocated,
> 	 a null pointer
> 	-if the new object has not been allocated.
> 	+is returned.

I'm liking the direction this proposal is headed in.  If it is down to
a question of whether glibc or the C standard will blink first, I'm
hoping that we can get some agreement from both sides, rather than
being stuck in a stalemate with each arguing that the other is the
reason to not fix things.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org