Message-ID: <fmb53bqjtrwkt7kfe72mbuezhel5yuvrnfge2p6hh3jknv5acf@ugwdhihbxrxn>
Date: Fri, 20 Jun 2025 23:26:29 +0200
From: Alejandro Colomar <alx@...nel.org>
To: libc-alpha@...rceware.org
Cc: bug-gnulib@....org, musl@...ts.openwall.com, 
	наб <nabijaczleweli@...ijaczleweli.xyz>, Douglas McIlroy <douglas.mcilroy@...tmouth.edu>, 
	Paul Eggert <eggert@...ucla.edu>, Robert Seacord <rcseacord@...il.com>, 
	Elliott Hughes <enh@...gle.com>, Bruno Haible <bruno@...sp.org>, 
	JeanHeyd Meneide <phdofthehouse@...il.com>, Rich Felker <dalias@...c.org>, 
	Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>, Joseph Myers <josmyers@...hat.com>, 
	Florian Weimer <fweimer@...hat.com>, Laurent Bercot <ska-dietlibc@...rnet.org>, 
	Andreas Schwab <schwab@...e.de>, Thorsten Glaser <tg@...bsd.de>, Eric Blake <eblake@...hat.com>, 
	Vincent Lefevre <vincent@...c17.net>, Mark Harris <mark.hsj@...il.com>, 
	Collin Funk <collin.funk1@...il.com>, Wilco Dijkstra <Wilco.Dijkstra@....com>, 
	DJ Delorie <dj@...hat.com>, Cristian Rodríguez <cristian@...riguez.im>, 
	Siddhesh Poyarekar <siddhesh@...plt.org>, Sam James <sam@...too.org>, Mark Wielaard <mark@...mp.org>, 
	"Maciej W. Rozycki" <macro@...hat.com>, Martin Uecker <ma.uecker@...il.com>, 
	Christopher Bazley <chris.bazley.wg14@...il.com>, eskil@...ession.se
Subject: alx-0029r1 - Restore the traditional realloc(3) specification

Hi!

After the useful discussion with Eric and Paul, I've rewritten a draft
of a proposal I had for realloc(3) for C2y.  Here it is (see below).

I'll present it here before presenting it to the C Committee (although
several members are CCd).

This time, I opted for an all-in-one change that puts us in the end
goal, since some people were concerned that step-by-step might be less
feasible.  Also, the wording is more consistent doing this at once, and
people know what to expect from the beginning.


Have a lovely day!
Alex

---
Name
	alx-0029r1 - Restore the traditional realloc(3) specification

Principles
	-  Uphold the character of the language
	-  Keep the language small and simple
	-  Facilitate portability
	-  Avoid ambiguities
	-  Pay attention to performance
	-  Codify existing practice to address evident deficiencies
	-  Avoid quiet changes
	-  Enable secure programming

Category
	Remove UB.

Author
	Alejandro Colomar <alx@...nel.org>

	Cc: <bug-gnulib@....org>
	Cc: <musl@...ts.openwall.com>
	Cc: <libc-alpha@...rceware.org>
	Cc: наб <nabijaczleweli@...ijaczleweli.xyz>
	Cc: Douglas McIlroy <douglas.mcilroy@...tmouth.edu>
	Cc: Paul Eggert <eggert@...ucla.edu>
	Cc: Robert Seacord <rcseacord@...il.com>
	Cc: Elliott Hughes <enh@...gle.com>
	Cc: Bruno Haible <bruno@...sp.org>
	Cc: JeanHeyd Meneide <phdofthehouse@...il.com>
	Cc: Rich Felker <dalias@...c.org>
	Cc: Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>
	Cc: Joseph Myers <josmyers@...hat.com>
	Cc: Florian Weimer <fweimer@...hat.com>
	Cc: Laurent Bercot <ska-dietlibc@...rnet.org>
	Cc: Andreas Schwab <schwab@...e.de>
	Cc: Thorsten Glaser <tg@...bsd.de>
	Cc: Eric Blake <eblake@...hat.com>
	Cc: Vincent Lefevre <vincent@...c17.net>
	Cc: Mark Harris <mark.hsj@...il.com>
	Cc: Collin Funk <collin.funk1@...il.com>
	Cc: Wilco Dijkstra <Wilco.Dijkstra@....com>
	Cc: DJ Delorie <dj@...hat.com>
	Cc: Cristian Rodríguez <cristian@...riguez.im>
	Cc: Siddhesh Poyarekar <siddhesh@...plt.org>
	Cc: Sam James <sam@...too.org>
	Cc: Mark Wielaard <mark@...mp.org>
	Cc: "Maciej W. Rozycki" <macro@...hat.com>
	Cc: Martin Uecker <ma.uecker@...il.com>
	Cc: Christopher Bazley <chris.bazley.wg14@...il.com>
	Cc: <eskil@...ession.se>

History
	<https://www.alejandro-colomar.es/src/alx/alx/wg14/alx-0029.git/>

	r0 (2025-06-17):
	-  Initial draft.

	r1 (2025-06-20):
	-  Full rewrite after the recent glibc discussion.

See also
	<https://nabijaczleweli.xyz/content/blogn_t/017-malloc0.html>
	<https://sourceware.org/pipermail/libc-alpha/1999-April/000956.html>
	<https://inbox.sourceware.org/libc-alpha/20241019014002.3684656-1-siddhesh@sourceware.org/T/#u>
	<https://inbox.sourceware.org/libc-alpha/qukfe5yxycbl5v7ooskvqdnm3au3orohbx4babfltegi47iyly@or6dgf7akeqv/T/#u>
	<https://github.com/bminor/glibc/commit/7c2b945e1fd64e0a5a4dbd6ae6592a7314dcd4b5>
	<https://www.austingroupbugs.net/view.php?id=400>
	<https://www.austingroupbugs.net/view.php?id=526>
	<https://www.austingroupbugs.net/view.php?id=688>
	<https://sourceware.org/bugzilla/show_bug.cgi?id=12547>
	<https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm>
	<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n868.htm>
	<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2438.htm>
	<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf>
	<https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/realloc.html>
	<https://pubs.opengroup.org/onlinepubs/9699919799.2013edition/functions/realloc.html>

Description
	Let's start by quoting the author of realloc(3).

	On 2024-10-18 05:30, Douglas McIlroy wrote:
	> The discussion has taken a turn that's astonishing to one who
	> doesn't know the inside details of real compilers.
	>
	> Regardless of the behavior of malloc(0), one expects this
	> theorem to hold:
	>
	>	Given that p = malloc(n) is not NULL,
	>	that 0<=m<=n,
	>	and that malloc(m) could in some circumstance
	>	return a non-null pointer,
	>	then realloc(p,m) will return a non-null pointer.
	>
	> REALLOC_ZERO_BYTES_FREES flies in the face of this rational
	> expectation about dynamic storage allocation.  A diabolical
	> invention.
	>
	> Doug
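
	The theorem can be probed on a given platform with a small
	test.  What follows is only a sketch: check_theorem() is a
	hypothetical name, and a single malloc(m) call stands in for
	"could in some circumstance return a non-null pointer".

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical probe for the theorem quoted above.
 * Returns  1 if realloc(p, m) returned non-null,
 *          0 if it returned a null pointer,
 *         -1 if the preconditions of the theorem were not met.
 */
int
check_theorem(size_t n, size_t m)
{
	int   malloc_m_works;
	void  *p, *q, *r;

	if (m > n)
		return -1;

	p = malloc(n);
	q = malloc(m);  /* Can malloc(m) return non-null here? */
	malloc_m_works = (q != NULL);
	free(q);
	if (p == NULL || !malloc_m_works) {
		free(p);
		return -1;
	}

	errno = 0;
	r = realloc(p, m);
	if (r == NULL) {
		/* p is still valid only if this was a real ENOMEM. */
		if (errno == ENOMEM)
			free(p);
		return 0;
	}
	free(r);
	return 1;
}
```

	With m of zero, the result depends on the platform, which is
	precisely the inconsistency under discussion.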

	The specification of realloc(3) has been problematic since the
	very first standards, even before ISO C.  The wording has
	changed significantly over time, straining to permit
	implementations to return a null pointer when the requested
	size is zero.  This originated from the intent of banning
	zero-sized objects from the language in C89, but in retrospect
	that never worked well, as we can see from the fallout.

	None of the specifications have been good, and C23 finally gave
	up and made it undefined behavior.

	However, it doesn't need to be like this.  The traditional
	implementation of realloc(3), present in Unix V7, inherited by
	the BSDs, and currently available in a range of systems,
	including musl libc, doesn't have any issues.

	Code written for platforms returning a null can be migrated to
	platforms returning non-null, without significant issues.

	There are two kinds of code that call realloc(p,0).  One
	hard-codes the 0, and is used as a replacement of free(p).  This
	code ignores the return value, since it's unimportant.  This
	code currently produces a leak of 0 bytes plus associated
	metadata on platforms such as musl libc, where it returns a
	non-null pointer.  However, assuming that there are programs
	written with the knowledge that they won't ever be run on such
	platforms, we should take care of that, and make sure they don't
	leak.  A way of accomplishing this would be to recommend that
	implementations issue a diagnostic when realloc(3) is called
	with a hard-coded zero.  This is only an informal recommendation
	made by this proposal, as this is a matter of QoI, and the
	standard shouldn't say anything about it.  This would prevent
	this class of minor leaks.

	Moreover, in glibc, realloc(p,0) may return non-null, in the
	case where p is NULL, so code must already take that into
	account, and thus code that simply takes realloc(p,0) as a
	synonym of free(p) is already leaky, as free(NULL) is a no-op,
	but realloc(NULL,0) allocates 0 bytes.

	The other kind of code is in algorithms that realloc(3) an
	arbitrary size, which might eventually be zero.  This gets more
	complex.

	Here's the code that should be written for AIX or glibc:

		errno = 0;
		new = realloc(old, size);
		if (new == NULL) {
			if (errno == ENOMEM)
				free(old);
			goto fail;
		}
		...
		free(new);

	Failing to check for ENOMEM on these platforms before freeing
	the old pointer would result in a double-free.  If the program
	decides to continue using the old pointer instead of freeing it,
	it would result in a use-after-free.
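
	The pattern above could be wrapped in a helper so that callers
	cannot forget the errno check.  A minimal sketch, where
	resize_or_free() is a hypothetical name and not an existing
	API:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: resize *bufp to `size` bytes.
 * On success, store the new pointer in *bufp and return 0.
 * On a null return from realloc(3), make sure the old buffer is
 * released exactly once, set *bufp to NULL, and return -1.
 */
int
resize_or_free(void **bufp, size_t size)
{
	void  *new;

	errno = 0;
	new = realloc(*bufp, size);
	if (new == NULL) {
		/* Only an ENOMEM failure leaves the old buffer alive. */
		if (errno == ENOMEM)
			free(*bufp);
		*bufp = NULL;
		return -1;
	}
	*bufp = new;
	return 0;
}
```

	After a failure, the caller holds a null pointer, so neither a
	double-free nor a use-after-free is possible through *bufp.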

	In the platforms where realloc(p,0) returns non-null, such as
	the BSDs or musl libc, it is simpler to handle it:

		new = realloc(old, size);
		if (new == NULL) {  // errno is ENOMEM
			free(old);
			goto fail;
		}
		...
		free(new);

	Whenever the result is a null pointer, these platforms are
	reporting an ENOMEM error, and thus it is superfluous to check
	errno there.

	Most code is written in this way, even if run on platforms
	returning a null pointer.  This is because most programmers are
	just unaware of this problem.

	If the realloc(3) specification were changed to require that
	realloc(p,0) returns non-null on success, and that realloc(p,0)
	only fails when out-of-memory, and to require that it sets
	errno to ENOMEM, then code written for AIX or glibc would
	continue working just fine, since the errno check would be
	redundant with the null check.  Simply, the conditional
	(errno == ENOMEM) would always be true when (new == NULL).

	This makes handling of realloc(3) as straightforward as one
	would expect, with only two states: success or error.

	The resulting wording in the standard is also much simpler, as
	it doesn't need to define so many special cases.

	For consistency, all the other allocation functions are updated
	to both return a null pointer only on error and set errno to
	ENOMEM in that case.

Prior art
    gnulib
	gnulib provides the realloc-posix module, which aims to wrap the
	system realloc(3) and reallocarray(3) functions so that they
	behave in a POSIX-complying manner.

	It previously behaved like glibc.  After I reported that it was
	non-conforming to POSIX, we discussed the best way forward,
	which we agreed was the same direction that this paper is
	proposing now for C2y.  The implementation was changed in

		gnulib.git d884e6fc4a60 (2024-11-04; "realloc-posix: realloc (..., 0) now returns nonnull")

	There have been no regression reports since then, as we
	expected.

    Unix V7
	The proposed behavior is the one endorsed by Doug McIlroy, the
	author of the original implementation of realloc(3) in Unix V7,
	and also present in the BSDs.

Design decisions
	This change consists of three changes, which can be applied
	all at once, or in three separate steps.

	The first step would make realloc(p,s) be consistent with
	free(p) and malloc(s), including when p is a null pointer, when
	s is zero, and also when both corner cases happen at the same
	time.  This change would already turn the implementations where
	malloc(0) returns non-null into the end goal we have.

	The first step would require changes to (at least) the following
	implementations: glibc, Bionic, Windows.

	The second step would be to require that malloc(0) returns a
	non-null pointer.

	The second step would require changes to (at least) the
	following implementations: AIX.

	The third step would be to require that on error, errno is set
	to ENOMEM.

	This proposal has merged all steps into a single proposal.

	This proposal also needs to add ENOMEM to the standard, since it
	hasn't been standardized yet.
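
	The combined effect of the three steps can be modelled in a
	few lines.  This is only a sketch of the intended semantics,
	not an allocator: model_realloc() and its explicit old_size
	parameter are hypothetical (a real realloc(3) knows the old
	size from its own metadata), and malloc(0) returning non-null
	is emulated by requesting one byte.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of realloc(3) as if by malloc(3) and free(3). */
void *
model_realloc(void *old, size_t old_size, size_t new_size)
{
	void  *new;

	assert(old != NULL || old_size == 0);

	/* As if by malloc(new_size), with malloc(0) returning non-null. */
	new = malloc(new_size ? new_size : 1);
	if (new == NULL) {
		errno = ENOMEM;  /* The old object is left untouched. */
		return NULL;
	}

	if (old != NULL) {
		/* Preserve contents up to the lesser of both sizes. */
		memcpy(new, old, old_size < new_size ? old_size : new_size);
		free(old);  /* As if by free(old). */
	}
	return new;
}
```

	In this model there are only two outcomes: a non-null pointer
	on success, or a null pointer with errno set to ENOMEM and the
	old object intact.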

Future directions
	This proposal, by specifying realloc(3) as if by calling
	free(3) and malloc(3), makes several mentions of realloc(3)
	next to either free(3) or malloc(3) in the standard redundant.
	We could remove them in this proposal, or clean that up in a
	separate (mostly editorial) proposal.  Let's leave it for a
	future proposal for now.

Caveats
	Code written today should be careful, in case it can run on
	older systems that are not fixed to comply with this stricter
	specification.  Thus, code written today should call realloc(3)
	similar to this:

		realloc(p, n?n:1);

	When all existing implementations are fixed to comply with this
	stricter specification, that workaround can be removed.
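
	The workaround could live in a single helper, so that it is
	easy to remove later.  A minimal sketch, where
	realloc_nonzero() is a hypothetical name:

```c
#include <stdlib.h>

/*
 * Hypothetical helper: never pass a size of zero to realloc(3), so
 * that a null return always means that the allocation failed and
 * that `p` is still valid.
 */
void *
realloc_nonzero(void *p, size_t n)
{
	return realloc(p, n ? n : 1);
}
```

	Callers can then treat a null return as an out-of-memory error
	on every platform, at the cost of allocating one byte where
	zero bytes were requested.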

Proposed wording
	Based on N3550.

    7.5  Errors <errno.h>
	## Add ENOMEM in p2.

    7.25.4.1  Memory management functions :: General
	@@ p1
	...
	 If the size of the space requested is zero,
	-the behavior is implementation-defined:
	-either
	-a null pointer is returned to indicate the error,
	-or
	 the behavior is as if the size were some nonzero value,
	 except that the returned pointer shall not be used
	 to access an object.

    7.25.4.2  The aligned_alloc function
	@@ Returns, p3
	 The <b>aligned_alloc</b> function returns
	-either
	-a null pointer
	-or
	-a pointer to the allocated space.
	+a pointer to the allocated space
	+on success.
	+If
	+the space cannot be allocated,
	+a null pointer is returned,
	+and the value of the macro <b>ENOMEM</b>
	+is stored in <b>errno</b>.

    7.25.4.3  The calloc function
	@@ Returns, p3
	 The <b>calloc</b> function returns
	-either
	 a pointer to the allocated space
	+on success.
	-or a null pointer
	-if
	+If
	 the space cannot be allocated
	 or if the product <tt>nmemb * size</tt>
	-would wraparound <b>size_t</b>.
	+would wraparound <b>size_t</b>,
	+a null pointer is returned,
	+and the value of the macro <b>ENOMEM</b>
	+is stored in <b>errno</b>.

    7.25.4.7  The malloc function
	@@ Returns, p3
	 The <b>malloc</b> function returns
	-either
	-a null pointer
	-or
	-a pointer to the allocated space.
	+a pointer to the allocated space
	+on success.
	+If
	+the space cannot be allocated,
	+a null pointer is returned,
	+and the value of the macro <b>ENOMEM</b>
	+is stored in <b>errno</b>.

    7.25.4.8  The realloc function
	@@ Description, p2
	 The <b>realloc</b> function
	 deallocates the old object pointed to by <tt>ptr</tt>
	+as if by a call to <b>free</b>,
	 and returns a pointer to a new object
	-that has the size specified by <tt>size</tt>.
	+that has the size specified by <tt>size</tt>
	+as if by a call to <b>malloc</b>.
	 The contents of the new object
	 shall be the same as that of the old object prior to deallocation,
	 up to the lesser of the new and old sizes.
	 Any bytes in the new object
	 beyond the size of the old object
	 have unspecified values.

	@@ p3
	 If <tt>ptr</tt> is a null pointer,
	 the <b>realloc</b> function behaves
	 like the <b>malloc</b> function for the specified size.
	 Otherwise,
	 if <tt>ptr</tt> does not match a pointer
	 earlier returned by a memory management function,
	 or
	 if the space has been deallocated
	 by a call to the <b>free</b> or <b>realloc</b> function,
	-or
	-if the size is zero,
	## We're defining the behavior.
	 the behavior is undefined.
	 If
	-memory for the new object is not allocated,
	+the space cannot be allocated,
	## Editorial; for consistency with the wording of the other functions.
	 the old object is not deallocated
	 and its value is unchanged.

	@@ Returns, p4
	 The <b>realloc</b> function returns
	 a pointer to the new object
	 (which can have the same value
	-as a pointer to the old object),
	+as a pointer to the old object)
	+on success.
	-or
	+If
	+space cannot be allocated,
	 a null pointer
	+is returned
	+and the value of the macro <b>ENOMEM</b>
	+is stored in <b>errno</b>.

-- 
<https://www.alejandro-colomar.es/>
