Message-ID: <limjtao3jpge6hrrjliko4w6p5t7cfpbc3m3etodqgsxayi3hw@gv4eml5icqb5>
Date: Sat, 21 Jun 2025 16:00:10 +0200
From: Alejandro Colomar <alx@...nel.org>
To: libc-alpha@...rceware.org
Cc: bug-gnulib@....org, musl@...ts.openwall.com,
наб <nabijaczleweli@...ijaczleweli.xyz>, Douglas McIlroy <douglas.mcilroy@...tmouth.edu>,
Paul Eggert <eggert@...ucla.edu>, Robert Seacord <rcseacord@...il.com>,
Elliott Hughes <enh@...gle.com>, Bruno Haible <bruno@...sp.org>,
JeanHeyd Meneide <phdofthehouse@...il.com>, Rich Felker <dalias@...c.org>,
Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>, Joseph Myers <josmyers@...hat.com>,
Florian Weimer <fweimer@...hat.com>, Laurent Bercot <ska-dietlibc@...rnet.org>,
Andreas Schwab <schwab@...e.de>, Eric Blake <eblake@...hat.com>,
Vincent Lefevre <vincent@...c17.net>, Mark Harris <mark.hsj@...il.com>,
Collin Funk <collin.funk1@...il.com>, Wilco Dijkstra <Wilco.Dijkstra@....com>,
DJ Delorie <dj@...hat.com>, Cristian Rodríguez <cristian@...riguez.im>,
Siddhesh Poyarekar <siddhesh@...plt.org>, Sam James <sam@...too.org>, Mark Wielaard <mark@...mp.org>,
"Maciej W. Rozycki" <macro@...hat.com>, Martin Uecker <ma.uecker@...il.com>,
Christopher Bazley <chris.bazley.wg14@...il.com>, eskil@...ession.se,
Daniel Krügler <daniel.kruegler@...glemail.com>
Subject: alx-0029r2 - Restore the traditional realloc(3) specification
Hi!
Here's a revision of the proposal, where the main changes are
- Justify why ENOMEM is being proposed.
- Make ENOMEM optional.
after the feedback from Chris and Paul. See the git repository
mentioned in History for the exact diff.
Have a lovely day!
Alex
---
Name
alx-0029r2 - Restore the traditional realloc(3) specification
Principles
- Uphold the character of the language
- Keep the language small and simple
- Facilitate portability
- Avoid ambiguities
- Pay attention to performance
- Codify existing practice to address evident deficiencies.
- Do not prefer any implementation over others
- Ease migration to newer language editions
- Avoid quiet changes
- Enable secure programming
Category
Remove UB.
Author
Alejandro Colomar <alx@...nel.org>
Cc: <bug-gnulib@....org>
Cc: <musl@...ts.openwall.com>
Cc: <libc-alpha@...rceware.org>
Cc: наб <nabijaczleweli@...ijaczleweli.xyz>
Cc: Douglas McIlroy <douglas.mcilroy@...tmouth.edu>
Cc: Paul Eggert <eggert@...ucla.edu>
Cc: Robert Seacord <rcseacord@...il.com>
Cc: Elliott Hughes <enh@...gle.com>
Cc: Bruno Haible <bruno@...sp.org>
Cc: JeanHeyd Meneide <phdofthehouse@...il.com>
Cc: Rich Felker <dalias@...c.org>
Cc: Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>
Cc: Joseph Myers <josmyers@...hat.com>
Cc: Florian Weimer <fweimer@...hat.com>
Cc: Andreas Schwab <schwab@...e.de>
Cc: Thorsten Glaser <tg@...bsd.de>
Cc: Eric Blake <eblake@...hat.com>
Cc: Vincent Lefevre <vincent@...c17.net>
Cc: Mark Harris <mark.hsj@...il.com>
Cc: Collin Funk <collin.funk1@...il.com>
Cc: Wilco Dijkstra <Wilco.Dijkstra@....com>
Cc: DJ Delorie <dj@...hat.com>
Cc: Cristian Rodríguez <cristian@...riguez.im>
Cc: Siddhesh Poyarekar <siddhesh@...plt.org>
Cc: Sam James <sam@...too.org>
Cc: Mark Wielaard <mark@...mp.org>
Cc: "Maciej W. Rozycki" <macro@...hat.com>
Cc: Martin Uecker <ma.uecker@...il.com>
Cc: Christopher Bazley <chris.bazley.wg14@...il.com>
Cc: <eskil@...ession.se>
Cc: Daniel Krügler <daniel.kruegler@...glemail.com>
History
<https://www.alejandro-colomar.es/src/alx/alx/wg14/alx-0029.git/>
r0 (2025-06-17):
- Initial draft.
r1 (2025-06-20):
- Full rewrite after the recent glibc discussion.
r2 (2025-06-21):
- Remove CC. Add CC.
- wfix.
- Drop quote.
- Add a few more principles
- Clarify why ENOMEM is used in this proposal, and make it
optional.
- Mention unavoidable --but exceptional-- leak in code checking
(size != 0).
- Clarify that part of the description of realloc can be
editorially removed after this change.
See also
<https://nabijaczleweli.xyz/content/blogn_t/017-malloc0.html>
<https://sourceware.org/pipermail/libc-alpha/1999-April/000956.html>
<https://inbox.sourceware.org/libc-alpha/20241019014002.3684656-1-siddhesh@sourceware.org/T/#u>
<https://inbox.sourceware.org/libc-alpha/qukfe5yxycbl5v7ooskvqdnm3au3orohbx4babfltegi47iyly@or6dgf7akeqv/T/#u>
<https://github.com/bminor/glibc/commit/7c2b945e1fd64e0a5a4dbd6ae6592a7314dcd4b5>
<https://www.austingroupbugs.net/view.php?id=400>
<https://www.austingroupbugs.net/view.php?id=526>
<https://www.austingroupbugs.net/view.php?id=688>
<https://sourceware.org/bugzilla/show_bug.cgi?id=12547>
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_400.htm>
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n868.htm>
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2438.htm>
<https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf>
<https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/realloc.html>
<https://pubs.opengroup.org/onlinepubs/9699919799.2013edition/functions/realloc.html>
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120744>
Description
The specification of realloc(3) has been problematic since the
very first standards, even before ISO C. The wording has
changed significantly over time, in forced attempts to permit
implementations to return a null pointer when the requested size
is zero. This originated from the intent of banning zero-sized
objects from the language in C89, but in retrospect that never
worked well, as we can see from the fallout.
None of the specifications have been good, and C23 finally gave
up and made it undefined behavior.
However, it doesn't need to be like that. The traditional
implementation of realloc(3), present in Unix V7, inherited by
the BSDs, and currently available in a range of systems,
including musl libc, doesn't have any issues.
Code written for platforms returning a null can be migrated to
platforms returning non-null, without significant issues.
There are two kinds of code that call realloc(p,0). One
hard-codes the 0, and is used as a replacement of free(p). This
code ignores the return value, since it's unimportant. This
code currently produces a leak of 0 bytes plus associated
metadata on platforms such as musl libc, where it returns a
non-null pointer. However, assuming that there are programs
written with the knowledge that they won't ever be run on such
platforms, we should take care of them, and make sure they don't
leak. One way to accomplish this would be to recommend that
implementations issue a diagnostic when realloc(3) is called
with a hard-coded zero. This is only an informal recommendation
made by this proposal: it is a matter of quality of
implementation (QoI), and the standard shouldn't say anything
about it. Such a diagnostic would prevent this class of minor
leaks.
Moreover, in glibc, realloc(p,0) may return non-null, in the
case where p is NULL, so code must already take that into
account, and thus code that simply takes realloc(p,0) as a
synonym of free(p) is already leaky, as free(NULL) is a no-op,
but realloc(NULL,0) allocates 0 bytes.
The other kind of code is in algorithms that realloc(3) an
arbitrary size, which might eventually be zero. This gets more
complex.
Here's the code that should be written for AIX or glibc:
	errno = 0;
	new = realloc(old, size);
	if (new == NULL) {
		if (errno == ENOMEM)
			free(old);
		goto fail;
	}
	...
	free(new);
Failing to check for ENOMEM on these platforms before freeing
the old pointer would result in a double-free whenever size is
zero, since realloc(old, 0) has already freed it and returned a
null pointer. If the program decides to continue using the old
pointer instead of freeing it, it would result in a
use-after-free.
On the platforms where realloc(p,0) returns non-null, such as
the BSDs or musl libc, it is simpler to handle:
	new = realloc(old, size);
	if (new == NULL) {  // errno is ENOMEM
		free(old);
		goto fail;
	}
	...
	free(new);
Whenever the result is a null pointer, these platforms are
reporting an ENOMEM error, and thus it is superfluous to check
errno there.
Most code is written in this way, even if run on platforms
returning a null pointer. This is because most programmers are
just unaware of this problem.
If the realloc(3) specification were changed to require that
realloc(p,0) returns non-null on success, and that realloc(p,0)
only fails when out-of-memory, and to require that it sets
errno to ENOMEM, then code written for AIX or glibc would
continue working just fine, since the errno check would be
redundant with the null check. Simply, the conditional
(errno == ENOMEM) would always be true when (new == NULL).
Then, there are non-POSIX platforms that don't set ENOMEM. On
those platforms, code might do this:
	new = realloc(old, size);
	if (new == NULL) {
		if (size != 0)
			free(old);
		goto fail;
	}
	...
	free(new);
That code would continue working with this proposal, except for
a very rare corner case, in which it would leak. In the normal
case, (size != 0) would always be true under (new == NULL),
because a reallocation of 0 bytes would almost always succeed,
and thus not return a null pointer under this proposal.
However, in some cases, the system might not find space even for
the small metadata needed for a 0-byte allocation. In such a
case, the (size != 0) conditional would prevent deallocating
'old', and thus cause a memory leak. This case is exceptional
enough that it shouldn't stop us from fixing realloc(3).
Anyway, in an out-of-memory case, the program is likely to
terminate rather soon, so the issue is even less likely to have
an impact on any existing programs.
This proposal makes handling of realloc(3) as straightforward as
one would expect, with only two states: success or error. There
are no in-between states.
The resulting wording in the standard is also much simpler, as
it doesn't need to define so many special cases.
For consistency, all the other allocation functions are updated
so that, on failure, they both return a null pointer and set
errno to ENOMEM.
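As a rough illustration of that uniform failure contract, here is a
minimal sketch of wrappers that a program could use today on a
platform that does not already behave this way. The names
malloc_enomem and calloc_enomem are hypothetical and appear nowhere
in the proposal; the sketch also assumes that <errno.h> provides
ENOMEM, which POSIX guarantees but ISO C currently does not.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: non-null on success, even for size 0;
 * null pointer with errno == ENOMEM on failure. */
static void *
malloc_enomem(size_t size)
{
	/* A zero-sized request is treated as a 1-byte request. */
	void *p = malloc(size ? size : 1);

	if (p == NULL)
		errno = ENOMEM;
	return p;
}

/* Hypothetical wrapper with the same contract for calloc(3). */
static void *
calloc_enomem(size_t nmemb, size_t size)
{
	void *p;

	/* Reject products that would wrap around size_t. */
	if (size != 0 && nmemb > SIZE_MAX / size) {
		errno = ENOMEM;
		return NULL;
	}

	/* A zero-sized request is treated as a 1-byte request. */
	if (nmemb == 0 || size == 0)
		nmemb = size = 1;

	p = calloc(nmemb, size);
	if (p == NULL)
		errno = ENOMEM;
	return p;
}
```

With wrappers like these, a caller only needs to compare the result
against NULL; the errno value is there for code that already checks
it.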
Prior art
gnulib
gnulib provides the realloc-posix module, which aims to wrap the
system realloc(3) and reallocarray(3) functions so that they
behave in a POSIX-complying manner.
It previously behaved like glibc. After I reported that it was
non-conforming to POSIX, we discussed the best way forward,
which we agreed was the same direction that this paper is
proposing now for C2y. The implementation was changed in
gnulib.git d884e6fc4a60 (2024-11-04; "realloc-posix: realloc (..., 0) now returns nonnull")
There have been no regression reports since then, as we
expected.
Unix V7
The proposed behavior is the one endorsed by Doug McIlroy, the
author of the original implementation of realloc(3) in Unix V7,
and also present in the BSDs.
Design decisions
This proposal needs three changes, which can be applied all at
once, or in separate steps.
The first step would make realloc(p,s) be consistent with
free(p) and malloc(s), including when p is a null pointer, when
s is zero, and also when both corner cases happen at the same
time. This change would already turn the implementations where
malloc(0) returns non-null into the end goal we have.
This first step would require changes to (at least) the
following implementations: glibc, Bionic, Windows.
The second step would be to require that malloc(0) returns a
non-null pointer.
The second step would require changes to (at least) the
following implementations: AIX.
The third step would be to require that on error, errno is set
to ENOMEM. This step is optional (see Caveats below).
This proposal has merged all steps into a single proposal.
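The three steps can be pictured with a small model written in terms
of malloc(3) and free(3). This is only a sketch of the intended
semantics, not how any libc implements realloc(3): the name
model_realloc is hypothetical, the old size is passed explicitly
because portable code cannot query it, and ENOMEM is assumed to be
available from <errno.h>.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the proposed semantics; not a libc API. */
static void *
model_realloc(void *old, size_t old_size, size_t new_size)
{
	void *new;

	/* Step 2: a zero-sized request still yields a non-null
	 * pointer.  Requesting 1 byte emulates that everywhere. */
	new = malloc(new_size ? new_size : 1);
	if (new == NULL) {
		/* Step 3: report the error; 'old' remains valid. */
		errno = ENOMEM;
		return NULL;
	}

	/* Step 1: consistent with malloc(new_size) and free(old),
	 * keeping the contents up to the lesser of both sizes. */
	if (old != NULL)
		memcpy(new, old, old_size < new_size ? old_size : new_size);
	free(old);
	return new;
}
```

In this model a null return always means that the allocation failed
and that the old object is untouched, which is the two-state
behavior described earlier.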
This proposal also needs to add ENOMEM to the standard, since it
hasn't been standardized yet.
Future directions
This proposal, by specifying realloc(3) as-if by calling
free(3) and malloc(3), makes redundant several mentions of
realloc(3) next to either free(3) or malloc(3) in the standard.
We could remove them in this proposal, or clean that up in a
separate (mostly editorial) proposal. Let's leave it for a
future proposal for now.
Caveats
n?n:1
Code written today should be careful, in case it can run on
older systems that are not fixed to comply with this stricter
specification. Thus, code written today should call realloc(3)
similar to this:
realloc(p, n?n:1);
When all existing implementations are fixed to comply with this
stricter specification, that workaround can be removed.
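One way to keep that workaround in a single place is a thin wrapper.
The following is a minimal sketch; the names realloc_nz and resize
are hypothetical helpers, not part of this proposal or of any
library.

```c
#include <stdlib.h>

/* Hypothetical helper: never passes a zero size to realloc(3),
 * so a null return can only mean that the allocation failed. */
static inline void *
realloc_nz(void *p, size_t n)
{
	return realloc(p, n ? n : 1);
}

/* Hypothetical caller: resize *bufp to n bytes.
 * Returns 0 on success, -1 on failure.  On failure, *bufp is
 * left untouched and the caller still owns it. */
static int
resize(void **bufp, size_t n)
{
	void *new = realloc_nz(*bufp, n);

	if (new == NULL)
		return -1;
	*bufp = new;
	return 0;
}
```

Once the wrapper is in place, callers only have the two states this
proposal aims for, success or error, even on current systems.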
ENOMEM
If this proposal didn't use ENOMEM, code that is currently
portable to all POSIX systems
	errno = 0;
	new = realloc(old, size);
	if (new == NULL) {
		if (errno == ENOMEM)
			free(old);
		goto fail;
	}
	...
	free(new);
would not be portable to arbitrary C2y platforms.
Since it is currently impossible to write code that is
portable to arbitrary C17 systems, we could say that this is not
an issue in ISO C. If we proceed with this proposal after
removing all mentions of errno:
- New code written for C2y will only need to check for
NULL to detect errors.
- Code written for specific C17 and older platforms
that don't set errno will continue to work for those
specific platforms.
- Code written for POSIX.1-2024 and older platforms
will continue working on POSIX C2y platforms,
assuming that POSIX will continue mandating ENOMEM.
- Code written for POSIX.1-2024 and older will not be
able to be run on non-POSIX C2y platforms, but that
could be expected.
So, the addition of ENOMEM in this proposal achieves the
following goal:
- Code written for POSIX.1-2024 and older platforms
will continue working on arbitrary C2y platforms.
Maybe this is unnecessary. Thus, the following questions.
In any case, ENOMEM is only meant for backwards compatibility,
and code aiming for C2y would only need to check for NULL.
ENOMEM would be redundant for such programs.
Questions to the C Committee
- Does the C Committee want to retain ENOMEM in this proposal?
(If the answer is no, editorially remove all mentions of
ENOMEM before merging the proposal into C2y.)
- Does the C Committee want to accept this proposal, defining
the behavior of the memory management functions to match their
traditional implementation?
Proposed wording
Based on N3550.
7.5 Errors <errno.h>
## Add ENOMEM in p2.
7.25.4.1 Memory management functions :: General
@@ p1
...
If the size of the space requested is zero,
-the behavior is implementation-defined:
-either
-a null pointer is returned to indicate the error,
-or
the behavior is as if the size were some nonzero value,
except that the returned pointer shall not be used
to access an object.
7.25.4.2 The aligned_alloc function
@@ Returns, p3
The <b>aligned_alloc</b> function returns
-either
-a null pointer
-or
-a pointer to the allocated space.
+a pointer to the allocated space
+on success.
+If
+the space cannot be allocated,
+a null pointer is returned,
+and the value of the macro <b>ENOMEM</b>
+is stored in <b>errno</b>.
7.25.4.3 The calloc function
@@ Returns, p3
The <b>calloc</b> function returns
-either
a pointer to the allocated space
+on success.
-or a null pointer
-if
+If
the space cannot be allocated
or if the product <tt>nmemb * size</tt>
-would wraparound <b>size_t</b>.
+would wraparound <b>size_t</b>,
+a null pointer is returned,
+and the value of the macro <b>ENOMEM</b>
+is stored in <b>errno</b>.
7.25.4.7 The malloc function
@@ Returns, p3
The <b>malloc</b> function returns
-either
-a null pointer
-or
-a pointer to the allocated space.
+a pointer to the allocated space
+on success.
+If
+the space cannot be allocated,
+a null pointer is returned,
+and the value of the macro <b>ENOMEM</b>
+is stored in <b>errno</b>.
7.25.4.8 The realloc function
@@ Description, p2
The <b>realloc</b> function
deallocates the old object pointed to by <tt>ptr</tt>
+as if by a call to <b>free</b>,
and returns a pointer to a new object
-that has the size specified by <tt>size</tt>.
+that has the size specified by <tt>size</tt>
+as if by a call to <b>malloc</b>.
The contents of the new object
shall be the same as that of the old object prior to deallocation,
up to the lesser of the new and old sizes.
Any bytes in the new object
beyond the size of the old object
have unspecified values.
@@ p3
If <tt>ptr</tt> is a null pointer,
the <b>realloc</b> function behaves
like the <b>malloc</b> function for the specified size.
Otherwise,
if <tt>ptr</tt> does not match a pointer
earlier returned by a memory management function,
or
if the space has been deallocated
by a call to the <b>free</b> or <b>realloc</b> function,
## We can probably remove all of the above, because of the
## behavior now being defined as-if by calls to malloc(3) and
## free(3). But let's do that editorially in a separate change.
-or
-if the size is zero,
## We're defining the behavior.
the behavior is undefined.
If
-memory for the new object is not allocated,
+the space cannot be allocated,
## Editorial; for consistency with the wording of the other functions.
the old object is not deallocated
and its value is unchanged.
@@ Returns, p4
The <b>realloc</b> function returns
a pointer to the new object
(which can have the same value
-as a pointer to the old object),
+as a pointer to the old object)
+on success.
-or
+If
+space cannot be allocated,
a null pointer
+is returned
+and the value of the macro <b>ENOMEM</b>
+is stored in <b>errno</b>.
--
<https://www.alejandro-colomar.es/>