musl - Re: Re: BUG: realloc(p,0) should be consistent with malloc(0)

Message-ID: <3igvwk75ofugax53xmbohjp7vbr27pkwivlw6apftaf7haocse@nsszgukhkxzu>
Date: Fri, 20 Jun 2025 11:30:59 -0500
From: Eric Blake <eblake@...hat.com>
To: Alejandro Colomar <alx@...nel.org>
Cc: Rich Felker <dalias@...c.org>, enh <enh@...gle.com>, 
	Florian Weimer <fweimer@...hat.com>, Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>, 
	musl@...ts.openwall.com, libc-alpha@...rceware.org, Joseph Myers <josmyers@...hat.com>, 
	наб <nabijaczleweli@...ijaczleweli.xyz>, Paul Eggert <eggert@...ucla.edu>, 
	Robert Seacord <rcseacord@...il.com>, Bruno Haible <bruno@...sp.org>, bug-gnulib@....org, 
	JeanHeyd Meneide <phdofthehouse@...il.com>, Thorsten Glaser <tg@...bsd.de>
Subject: Re: Re: BUG: realloc(p,0) should be consistent with malloc(0)

On Fri, Jun 20, 2025 at 01:37:58AM +0200, Alejandro Colomar wrote:
> Hey Eric!
> 
> Thanks a lot for the detailed reply!  Comments below.

Ditto.

> > Now, on to some code archeology.  In today's glibc source code, I see
> > this telling comment in malloc/malloc.c, making it clear that glibc
> > folks are aware that realloc(non_null, 0) has two useful behaviors,
> > and that glibc picks the behavior that does NOT behave consistently
> > with malloc(0), because of back-compat guarantees:
> > 
> > /*
> >   The REALLOC_ZERO_BYTES_FREES macro controls the behavior of realloc (p, 0)
> >   when p is nonnull.  If the macro is nonzero, the realloc call returns NULL;
> >   otherwise, the call returns what malloc (0) would.  In either case,
> >   p is freed.  Glibc uses a nonzero REALLOC_ZERO_BYTES_FREES, which
> >   implements common historical practice.
> > 
> >   ISO C17 says the realloc call has implementation-defined behavior,
> >   and it might not even free p.
> > */
> 
> That comment is wrong.  "common historical practice" is that realloc(3)
> is consistent with malloc(3).  This is true since the days of Unix V7.

Careful.  "common historical practice" can just as easily be read
along the lines of "common historical practice of glibc, independent
of other implementations" (since glibc does not inherit a code base
from other implementations).  And "historical" also doesn't give a
time frame; if the comment was only written at a time where glibc
behavior had already been in place for many years, that could be
impetus for using "historical", even if it disregards even more
history pre-glibc from other implementations.

> I don't know what they were referring to.  Maybe the behavior introduced
> in SysVr2's -lmalloc which was later standardized in the SVID by AT&T?
> That was never common, since all existing default (-lc) realloc(3)
> implementations behaved as if realloc(p, 1).  You had to use the
> -lmalloc library to get it to return NULL and free the object.

At the time I wrote my mail, I had not researched when glibc made the
change.  Paul's additional research proving that glibc 1.0 behaves
differently than glibc 2.2, and that glibc 2.2 changed behavior
because of a wording change in the C99 draft (whether or not that
change was strictly necessary, and whether or not the actual C99
required what glibc claimed it required by making the behavior
change), throws more fuel onto the fire.

Checking the provenance of that comment in glibc, I've further
determined:

malloc/malloc.c was rewritten in Jan 2002 by Wolfram Gloger (committed
by Ulrich) to borrow ideas from Doug Lea's malloc-2.7.0.c, in commit
fa8d436c (glibc-2.3).
 - REALLOC_ZERO_BYTES_FREES was defined as 1 at that time, with this comment:
/*
  REALLOC_ZERO_BYTES_FREES should be set if a call to
  realloc with zero bytes should be the same as a call to free.
  This is required by the C standard. Otherwise, since this malloc
  returns a unique pointer for malloc(0), so does realloc(p, 0).
*/
 - given the timing, C99 would have been the current C standard

Before that rewrite, the feature switch macro was merely defined
(rather than set to 1) as of commit 7c2b945e (April 1999, glibc
2.1.1), the commit by Andreas that Paul already mentioned, and
directly in response to C99 drafting efforts (remember, C99 was not
formally adopted until May 2000); I am less certain without more
research whether the wording that Andreas was referring to at the time
the feature knob was switched made it into the approved C99 unchanged,
or if that was still undergoing debates in the C committee.  The
comment at that time read:
/*
  REALLOC_ZERO_BYTES_FREES should be set if a call to
  realloc with zero bytes should be the same as a call to free.
  Some people think it should. Otherwise, since this malloc
  returns a unique pointer for malloc(0), so does realloc(p, 0).
*/

and Wolfram Gloger reworded the comment again in 431c33c0 (May 1999,
also glibc 2.1.1), as part of synchronizing with ptmalloc:
/*
  REALLOC_ZERO_BYTES_FREES should be set if a call to realloc with
  zero bytes should be the same as a call to free.  The C standard
  requires this. Otherwise, since this malloc returns a unique pointer
  for malloc(0), so does realloc(p, 0).
*/

So, the switch from "Some people think it should" to "The C standard
requires this" was independent of the change that actually flipped
the knob, and was made by a different author, but both changes happened
in close proximity, and the list archives from that time are relevant to
the thinking as to why it was assumed C99 required the change.

Going even further back in time, the knob itself (although disabled)
appears to have been around since at least commit f65fd747 (Dec 1996,
importing glibc history into version control, glibc 2.0.4; glibc.git
lacks older history), where it was disabled with documentation of:
  REALLOC_ZERO_BYTES_FREES (default: NOT defined)
     Define this if you think that realloc(p, 0) should be equivalent
     to free(p). Otherwise, since malloc returns a unique pointer for
     malloc(0), so does realloc(p, 0).

so long before C99 was even close to final, glibc was aware of other
implementations having that behavior enough to offer it as a
compile-time knob even if not using it back then.  ChangeLog.1 at that
time (now living in ChangeLog.old/ChangeLog.1) mentions the existence
of malloc/malloc.c in a change by Roland McGrath in April 1992 (that's
the oldest mention of the file within accessible glibc sources, but no
actual view of what the file looked like at the time).  The first
mention of REALLOC_ZERO in ChangeLog.old/ChangeLog.* is in
ChangeLog.10, corresponding to changes in 2000 (and already covered
above in this email).  I lack the resources to see how glibc evolved
before the commit history currently tracked in git, which would be
needed if we want to learn how glibc 1.0 behaved, or when that comment
may have first been added.


Going in the other direction, Paul modified the comment, in commits
dff9e592 and 9f1bed18 (both in April 2021, glibc-2.34):
 - first attempt
/*
  REALLOC_ZERO_BYTES_FREES controls the behavior of realloc (p, 0)
  when p is nonnull.  If nonzero, realloc (p, 0) should free p and
  return NULL.  Otherwise, realloc (p, 0) should do the equivalent
  of freeing p and returning what malloc (0) would return.

  ISO C17 says the behavior is implementation-defined here; glibc
  follows historical practice and defines it to be nonzero.
*/
 - current wording
 /*
  The REALLOC_ZERO_BYTES_FREES macro controls the behavior of realloc (p, 0)
  when p is nonnull.  If the macro is nonzero, the realloc call returns NULL;
  otherwise, the call returns what malloc (0) would.  In either case,
  p is freed.  Glibc uses a nonzero REALLOC_ZERO_BYTES_FREES, which
  implements common historical practice.

  ISO C17 says the realloc call has implementation-defined behavior,
  and it might not even free p.
*/

In that light, 2021 is far enough removed from 2002 that Paul's use of
"historical" could easily be read in terms of glibc history
(independent of other implementations).  And this corresponds to the
point in time after the Austin Group had pointed out to the C
committee that C99 wording and POSIX 2008 were possibly at odds, such
that POSIX was trying to get the C wording improved before C17 was out
so that future POSIX (now POSIX 2024) would be easier.
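
To make the two settings of the knob concrete, here is a minimal sketch
of what those comments describe.  It is NOT glibc's code: toy_realloc is
a hypothetical wrapper, and the macro below merely borrows the name and
the nonzero default from the comments quoted above.

```c
#include <stdlib.h>

/* Stand-in for the glibc knob; 1 mirrors the default the comments describe. */
#ifndef REALLOC_ZERO_BYTES_FREES
#define REALLOC_ZERO_BYTES_FREES 1
#endif

/* toy_realloc is a hypothetical wrapper for illustration only. */
void *toy_realloc(void *p, size_t n)
{
    if (p != NULL && n == 0) {
        free(p);              /* in either case, p is freed */
#if REALLOC_ZERO_BYTES_FREES
        return NULL;          /* knob nonzero: act like free(p) */
#else
        return malloc(0);     /* knob zero: return what malloc(0) would */
#endif
    }
    /* Every other case is passed through to the real allocator. */
    return realloc(p, n);
}
```

Building with -DREALLOC_ZERO_BYTES_FREES=0 selects the other branch, in
which the result depends on what the platform's malloc(0) returns.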

> 
> See <https://nabijaczleweli.xyz/content/blogn_t/017-malloc0.html>.

Thank you for starting this.  It will probably need some revisions
before it is ready for the C committee, but hopefully this thread helps.

> 
> > More reading: https://www.austingroupbugs.net/view.php?id=400 shows
> > where earlier POSIX missed that C90 to C99 changed what was permitted
> > (and apparently in a way to render glibc's implementation
> > non-conforming), and that's part of what drove the POSIX folks to ask
> > the C standard to improve the wording.  POSIX 2024 is based on C17,
> > but Nick Stoughton was regularly communicating between both C and
> > POSIX groups on what wording(s) were being floated around, in order to
> > try and make it so that glibc would not have to change behavior, but
> > at the same time trying to make it possible for applications to be
> > able to make wise runtime decisions on how to use realloc that would
> > not leak memory or risk dereferencing a NULL pointer if not careful.
> > 
> > https://sourceware.org/bugzilla/show_bug.cgi?id=12547 shows where
> > glibc has, in the past, refused to change behavior on the grounds that
> > the standards were buggy.  If the standards are still buggy, the best
> > course of action is to open a bug against them.
> 
> All standards since C89 have been buggy.  If you are pedantic reading
> C89, the BSDs and all the historic implementations back to the original
> Unix V7 are non-conforming:
> 
> <https://port70.net/~nsz/c/c89/c89-draft.html#4.10.3.4>
> 
> Which says:
> 
> | If size is zero and ptr is not a null pointer, the object it points to
> | is freed.
> 
> It's not clear whether this means that the whole action of realloc(p,0)
> is to free(3) the pointer, or if it can also allocate a new object.
> Under the former interpretation, the standard is at odds with reality.
> Under the latter interpretation, I'd interpret it as saying that
> realloc(p,0) cannot fail (and thus must free(p)), which would be an
> interesting guarantee.  I guess we'll never know what was the intended
> reading.

C89 also says (4.10.3):
"If the size of the space requested is zero, the behavior is
implementation-defined; the value returned shall be either a null
pointer or a unique pointer."

Putting those two sentences together, I can make a very strong case
that an implementation where "realloc(p,0)" frees p, and then returns
NULL, and documents that it does so, complies (the old object pointed
to by p is free, and the size being zero means that the new object
being NULL rather than a distinct pointer to non-dereferenceable
storage is what the implementation documented); and this is true
whether or not malloc(0) and realloc(p,0) differ on whether they
return NULL, as long as both of them document their behavior on zero
size.

Another observation on C89 - it has different wording in most places
about "if the space has been deallocated by a call to the free or
realloc function"; where free() documents that "The free function
causes the space pointed to by ptr to be deallocated, that is, made
available for further allocation."  The phrase "is deallocated" is
thus precisely defined.  However, the only use of the phrase "is
freed" in that document is the one sentence you quoted about realloc
with non-null pointer and zero size.  I can argue that as an
undocumented term, "is freed" is NOT intended to be synonymous with
"is free()d", and is instead distinct in meaning from "is deallocated"
(ie. "is deallocated" is how a pointer can be reused by a future
malloc, and is no longer a distinct memory location; but "is freed"
could be defined as contents are no longer referenceable but the
pointer might still be allocated as a distinct location in memory and
still safe to pass to a later "free()").  But then, you might ask, why
does the standard talk about space that "has been deallocated by a
call to the free or realloc function" if "realloc(p,0)" is not
deallocating?  My answer: there is another case where it is obvious
that realloc does deallocation: namely, when realloc(p,non-zero)
returns a new pointer to storage that is (presumably) larger than the
contents of the old p - the old value of p "was deallocated" and can now be
reused by a future malloc.  With that definition in hand, I can now
argue that whether realloc(p,0) returns p (truncated and freed of its
contents, but p is still allocated), or returns a new non-NULL pointer
(p was freed of its contents AND deallocated in order to return the
new pointer), an implementation where realloc(p,0) returns a non-NULL
pointer has successfully freed p.  Strenuous logic, perhaps, but we're
already in the weeds.

So, I think we can make arguments that ALL of the following can be
considered compliant under C89 rules (although the argument is easier
for some cases than others):

1. malloc(0) returns non-NULL, realloc(0,0) returns non-NULL,
realloc(p,0) returns p [free(p) is still safe, but it is no longer
safe to access contents of p]

2. malloc(0) returns non-NULL, realloc(0,0) returns non-NULL,
realloc(p,0) returns non-NULL other than p [free(p) is unsafe]

3. malloc(0) returns NULL [0-sized objects are unsupported; presumably
errno=EINVAL but C89 is silent on that], realloc(0,0) returns NULL,
realloc(p,0) deallocates p and then returns NULL [presumably with
errno set, at any rate free(p) is unsafe]

4. malloc(0) returns NULL, realloc(0,0) returns NULL, realloc(p,0)
returns p unchanged [free(p) is still safe, but dereferencing its
contents is no longer safe]

5. malloc(0) returns non-NULL, realloc(0,0) returns non-NULL,
realloc(p,0) deallocates p and returns NULL [free(p) is unsafe]

Of those, it looks like glibc 2.1 would be case 1 (the same pointer is
returned, but truncated down to minimum size), glibc 2.2 to present
would be case 5 (the pointer is freed, the function returns NULL even
though that is inconsistent with malloc(0) being able to return zero-sized
objects), and other traditional implementations could be either case 2
(a non-NULL pointer is returned because zero-sized objects are always
possible, but because it was distinct from p it also met the rule
about having freed p) or case 1 (the call "freed" the contents of p,
but did not deallocate it).
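
For anyone who wants to tabulate implementations against that list, the
five cases can be written down as a small lookup.  This is only a sketch:
classify_case and the enum are names I made up, and the inputs have to be
gathered separately (by reading documentation or source, or by a probe
that itself calls realloc(p,0) and so is not portable).

```c
#include <stdbool.h>

/* What realloc(p,0) handed back for a non-NULL p (hypothetical names). */
enum realloc0_result { R0_SAME_PTR, R0_NEW_PTR, R0_NULL };

/* Returns 1..5 for the cases listed above, or 0 for a combination
   that the list does not cover. */
int classify_case(bool malloc0_is_null, bool realloc_null0_is_null,
                  enum realloc0_result r)
{
    /* Every listed case has realloc(0,0) agreeing with malloc(0). */
    if (malloc0_is_null != realloc_null0_is_null)
        return 0;

    if (!malloc0_is_null) {
        switch (r) {
        case R0_SAME_PTR: return 1;  /* p returned, contents gone */
        case R0_NEW_PTR:  return 2;  /* distinct non-NULL pointer */
        case R0_NULL:     return 5;  /* p deallocated, NULL returned */
        }
    } else {
        switch (r) {
        case R0_NULL:     return 3;  /* zero-sized objects unsupported */
        case R0_SAME_PTR: return 4;  /* p returned unchanged */
        case R0_NEW_PTR:  return 0;  /* not in the list */
        }
    }
    return 0;
}
```

By the reading above, glibc 2.2 to present would be
classify_case(false, false, R0_NULL), which gives 5.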

> 
> C99 changed the specification, probably because of how ambiguous it was.
> 
> glibc was also buggy, as it differed from every other Unix-like system.
> All Unix systems behaved as if free(p) and malloc(n).  glibc is the only
> one that didn't follow this obvious consistency rule.

Maybe the intended wording was that "if realloc(p,s) returns a
non-NULL value distinct from p, then p was deallocated and the new
value obtained as if by malloc(s)".  After all, the reason realloc()
exists is for the cases where realloc(p,s) can return p (ie. resized
in place, whether by truncating and optionally handing back an unused
tail to the system, or by expanding where the new tail was already
accessible in place even though it has unspecified contents); it's
only when the resize-in-place can't happen that realloc() must then
arrange to copy contents from the old pointer to the new.  In fact,
even though portable code must not expect specific contents in the new
pointer if the sequence free(p);malloc(s) happens to reuse p, I could
totally see how some (possibly-older) implementations of malloc() have
sufficient locking in place where it may be easier to try and resize
pointer p by free(p);malloc(s) and only if the resize changed locations
then do the copying - as long as the rest of the application can't
corrupt the contents in the old location before the new location is
finally returned to the user.  But even if it was the intended wording
(or if that is the wording that we hope to have in place in the
future), unfortunately it is not the actual wording.


> 
> So, both are bogus.
> 
> > Also relevant are these documents
> >         https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2438.htm
> >         https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf
> > 
> > > > > (like i said, i care greatly about actual shipping code. a standard is
> > > > > interesting for green-field stuff, but when it's at odds with reality
> > > > > it's often worse to try to adapt than just ignore the stupidity/report
> > > > > the bug and get it changed back.)
> > > 
> > > It's ironic that the standard should have never said that, because prior
> > > to the existence of ANSI C and POSIX, all existing systems behaved like
> > > the current POSIX specification.  It was a consequence of the horrible
> > > wording of the standards, that glibc was written so badly, by following
> > > a bogus specification, when it should have been made compatible with the
> > > existing systems.
> > 
> > POSIX was originally released in 1988, before C90.  glibc 1.0 came out
> > in 1992.  I am not sure when glibc first cared about whether trending
> > towards POSIX compliance mattered, although I do know that in the
> > early days, Ulrich would very adamantly argue along the lines of
> > (paraphrased) "if the standards don't match common sense, then we
> > don't care about the standards".
> 
> It seems Ulrich didn't follow that in this case.  I don't know who wrote
> the original realloc(3) in glibc.  Was it RMS?  It would be interesting
> to know how they came up with that implementation.  If anyone knows who
> wrote it and why, please CC them.
> 
> I don't have a copy of POSIX.1-1988, nor of any other POSIX.1 before
> POSIX.1-2001.  What do they say for realloc(3)?
> 
> > > Thus, this is a historical bug in ISO C, POSIX, which at least has been
> > > finally fixed in POSIX.
> > 
> > The fact that the wording has changed across multiple versions of C
> > and POSIX is indeed evidence that getting a specification that people
> > are happy with is difficult.  What is harder is the decision of
> > whether the bug is in the standard (for not documenting reality) or in
> > the implementations (for not doing what the standard rightfully
> > requests), or even both.  And "what people are happy with" differs on
> > who you ask - wording that permits disparate libc behavior is nicer to
> > the libraries (they don't have to change) but meaner to application
> > writers (the construct is not portable, so it is safer to avoid the
> > construct altogether rather than worry about which libraries have
> > which behaviors); whereas wording that locks down behavior is nicer to
> > applications (if I write this, it should work regardless of platform,
> > and if it doesn't, the standard exists as leverage to get libc fixed)
> > but meaner to libraries (forcing the library to version its symbols to
> > change behavior for newer standards while still providing ABI
> > stability guarantees to older apps that depend on the old behavior is
> > not cheap).
> 
> As can be seen from the change in gnulib, the only possible issues from
> migrating from the current glibc behavior to the musl behavior is a few
> leaks in cases where the programmer calls realloc(p,0) ignoring the
> return value.  Those leaks would leak 0 bytes plus the metadata.
> 
> A solution for those leaks would be to add a diagnostic for calls to
> realloc(3) where the return value is unused.  And even if those aren't
> fully solved, they're leaks of a few bytes.  There's nothing that should
> cause real issues.
> 
> But the glibc maintainers mentioned that they're investigating about it
> in distros, so I guess we'll eventually have the results of their
> investigation.
> 
> > The sad fact of the matter is that _because_ there are so many
> > differences in opinions, the C23 action of making realloc(p,0)
> > undefined is probably the simplest course that could be agreed on
> > (don't ever do that in your code, because you can't guarantee the
> > results), but simultaneously annoying to end users (because it is
> > undefined, rather than implementation-defined or unspecified, a
> > compiler can "optimize" your code to do WHATEVER IT WANTS - which
> > really means you CANNOT ever reliably call realloc(p,0) if your
> > compiler is aiming for C23).
> 
> Indeed.  I think the move from C17 to C23 was good.
> 
> The issue with C17 is that it is very similar to POSIX.1-2008, but since
> ISO C doesn't require that errno is set when the pointer is not freed,
> it is impossible to portably determine if the input pointer was freed
> after realloc(p,0).  This is not an issue in POSIX.1, though, since it
> can and does require that errno is set if the input pointer is not
> freed.
> 
> Because it was impossible to determine whether r(p,0) has freed p after
> returning NULL in C17, it was effectively UB.  So, I consider C23 to be
> a minor change from C17, and one which clarifies that it is UB, because
> it already was before.
> 
> POSIX.1 is not limited by this limitation of ISO C.
> 
> > On top of that, the POSIX standard usually defers to a (fixed version)
> > of C, but does have the liberty to impose well-defined behavior even
> > where the corresponding C standard left things undefined (for example,
> > POSIX 2017 was able to demand that a POSIX system can cast function
> > pointers to and from void* in order to implement dlsym(), even though
> > C99 said that was undefined).  Put another way, just because C23 has
> > changed realloc(p,0) to be undefined does NOT require a future version
> > of POSIX to do likewise when it finally moves to a newer C than C17.
> > But at the same time, POSIX is unlikely to make things strict if it
> > risks alienating existing implementations; if glibc changes behavior,
> > that would go a long way towards POSIX changing wording to be
> > stricter.
> 
> Indeed; I think POSIX.1 doesn't need to make this undefined, and
> shouldn't.
> 
> > > BTW, the same text is present in POSIX.1-2017.  It was changed in a TC,
> > > following bug <https://www.austingroupbugs.net/view.php?id=400>.

That one mentions WG14 document N872 paragraph 19.c as the change from
C89 to C99; and it matches what I see in the final C99 document in
7.20.3.4:

19c. realloc rewrite (Meyers)
  The *realloc* function deallocates the old object pointed to
  by *ptr* and returns a pointer to a new object that has the
  size specified by *size*.  The contents of the new object
  shall be the same as the old object before deallocation up
  to the lesser of the size of the old object and *size*.  Any
  bytes in the new object beyond the size of the old object
  have indeterminate values.

  If *ptr* is a null pointer, the *realloc* function behaves
  like the *malloc* function for the specified size.
  Otherwise, if *ptr* does not match a pointer earlier
  returned by the *calloc*, *malloc*, or *realloc* function,
  or if the space has been deallocated by a call to the *free*
  or *realloc* function, the behavior is undefined.  If memory
  for the new object cannot be allocated, the old object is
  not deallocated and its value is unchanged.

  Returns

  The *realloc* function returns a pointer to the new object,
  which may have the same value as *ptr*, or a null pointer if
  the new object could not be allocated.

C99 also has the disclaimer in 7.20.3, changed slightly in wording
from the disclaimer in C89, that: "If the size of the space requested
is zero, the behavior is implementation-defined: either a null pointer
is returned, or the behavior is as if the size were some nonzero
value, except that the returned pointer shall not be used to access an
object."
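
As an illustration of that C99 wording, here is a naive model of realloc
as "allocate new, copy the lesser of the two sizes, deallocate old".  It
is a sketch under assumptions: model_realloc is a hypothetical name, the
caller has to pass the old size (a real allocator tracks it internally),
and it never resizes in place even though a real allocator may.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the C99 text; not any libc's implementation. */
void *model_realloc(void *old, size_t old_size, size_t new_size)
{
    /* "If ptr is a null pointer, realloc behaves like malloc." */
    if (old == NULL)
        return malloc(new_size);

    void *fresh = malloc(new_size);
    if (fresh == NULL)
        return NULL;  /* the old object is not deallocated */

    /* Contents match up to the lesser of the old and new sizes. */
    memcpy(fresh, old, old_size < new_size ? old_size : new_size);
    free(old);        /* the old object is deallocated */
    return fresh;
}
```

Note what happens with new_size of 0: the model inherits whatever
malloc(0) does, so where malloc(0) returns NULL the old object stays
allocated, and where it returns a pointer the old object is freed.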

> > > 
> > > The motivation, from what I can read there, seems to be that C99 already
> > > made POSIX.1 non-conforming, and this fix was intended to conform to
> > > C99.

Or worded differently in light of the above: C89 had a strong
requirement about p "is freed" after realloc(p,0), which was confusing
in itself, so C99 tried to change the wording to get rid of the
undefined phrase, and make it clear that the old object is deallocated
and the new object (even if it has the same value of p as the old
object) is allocated and has initial contents that match the old
object (whether the implementation can optimize by resizing in place
without copying is not observable from the caller's perspective, and
C99's mandate that realloc() always produces a new object on
success, even if the new pointer is the same as the old, makes
lifetime analysis elsewhere in the standard easier).

In fact, I'm almost inclined to say that it was C89's wording (and not
C99's) that was the reason that glibc flipped their default to having
realloc(p,0) return NULL (because it was C89's wording that made it
into POSIX); and it was C99's debate on newer wording that brought the
issue to light.  And yet, here we are, STILL trying to get better
wording into both C and POSIX.

> > > 
> > > Indeed, glibc is non-conforming to C99 too.  Although, I don't like the
> > > wording from C99, either; it allows weird stuff: it allows an
> > > implementation where malloc(0) returns NULL and realloc(p,0) non-null
> > > (so, the opposite of glibc).
> > > 
> > > C11 is essentially identical to C99 in that regard, so glibc is also
> > > non-conforming to C11.

Here, I'm inclined to argue the opposite: glibc IS compliant with C99
and C11, and the commit history in glibc shows that the change to have
realloc(p,0) return NULL was made at the time of C99 on the grounds of
a compliance argument, even if it might have been misguided.  It was
C89, not C99, that explicitly required p to be freed; and C99 was
clarifying that the old object is deallocated before the new object
(if any) is returned, even if the pointer is the same.  And again, it
stems back to the fact that C says it is implementation-defined
whether a size of 0 returns NULL or a distinct pointer, and has no
requirements on errno being set.  Presumably, as long as you are
willing to set errno=0, call realloc(p,0), and then on NULL check if
errno==EINVAL (p is still valid) or still unset (p was freed), then
glibc's implementation complies, even though it does not match
historical behavior of either the implementations where malloc(0)
always fails (a zero-sized object is not possible) or the
implementations where realloc(p,0) always returns non-NULL.  POSIX
then tried to add the rules to be able to distinguish between NULL
meaning success and being an EINVAL error (since C99 didn't).
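
The errno dance described above might look like the following sketch.
The names (interpret_realloc0, shrink_to_zero, the enum) are hypothetical,
and the whole thing assumes the POSIX-style guarantee that errno is set
when NULL means failure; ISO C gives no such guarantee, and under C23 the
realloc(p,0) call itself is undefined.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Possible outcomes for the old object (hypothetical names). */
enum realloc0_fate { FATE_NEW_OBJECT, FATE_OLD_FREED, FATE_OLD_KEPT };

/* Interpret the result of realloc(p,0), given that errno was set to 0
   immediately before the call and err is its value right after. */
enum realloc0_fate interpret_realloc0(const void *result, int err)
{
    if (result != NULL)
        return FATE_NEW_OBJECT;  /* old object gone, result needs free() */
    /* NULL with errno untouched: p was freed.  NULL with errno set
       (EINVAL, ENOMEM, ...): p is still valid. */
    return err == 0 ? FATE_OLD_FREED : FATE_OLD_KEPT;
}

/* Caller side: *pp is updated only when the old object is gone.
   Assumption: the libc sets errno on failure, as POSIX asks. */
enum realloc0_fate shrink_to_zero(void **pp)
{
    errno = 0;
    void *r = realloc(*pp, 0);
    enum realloc0_fate fate = interpret_realloc0(r, errno);
    if (fate != FATE_OLD_KEPT)
        *pp = r;
    return fate;
}
```

On FATE_OLD_KEPT the caller still owns *pp and must free it later; in the
other two cases, free(*pp) is safe because *pp is either the new object
or NULL.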

I'm out of time today to reply to anything later in your email.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org