Message-ID: <7c4wgpgrgtguz7ss7wgikbssgqgwnwkgah43gv77zxiauyvrsi@3vusaxk4wxu6>
Date: Fri, 20 Jun 2025 01:45:13 +0200
From: Alejandro Colomar <alx@...nel.org>
To: Eric Blake <eblake@...hat.com>
Cc: Rich Felker <dalias@...c.org>, enh <enh@...gle.com>,
Florian Weimer <fweimer@...hat.com>, Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>,
musl@...ts.openwall.com, libc-alpha@...rceware.org, Joseph Myers <josmyers@...hat.com>,
наб <nabijaczleweli@...ijaczleweli.xyz>, Paul Eggert <eggert@...ucla.edu>,
Robert Seacord <rcseacord@...il.com>, Bruno Haible <bruno@...sp.org>, bug-gnulib@....org,
JeanHeyd Meneide <phdofthehouse@...il.com>, Thorsten Glaser <tg@...bsd.de>
Subject: Re: Re: BUG: realloc(p,0) should be consistent with malloc(0)
On Fri, Jun 20, 2025 at 01:38:14AM +0200, Alejandro Colomar wrote:
> Hey Eric!
>
> Thanks a lot for the detailed reply! Comments below.
>
> On Thu, Jun 19, 2025 at 01:31:06PM -0500, Eric Blake wrote:
> > On Wed, Jun 18, 2025 at 09:04:02PM +0200, Alejandro Colomar wrote:
> > > Hi Rich, Elliott,
> > >
> > > On Wed, Jun 18, 2025 at 12:35:50PM -0400, Rich Felker wrote:
> > > > On Wed, Jun 18, 2025 at 11:20:54AM -0400, enh wrote:
> > > > > On Tue, Jun 17, 2025 at 5:58 PM Alejandro Colomar <alx@...nel.org> wrote:
> > > > > >
> > > > > > Hi Elliott, Florian,
> > > > > >
> > > > > > glibc and Bionic are non-conforming to POSIX.1-2024. The fix that we're
> > > > > > proposing would make them conforming. Does conformance to POSIX.1-2024
> > > > > > mean something to you?
> > > > >
> > > > > not when POSIX screwed up and made a change that made most of the
> > > > > existing implementations non-conformant, no. that sounds like a POSIX
> > > > > bug to me...
> > >
> > > Not most. Only two POSIX implementations, plus Windows. And the
> > > solution is easy: fix the implementations. There have been no
> > > regression reports in gnulib since we fixed it last year.
> >
> > Speaking as someone who participated in the POSIX standardization
> > process, I'm trying to pinpoint exactly which statements of which
> > versions of which standards you are claiming as nonconformance.
> >
> > First, a disclaimer: because this thread has been very vocal, I
> > brought the topic up in today's Austin Group meeting. The members of
> > the group on the phone call remember _specifically_ trying to permit
> > existing glibc behavior (where realloc(p, 0) does NOT allocate), while
> > still juggling competing wording from the C standards, although we will
> > be the first to admit that we would not be surprised if the resulting
> > efforts are still not clear enough to be unambiguous. I mentioned in
> > the meeting that I would attempt to follow up on these threads to see
> > what, if anything, the Austin Group may need to do to assist in the
> > discussion.
>
> Thanks! I'm in Denver for the Open Source Summit and LSS. If you'll
> be around next week, we can have a chat in person, which might be more
> useful. I'd like to have a long conversation about this.
>
> > Next, my overarching question. Is this about "realloc(non_null, 0)",
> > "realloc(NULL, 0)", or both? As the two are very distinct, I want to
> > make sure we are talking about the same usage patterns. For the rest
> > of this email, I'm assuming that your complaints are solely about
> > "realloc(non_null, 0)" - if I'm wrong, it may change the analysis done
> > below.
>
> Yup, it's about realloc(non_null, 0). r(NULL,0) is fine.
>
> > Now, on to some code archeology. In today's glibc source code, I see
> > this telling comment in malloc/malloc.c, making it clear that glibc
> > folks are aware that realloc(non_null, 0) has two useful behaviors,
> > and that glibc picks the behavior that does NOT behave consistently
> > with malloc(0), because of back-compat guarantees:
> >
> > /*
> > The REALLOC_ZERO_BYTES_FREES macro controls the behavior of realloc (p, 0)
> > when p is nonnull. If the macro is nonzero, the realloc call returns NULL;
> > otherwise, the call returns what malloc (0) would. In either case,
> > p is freed. Glibc uses a nonzero REALLOC_ZERO_BYTES_FREES, which
> > implements common historical practice.
> >
> > ISO C17 says the realloc call has implementation-defined behavior,
> > and it might not even free p.
> > */
>
> That comment is wrong. "common historical practice" is that realloc(3)
> is consistent with malloc(3). This is true since the days of Unix V7.
> I don't know what they were referring to. Maybe the behavior introduced
> in SysVr2's -lmalloc which was later standardized in the SVID by AT&T?
> That was never common, since all existing default (-lc) realloc(3)
> implementations behaved as if realloc(p, 1). You had to use the
> -lmalloc library to get it to return NULL and free the object.
>
> See <https://nabijaczleweli.xyz/content/blogn_t/017-malloc0.html>.
>
> > More reading: https://www.austingroupbugs.net/view.php?id=400 shows
> > where earlier POSIX missed that C90 to C99 changed what was permitted
> > (and apparently in a way to render glibc's implementation
> > non-conforming), and that's part of what drove the POSIX folks to ask
> > the C standard to improve the wording. POSIX 2024 is based on C17,
> > but Nick Stoughton was regularly communicating between both C and
> > POSIX groups on what wording(s) were being floated around, in order to
> > try and make it so that glibc would not have to change behavior, but
> > at the same time trying to make it possible for applications to be
> > able to make wise runtime decisions on how to use realloc that would
> > not leak memory or risk dereferencing a NULL pointer if not careful.
> >
> > https://sourceware.org/bugzilla/show_bug.cgi?id=12547 shows where
> > glibc has, in the past, refused to change behavior on the grounds that
> > the standards were buggy. If the standards are still buggy, the best
> > course of action is to open a bug against them.
>
> All standards since C89 have been buggy. If you are pedantic reading
> C89, the BSDs and all the historic implementations back to the original
> Unix V7 are non-conforming:
>
> <https://port70.net/~nsz/c/c89/c89-draft.html#4.10.3.4>
>
> Which says:
>
> | If size is zero and ptr is not a null pointer, the object it points to
> | is freed.
>
> It's not clear whether this means that the whole action of realloc(p,0)
> is to free(3) the pointer, or if it can also allocate a new object.
> Under the former interpretation, the standard is at odds with reality.
> Under the latter interpretation, I'd interpret it as saying that
> realloc(p,0) cannot fail (and thus must free(p)), which would be an
> interesting guarantee. I guess we'll never know what was the intended
> reading.
>
> C99 changed the specification, probably because of how ambiguous it was.
>
> glibc was also buggy, as it differed from every other Unix-like system.
> All Unix systems behaved as if free(p) and malloc(n). glibc is the only
> one that didn't follow this obvious consistency rule.
>
> So, both are bogus.
>
> > Also relevant are these documents
> > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2438.htm
> > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf
> >
> > > > > (like i said, i care greatly about actual shipping code. a standard is
> > > > > interesting for green-field stuff, but when it's at odds with reality
> > > > > it's often worse to try to adapt than just ignore the stupidity/report
> > > > > the bug and get it changed back.)
> > >
> > > It's ironic that the standard should have never said that, because prior
> > > to the existence of ANSI C and POSIX, all existing systems behaved like
> > > the current POSIX specification. It was a consequence of the horrible
> > > wording of the standards, that glibc was written so badly, by following
> > > a bogus specification, when it should have been made compatible with the
> > > existing systems.
> >
> > POSIX was originally released in 1988, before C90. glibc 1.0 came out
> > in 1992. I am not sure when glibc first cared about whether trending
> > towards POSIX compliance mattered, although I do know that in the
> > early days, Ulrich would very adamantly argue along the lines of
> > (paraphrased) "if the standards don't match common sense, then we
> > don't care about the standards".
>
> It seems Ulrich didn't follow that in this case. I don't know who wrote
> the original realloc(3) in glibc. Was it RMS? It would be interesting
> to know how they came up with that implementation. If anyone knows who
> wrote it and why, please CC them.
>
> I don't have a copy of POSIX.1-1988, nor of any other POSIX.1 before
> POSIX.1-2001. What do they say for realloc(3)?
>
> > > Thus, this is a historical bug in ISO C, POSIX, which at least has been
> > > finally fixed in POSIX.
> >
> > The fact that the wording has changed across multiple versions of C
> > and POSIX is indeed evidence that getting a specification that people
> > are happy with is difficult. What is harder is the decision of
> > whether the bug is in the standard (for not documenting reality) or in
> > the implementations (for not doing what the standard rightfully
> > requests), or even both. And "what people are happy with" differs on
> > who you ask - wording that permits disparate libc behavior is nicer to
> > the libraries (they don't have to change) but meaner to application
> > writers (the construct is not portable, so it is safer to avoid the
> > construct altogether rather than worry about which libraries have
> > which behaviors); whereas wording that locks down behavior is nicer to
> > applications (if I write this, it should work regardless of platform,
> > and if it doesn't, the standard exists as leverage to get libc fixed)
> > but meaner to libraries (forcing the library to version its symbols to
> > change behavior for newer standards while still providing ABI
> > stability guarantees to older apps that depend on the old behavior is
> > not cheap).
>
> As can be seen from the change in gnulib, the only possible issue from
> migrating from the current glibc behavior to the musl behavior is a few
> leaks in cases where the programmer calls realloc(p,0) ignoring the
> return value. Those leaks would leak 0 bytes plus the metadata.
>
> A solution for those leaks would be to add a diagnostic for calls to
> realloc(3) where the return value is unused. And even if those aren't
> fully solved, they're leaks of a few bytes. There's nothing that should
> cause real issues.
>
> But the glibc maintainers mentioned that they're investigating it
> in distros, so I guess we'll eventually have the results of their
> investigation.
>
> > The sad fact of the matter is that _because_ there are so many
> > differences in opinions, the C23 action of making realloc(p,0)
> > undefined is probably the simplest course that could be agreed on
> > (don't ever do that in your code, because you can't guarantee the
> > results), but simultaneously annoying to end users (because it is
> > undefined, rather than implementation-defined or unspecified, a
> > compiler can "optimize" your code to do WHATEVER IT WANTS - which
> > really means you CANNOT ever reliably call realloc(p,0) if your
> > compiler is aiming for C23).
>
> Indeed. I think the move from C17 to C23 was good.
>
> The issue with C17 is that it is very similar to POSIX.1-2008, but since
> ISO C doesn't require that errno is set when the pointer is not freed,
> it is impossible to portably determine if the input pointer was freed
> after realloc(p,0). This is not an issue in POSIX.1, though, since it
> can and does require that errno is set if the input pointer is not
> freed.
Self-correction: POSIX.1-2008 .. POSIX.1-2024 do allow setting errno
and freeing the input pointer, as Paul Eggert reminded me. AIX does this.
This is brain damaged, and also makes it impossible to portably
determine whether the pointer was freed after realloc(p,0).
Thus, declaring it UB in POSIX.1 would also be an improvement.
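
For code that has to be portable today, one way around the problem is to
never pass a size of 0 to realloc(3) at all. The sketch below is only an
illustration, not something taken from glibc, musl, or gnulib; the name
realloc_nonzero is hypothetical. It requests 1 byte whenever the caller
asks for 0, which mimics the "as if realloc(p, 1)" behavior described
above. With a nonzero size, a null return can only mean failure, and in
that case the old object is not deallocated.

```c
#include <stdlib.h>

/*
 * Hypothetical helper (an assumption for illustration, not an API of
 * any libc): never calls realloc() with size 0.
 *
 * Returns a non-null pointer on success.  Returns NULL only on
 * allocation failure, in which case p is still valid and still owned
 * by the caller.
 */
static void *
realloc_nonzero(void *p, size_t size)
{
	/* Ask for 1 byte instead of 0, so the zero-size rules never apply. */
	return realloc(p, size ? size : 1);
}
```

A caller would then write q = realloc_nonzero(p, n); and, if q is NULL,
keep using (or free) p, with no need to inspect errno.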
>
> Because it was impossible to determine whether r(p,0) has freed p after
> returning NULL in C17, it was effectively UB. So, I consider C23 to be
> a minor change from C17, and one which clarifies that it is UB, because
> it already was before.
>
> POSIX.1 is not limited by this limitation of ISO C.
>
> > On top of that, the POSIX standard usually defers to a (fixed version)
> > of C, but does have the liberty to impose well-defined behavior even
> > where the corresponding C standard left things undefined (for example,
> > POSIX 2017 was able to demand that a POSIX system can cast function
> > pointers to and from void* in order to implement dlsym(), even though
> > C99 said that was undefined). Put another way, just because C23 has
> > changed realloc(p,0) to be undefined does NOT require a future version
> > of POSIX to do likewise when it finally moves to a newer C than C17.
> > But at the same time, POSIX is unlikely to make things strict if it
> > risks alienating existing implementations; if glibc changes behavior,
> > that would go a long way towards POSIX changing wording to be
> > stricter.
>
> Indeed; I think POSIX.1 doesn't need to make this undefined, and
> shouldn't.
>
> > > BTW, the same text is present in POSIX.1-2017. It was changed in a TC,
> > > following bug <https://www.austingroupbugs.net/view.php?id=400>.
> > >
> > > The motivation, from what I can read there, seems to be that C99 already
> > > made POSIX.1 non-conforming, and this fix was intended to conform to
> > > C99.
> > >
> > > Indeed, glibc is non-conforming to C99 too. Although, I don't like the
> > > wording from C99, either; it allows weird stuff: it allows an
> > > implementation where malloc(0) returns NULL and realloc(p,0) non-null
> > > (so, the opposite of glibc).
> > >
> > > C11 is essentially identical to C99 in that regard, so glibc is also
> > > non-conforming to C11.
> > >
> > > C17 changed to something very weird. It seems to me that glibc is
> > > conforming again to C17, but it also seems to me that it's impossible to
> > > write code that uses realloc(p,0) in a portable way with this
> > > specification. I think it's a good thing that C23 removed that crap.
> > >
> > >
> > > Here's a summary of conformance to standards:
> > >
> > > glibc conforms to:
> > > - SysVr4
> > > - ISO C89
> >
> > I don't (currently) have a copy of C89 handy in front of me to quote
> > chapter and verse for this one, beyond what was already quoted in
> > Austin Group bug 400.
>
> Here's the draft, in various formats:
>
> <https://port70.net/~nsz/c/c89/>
>
> > > - ISO C17
> >
> > This one I _can_ quote. 7.22.3.4:
> >
> > "If size is zero and memory for the new object is not allocated, it is
> > implementation-defined whether the old object is deallocated."
> >
> > glibc documents that "realloc(non_null, 0)" deallocates non_null.
> > Therefore it is compliant. But that wording is still unfriendly to
> > users - there is no way to programmatically query the runtime what
> > behavior the implementation defined.
>
> > > - XPG4
> > >
> > > glibc doesn't conform to:
> > > - SysIII
> > > - SysV
> > > - SysVr2
> > > - SysVr3
> > > - SVID Issue 2
> > > - SVID Issue 3
> > > - The X/Open System V Specification
> > > - ISO C99
> >
> > Here, section 7.20.3.4 is relevant. In there, I see wording "If ptr is
> > a null pointer, the realloc function behaves like the malloc function
> > for the specified size." but NO wording about when ptr is non-null but
> > size is 0. As best I can tell, silence on the part of C99 means that
> > the standard is unspecified, and therefore glibc can do whatever it
> > wants and still claim compliance. But I'm open to correction if you
> > can quote the exact statement for why you claim glibc is non-compliant
> > here.
>
> <https://port70.net/~nsz/c/c99/n1256.html#7.20.3.4>
>
> I'll quote the entire text:
>
> Description
>
> 2 The realloc function
> deallocates the old object pointed to by ptr and
> returns a pointer to a new object that
> has the size specified by size.
> [...] // talks about the contents
>
> 3 If ptr is a null pointer, [...].
> Otherwise,
> if ptr does not match a pointer earlier returned by
> the calloc, malloc, [...], the behavior is undefined.
> If memory for the new object cannot be allocated,
> the old object is not deallocated and its value is unchanged.
>
> Returns
>
> 4 The realloc function returns a pointer to the new object
> (which may have the same value as a pointer to the old object),
> or a null pointer if the new object could not be allocated.
>
> IMO, paragraph 4 rules out the possibility of returning a null pointer
> on success.
>
> Also, while it doesn't specify what happens in the case of size 0
> explicitly, it mentions in paragraph 2 what happens for all sizes:
> it returns a pointer to a new object that has the size specified by
> size --which in this case is 0 bytes--.
>
> This wording of C99 was relatively good, and fixes the problems from
> C89 which had turned all historical implementations into
> non-conformance. C99 seems to restore the common historical behavior of
> realloc(3), turning glibc non-conforming as a consequence.
>
> > > - ISO C11
> >
> > This wording appears to match C99.
>
> Agree.
>
> > > - POSIX.1-2001
> >
> > This one defers to C89 anywhere that it is not explicitly documenting
> > with CX shading.
>
> Ahh, I had thought it would defer to C99 because it's older, but I guess
> it's like POSIX.1-2024 that doesn't defer to C23. Thanks! Then I stand
> corrected, and glibc conforms to POSIX.1-2001.
>
> > It adds CX shading to document the use of
> > errno=ENOMEM on allocation failures, but otherwise omits shading when
> > it states:
> >
> > "If size is 0 and ptr is not a null pointer, the object pointed to is
> > freed."
> >
> > which sounds like glibc behavior. But without double-checking C89, it
> > is hard to say whether POSIX accidentally diverged from C89 in
> > allowing glibc as compliant.
> >
> > > - POSIX.1-2008
> >
> > This version of POSIX defers to C99, but still states "If size is 0
> > and ptr is not a null pointer, the object pointed to is freed."
>
> I don't have a copy of POSIX.1-2008, but I assume the text is identical
> to POSIX.1-2001, except that it now defers to C99. Since C99 rules out
> the possibility of returning a null pointer on success (7.20.3.4p4),
> and POSIX.1-2008 doesn't seem to have shaded text to extend it, it is
> bound by the C99 restriction. The allowances provided by POSIX.1-2008
> are invalidated as unintentional.
>
> > without CX shading, even though C99 does NOT have the same wording as
> > C89. You could argue that this statement should be ignored since it
> > lacks CX shading and does not match any statement in C99.
>
> Indeed.
>
> > But even
> > so, unless you can demonstrate chapter-and-verse how glibc fails to
> > comply with C99, you also have a hard time convincing me that glibc
> > does not comply with POSIX 2008. And this issue was why Austin Group
> > bug 400 was created.
>
> I already mentioned it above, but I'll copy it here for completeness:
>
> n1256::7.20.3.4p4.
>
> <https://port70.net/~nsz/c/c99/n1256.html#7.20.3.4p4>
>
> The realloc function returns a pointer to the new object
> (which may have the same value as a pointer to the old object),
> or a null pointer if the new object could not be allocated.
>
> This seems to preclude the possibility of returning NULL on success.
>
> Also, this sentence is complemented by n1256::7.20.3.4p3, last sentence:
>
> If memory for the new object cannot be allocated,
> the old object is not deallocated and its value is unchanged.
>
> This sentence means that if an implementation claims that it returns a
> null pointer because it cannot allocate 0 bytes (this would be a valid
> interpretation), then it is forced to leave the old object allocated.
> glibc frees the object, so it does not comply with this, and we must
> consider that glibc has succeeded in the allocation, which brings us
> back to p4.
>
> > No mention of POSIX.1-2013?
>
> I didn't have a copy of that. Thanks! I'll add it to the list of
> standards that glibc doesn't conform to.
>
> > But just in case you're keeping track,
> > that is the version where Bug 400 was applied, and the text changed
> > to: "If the size of the space requested is zero, the behavior shall be
> > implementation-defined: either a null pointer is returned, or the
> > behavior shall be as if the size were some non-zero value, except that
> > the returned pointer shall not be used to access an object." But it
> > also has the problem that it requires "If size is 0, either: A null
> > pointer shall be returned <CX>and errno set to an
> > implementation-defined value</CX>. ..."
> >
> > which glibc does NOT comply with. realloc(non_null,0) returns NULL
> > _without_ setting errno, precisely because it DID free the object
> > successfully. This requirement in POSIX 2013 is an explicit extension
> > not mentioned in C99, AND it was quickly pointed out that it forbids
> > glibc behavior, so:
> >
> > > - POSIX.1-2017
> >
> > This one additionally applies Bug 526 and 688, to try and clean up
> > wording differences from C99, in particular clarifying whether errno
> > has to be set when "realloc(non_null, 0)" frees a pointer:
> >
> > https://www.austingroupbugs.net/view.php?id=526
> > https://www.austingroupbugs.net/view.php?id=688
> >
> > where the wording is once again relaxed to "If the size of the space
> > requested is zero, the behavior shall be implementation-defined:
> > either a null pointer is returned, or the behavior shall be as if the
> > size were some non-zero value, except that the behavior is undefined
> > if the returned pointer is used to access an object. ... If size is 0,
> > either: A null pointer shall be returned <CX>and, if ptr is not a null
> > pointer, errno shall be set to an implementation-defined value</CX>."
> >
> > which should once again allow glibc to be deemed compliant. At the
> > same time, the Austin Group was trying to get C17 fixed; that fix
> > turned out to be ugly, so the C committee tried again in C23.
>
> The last sentence clearly states that if size is 0 and ptr is non-null
> and a null pointer is returned, then *errno shall be set*. glibc
> doesn't set errno, and thus does not conform. Can you please clarify
> how you consider glibc's behavior to comply with that last sentence from
> your quote? For completeness, the sentence I'm talking about is
>
> If size is 0,
> either: A null pointer shall be returned <CX>and, if ptr is not a null
> pointer, errno shall be set to an implementation-defined value</CX>."
>
> > > - POSIX.1-2024
> >
> > Here, the standard defers to C17 rather than C99, but adds a lot more
> > CX shading. Given the changes between C99 and C17, POSIX tried to
> > match. Unfortunately, the DESCRIPTION section lost any mention of
> > non_null pointer plus 0 size, leaving only the RETURN VALUE section,
> > which now uses the wording entirely in CX shading:
> >
> > "If size is 0, or either nelem or elsize is 0, either: • A null
> > pointer shall be returned and, if ptr is not a null pointer, errno
> > shall be set to [EINVAL]. • A pointer to the allocated space shall be
> > returned, and the memory object pointed to by ptr shall be freed. The
> > application shall ensure that the pointer is not used to access an
> > object."
> >
> > Despite the efforts of the Austin Group to not break back-compat, that
> > one clearly looks like glibc is not compliant (glibc returns NULL and
> > does NOT set errno to EINVAL). And if I recall the conversations, we
> > knew at the time of POSIX 2024 that C23 would be marking
> > realloc(non_null, 0) as undefined behavior, and wanted to capture that
> > directly rather than depending on C17, but we may have failed in our
> > efforts.
>
> TBH, I think POSIX.1-2024 has a decent specification. I'd prefer the
> one from C99 and C11, but it is decent.
>
> > > Conformance to POSIX.1-2001 and POSIX.1-2008 is not clear. While glibc
> > > conforms to the wording of these standards, these standards have the
> > > following header in the realloc(3) specification:
> > >
> > > The functionality described on this reference page is aligned
> > > with the ISO C standard. Any conflict between the requirements
> > > described here and the ISO C standard is unintentional. This
> > > volume of IEEE Std 1003.1-2001 defers to the ISO C standard.
> > >
> > > Which means that POSIX's permissive wording is unintentional, and the
> > > ISO C99 wording is the one that matters, so glibc is non-conforming.
> >
> > The conflicts are unintentional only when <CX> shading is not
> > explicitly present.
>
> And POSIX.1-2001 .. POSIX.1-2008 doesn't have any CX shading.
>
> > > (I didn't mention C23, since it's UB, so anything conforms.)
> > >
> > >
> > > Have a lovely day!
> > > Alex
> > >
> > > --
> > > <https://www.alejandro-colomar.es/>
> >
> > If you've managed to make it this far, congratulations. We probably
> > still need to open bugs against POSIX to have POSIX-2024-TC1 improve
> > any ambiguous wording, and taking into account whatever the C
> > committee may decide to do with Alejandro's proposals for post-C23
> > behaviors, and whether glibc is willing to make realloc(non_null, 0)
> > allocate in the same manner as malloc(0) rather than being a hidden
> > call to free().
>
> As always, I have trouble with using the Austin group interface. If
> you're in Denver, this is another thing you could help me with. :)
>
> > I don't know if I answered all of your questions, or raised even more,
> > but you have your work cut out for you before declaring the man pages
> > good enough.
>
> Again, thanks a lot!!
>
>
> Have a lovely day!
> Alex
>
> --
> <https://www.alejandro-colomar.es/>
--
<https://www.alejandro-colomar.es/>