Message-ID: <4b5wiz62y5pq5hiepuqwjt7gmfbu2lnz5nalxrocl2yzxowdx7@m6sus2kdvepw>
Date: Thu, 19 Jun 2025 13:31:06 -0500
From: Eric Blake <eblake@...hat.com>
To: Alejandro Colomar <alx@...nel.org>
Cc: Rich Felker <dalias@...c.org>, enh <enh@...gle.com>, 
	Florian Weimer <fweimer@...hat.com>, Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>, 
	musl@...ts.openwall.com, libc-alpha@...rceware.org, Joseph Myers <josmyers@...hat.com>, 
	наб <nabijaczleweli@...ijaczleweli.xyz>, Paul Eggert <eggert@...ucla.edu>, 
	Robert Seacord <rcseacord@...il.com>, Bruno Haible <bruno@...sp.org>, bug-gnulib@....org, 
	JeanHeyd Meneide <phdofthehouse@...il.com>, Thorsten Glaser <tg@...bsd.de>
Subject: Re: Re: BUG: realloc(p,0) should be consistent with malloc(0)

On Wed, Jun 18, 2025 at 09:04:02PM +0200, Alejandro Colomar wrote:
> Hi Rich, Elliott,
> 
> On Wed, Jun 18, 2025 at 12:35:50PM -0400, Rich Felker wrote:
> > On Wed, Jun 18, 2025 at 11:20:54AM -0400, enh wrote:
> > > On Tue, Jun 17, 2025 at 5:58 PM Alejandro Colomar <alx@...nel.org> wrote:
> > > >
> > > > Hi Elliott, Florian,
> > > >
> > > > glibc and Bionic are non-conforming to POSIX.1-2024.  The fix that we're
> > > > proposing would make them conforming.  Does conformance to POSIX.1-2024
> > > > mean something to you?
> > > 
> > > not when POSIX screwed up and made a change that made most of the
> > > existing implementations non-conformant, no. that sounds like a POSIX
> > > bug to me...
> 
> Not most.  Only two POSIX implementations, plus Windows.  And the
> solution is easy: fix the implementations.  There have been no
> regression reports in gnulib since we fixed it last year.

Speaking as someone who participated in the POSIX standardization
process, I'm trying to pinpoint exactly which statements of which
versions of which standards you are claiming as nonconformance.

First, a disclaimer: because this thread has been very vocal, I
brought the topic up in today's Austin Group meeting.  The members of
the group on the phone call remember _specifically_ trying to permit
existing glibc behavior (where realloc(p, 0) does NOT allocate), while
still juggling competing wording from the C standards, although we will
be the first to admit that we would not be surprised if the resulting
efforts are still not clear enough to be unambiguous.  I mentioned in
the meeting that I would attempt to follow up on these threads to see
what, if anything, the Austin Group may need to do to assist in the
discussion.

Next, my overarching question.  Is this about "realloc(non_null, 0)",
"realloc(NULL, 0)", or both?  As the two are very distinct, I want to
make sure we are talking about the same usage patterns.  For the rest
of this email, I'm assuming that your complaints are solely about
"realloc(non_null, 0)" - if I'm wrong, it may change the analysis done
below.

Now, on to some code archeology.  In today's glibc source code, I see
this telling comment in malloc/malloc.c, making it clear that glibc
folks are aware that realloc(non_null, 0) has two useful behaviors,
and that glibc picks the behavior that does NOT behave consistently
with malloc(0), because of back-compat guarantees:

/*
  The REALLOC_ZERO_BYTES_FREES macro controls the behavior of realloc (p, 0)
  when p is nonnull.  If the macro is nonzero, the realloc call returns NULL;
  otherwise, the call returns what malloc (0) would.  In either case,
  p is freed.  Glibc uses a nonzero REALLOC_ZERO_BYTES_FREES, which
  implements common historical practice.

  ISO C17 says the realloc call has implementation-defined behavior,
  and it might not even free p.
*/

More reading: https://www.austingroupbugs.net/view.php?id=400 shows
where earlier POSIX missed that C90 to C99 changed what was permitted
(and apparently in a way that renders glibc's implementation
non-conforming), and that's part of what drove the POSIX folks to ask
the C standard to improve the wording.  POSIX 2024 is based on C17,
but Nick Stoughton was regularly communicating between both C and
POSIX groups on what wording(s) were being floated around, in order to
try and make it so that glibc would not have to change behavior, but
at the same time trying to make it possible for applications to be
able to make wise runtime decisions on how to use realloc that would
not leak memory or risk dereferencing a NULL pointer if not careful.

https://sourceware.org/bugzilla/show_bug.cgi?id=12547 shows where
glibc has, in the past, refused to change behavior on the grounds that
the standards were buggy.  If the standards are still buggy, the best
course of action is to open a bug against them.

Also relevant are these documents
        https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2438.htm
        https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf

> 
> > > (like i said, i care greatly about actual shipping code. a standard is
> > > interesting for green-field stuff, but when it's at odds with reality
> > > it's often worse to try to adapt than just ignore the stupidity/report
> > > the bug and get it changed back.)
> 
> It's ironic that the standard should have never said that, because prior
> to the existence of ANSI C and POSIX, all existing systems behaved like
> the current POSIX specification.  It was a consequence of the horrible
> wording of the standards, that glibc was written so badly, by following
> a bogus specification, when it should have been made compatible with the
> existing systems.

POSIX was originally released in 1988, before C90.  glibc 1.0 came out
in 1992.  I am not sure when glibc first started to care about
trending towards POSIX compliance, although I do know that in the
early days, Ulrich would very adamantly argue along the lines of
(paraphrased) "if the standards don't match common sense, then we
don't care about the standards".

> 
> Thus, this is a historical bug in ISO C, POSIX, which at least has been
> finally fixed in POSIX.

The fact that the wording has changed across multiple versions of C
and POSIX is indeed evidence that getting a specification that people
are happy with is difficult.  What is harder is the decision of
whether the bug is in the standard (for not documenting reality) or in
the implementations (for not doing what the standard rightfully
requests), or even both.  And "what people are happy with" depends on
who you ask - wording that permits disparate libc behavior is nicer to
the libraries (they don't have to change) but meaner to application
writers (the construct is not portable, so it is safer to avoid the
construct altogether rather than worry about which libraries have
which behaviors); whereas wording that locks down behavior is nicer to
applications (if I write this, it should work regardless of platform,
and if it doesn't, the standard exists as leverage to get libc fixed)
but meaner to libraries (forcing the library to version its symbols to
change behavior for newer standards while still providing ABI
stability guarantees to older apps that depend on the old behavior is
not cheap).

The sad fact of the matter is that _because_ there are so many
differences in opinions, the C23 action of making realloc(p,0)
undefined is probably the simplest course that could be agreed on
(don't ever do that in your code, because you can't guarantee the
results), but simultaneously annoying to end users (because it is
undefined, rather than implementation-defined or unspecified, a
compiler can "optimize" your code to do WHATEVER IT WANTS - which
really means you CANNOT ever reliably call realloc(p,0) if your
compiler is aiming for C23).
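
To make that advice concrete, here is a minimal sketch of how an
application can sidestep the construct by stating which behavior it
wants.  The helper names resize_zero_frees() and
resize_zero_allocates() are hypothetical (they come from no libc and
no standard); the only assumption is that neither helper ever passes
a size of 0 to realloc():

```c
#include <stdlib.h>

/* Hypothetical helper: a zero size means "free it", spelled out
 * explicitly instead of relying on what realloc(p, 0) happens to do. */
static void *resize_zero_frees(void *p, size_t n)
{
    if (n == 0) {
        free(p);            /* free(NULL) is a no-op */
        return NULL;        /* here NULL means "released", not "failed" */
    }
    return realloc(p, n);   /* NULL means failure; p is still valid */
}

/* Hypothetical helper: a zero size still yields a live allocation,
 * by rounding the request up to one byte. */
static void *resize_zero_allocates(void *p, size_t n)
{
    return realloc(p, n ? n : 1);   /* NULL always means failure */
}
```

With either helper the meaning of a NULL return no longer depends on
the libc, at the cost of one extra byte in the second variant.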

On top of that, the POSIX standard usually defers to (a fixed version
of) C, but does have the liberty to impose well-defined behavior even
where the corresponding C standard left things undefined (for example,
POSIX 2017 was able to demand that a POSIX system can cast function
pointers to and from void* in order to implement dlsym(), even though
C99 said that was undefined).  Put another way, just because C23 has
changed realloc(p,0) to be undefined does NOT require a future version
of POSIX to do likewise when it finally moves to a newer C than C17.
But at the same time, POSIX is unlikely to make things strict if it
risks alienating existing implementations; if glibc changes behavior,
that would go a long way towards POSIX changing wording to be
stricter.

> 
> BTW, the same text is present in POSIX.1-2017.  It was changed in a TC,
> following bug <https://www.austingroupbugs.net/view.php?id=400>.
> 
> The motivation, from what I can read there, seems to be that C99 already
> made POSIX.1 non-conforming, and this fix was intended to conform to
> C99.
> 
> Indeed, glibc is non-conforming to C99 too.  Although, I don't like the
> wording from C99, either; it allows weird stuff: it allows an
> implementation where malloc(0) returns NULL and realloc(p,0) non-null
> (so, the opposite of glibc).
> 
> C11 is essentially identical to C99 in that regard, so glibc is also
> non-conforming to C11.
> 
> C17 changed to something very weird.  It seems to me that glibc is
> conforming again to C17, but it also seems to me that it's impossible to
> write code that uses realloc(p,0) in a portable way with this
> specification.  I think it's a good thing that C23 removed that crap.
> 
> 
> Here's a summary of conformance to standards:
> 
> 	glibc conforms to:
> 		-  SysVr4
> 		-  ISO C89

I don't (currently) have a copy of C89 handy in front of me to quote
chapter and verse for this one, beyond what was already quoted in
Austin Group bug 400.

> 		-  ISO C17

This one I _can_ quote.  7.22.3.5:

"If size is zero and memory for the new object is not allocated, it is
implementation-defined whether the old object is deallocated."

glibc documents that "realloc(non_null, 0)" deallocates non_null.
Therefore it is compliant.  But that wording is still unfriendly to
users - there is no way to programmatically query the runtime what
behavior the implementation defined.
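
The closest an application can get is to observe a single call at run
time.  A rough sketch follows; probe_zero_size() and struct
zero_size_probe are hypothetical names, and the caveats are large: it
cannot tell whether the old object was freed when realloc returns
NULL, it cannot tell a NULL from malloc(0) apart from an allocation
failure, and where the call is undefined (C23) the answer means
nothing beyond what the implementation documents:

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical result type for the probe below. */
struct zero_size_probe {
    bool ok;             /* false if the setup allocation failed */
    bool malloc0_null;   /* malloc(0) returned NULL */
    bool realloc0_null;  /* realloc(non_null, 0) returned NULL */
};

static struct zero_size_probe probe_zero_size(void)
{
    struct zero_size_probe r = { false, false, false };

    void *m = malloc(0);
    r.malloc0_null = (m == NULL);
    free(m);                       /* free(NULL) is a no-op */

    void *p = malloc(16);
    if (p == NULL)
        return r;                  /* could not set up the probe */

    void *q = realloc(p, 0);
    r.realloc0_null = (q == NULL);
    if (q != NULL)
        free(q);                   /* p was consumed by the realloc */
    /* If q is NULL, p is deliberately left alone: it was either freed
     * already or it leaks, and the probe cannot tell which. */

    r.ok = true;
    return r;
}
```

An implementation is "consistent with malloc(0)" in the sense of this
thread's subject when malloc0_null and realloc0_null agree.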

> 		-  XPG4
> 
> 	glibc doesn't conform to:
> 		-  SysIII
> 		-  SysV
> 		-  SysVr2
> 		-  SysVr3
> 		-  SVID Issue 2
> 		-  SVID Issue 3
> 		-  The X/Open System V Specification
> 		-  ISO C99

Here, section 7.20.3.4 is relevant. In there, I see wording "If ptr is
a null pointer, the realloc function behaves like the malloc function
for the specified size." but NO wording about when ptr is non-null but
size is 0.  As best I can tell, silence on the part of C99 means that
the standard is unspecified, and therefore glibc can do whatever it
wants and still claim compliance.  But I'm open to correction if you
can quote the exact statement for why you claim glibc is non-compliant
here.

> 		-  ISO C11

This wording appears to match C99.

> 		-  POSIX.1-2001

This one defers to C89 anywhere that it is not explicitly documenting
with CX shading.  It adds CX shading to document the use of
errno=ENOMEM on allocation failures, but otherwise omits shading when
it states:

"If size is 0 and ptr is not a null pointer, the object pointed to is
freed."

which sounds like glibc behavior.  But without double-checking C89, it
is hard to say whether POSIX accidentally diverged from C89 in
allowing glibc as compliant.

> 		-  POSIX.1-2008

This version of POSIX defers to C99, but still states "If size is 0
and ptr is not a null pointer, the object pointed to is freed."
without CX shading, even though C99 does NOT have the same wording as
C89.  You could argue that this statement should be ignored since it
lacks CX shading and does not match any statement in C99.  But even
so, unless you can demonstrate chapter-and-verse how glibc fails to
comply with C99, you also have a hard time convincing me that glibc
does not comply with POSIX 2008.  And this issue was why Austin Group
bug 400 was created.

No mention of POSIX.1-2013?  But just in case you're keeping track,
that is the version where Bug 400 was applied, and the text changed
to: "If the size of the space requested is zero, the behavior shall be
implementation-defined: either a null pointer is returned, or the
behavior shall be as if the size were some non-zero value, except that
the returned pointer shall not be used to access an object."  But it
also has the problem that it requires "If size is 0, either: A null
pointer shall be returned <CX>and errno set to an
implementation-defined value</CX>. ..."

which glibc does NOT comply with.  realloc(non_null,0) returns NULL
_without_ setting errno, precisely because it DID free the object
successfully.  This requirement in POSIX 2013 is an explicit extension
not mentioned in C99, AND it was quickly pointed out that it forbids
glibc behavior, so:

> 		-  POSIX.1-2017

This one additionally applies Bug 526 and 688, to try and clean up
wording differences from C99, in particular clarifying whether errno
has to be set when "realloc(non_null, 0)" frees a pointer:

https://www.austingroupbugs.net/view.php?id=526
https://www.austingroupbugs.net/view.php?id=688

where the wording is once again relaxed to "If the size of the space
requested is zero, the behavior shall be implementation-defined:
either a null pointer is returned, or the behavior shall be as if the
size were some non-zero value, except that the behavior is undefined
if the returned pointer is used to access an object. ... If size is 0,
either: A null pointer shall be returned <CX>and, if ptr is not a null
pointer, errno shall be set to an implementation-defined value</CX>."

which should once again allow glibc to be deemed compliant.  At the
same time, the Austin Group was trying to get C17 fixed; that fix
turned out to be ugly, so the C committee tried again in C23.


> 		-  POSIX.1-2024

Here, the standard defers to C17 rather than C99, but adds a lot more
CX shading.  Given the changes between C99 and C17, POSIX tried to
match.  Unfortunately, the DESCRIPTION section lost any mention of
non_null pointer plus 0 size, leaving only the RETURN VALUE section,
which now uses the wording entirely in CX shading:

"If size is 0, or either nelem or elsize is 0, either: • A null
pointer shall be returned and, if ptr is not a null pointer, errno
shall be set to [EINVAL]. • A pointer to the allocated space shall be
returned, and the memory object pointed to by ptr shall be freed. The
application shall ensure that the pointer is not used to access an
object."

Despite the efforts of the Austin Group to not break back-compat, that
one clearly looks like glibc is not compliant (glibc returns NULL and
does NOT set errno to EINVAL).  And if I recall the conversations, we
knew at the time of POSIX 2024 that C23 would be marking
realloc(non_null, 0) as undefined behavior, and wanted to capture that
directly rather than depending on C17, but we may have failed in our
efforts.
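
As an illustration of how I read that RETURN VALUE text, here is a
sketch that records what one realloc(non_null, 0) call did and checks
it against the two permitted outcomes.  The names struct
realloc0_outcome, observe_realloc_zero() and posix2024_permits() are
hypothetical, and the check covers only what is observable (the
return value and errno), not whether the old object was freed:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical record of one realloc(non_null, 0) call. */
struct realloc0_outcome {
    bool returned_null;
    int  err;            /* errno right after the call (0 if untouched) */
};

/* Performs the call once; returns false if the setup malloc failed. */
static bool observe_realloc_zero(struct realloc0_outcome *out)
{
    void *p = malloc(16);
    if (p == NULL)
        return false;

    errno = 0;
    void *q = realloc(p, 0);
    int saved = errno;   /* capture before anything else can change it */

    out->returned_null = (q == NULL);
    out->err = saved;
    free(q);             /* no-op when q is NULL; p is never touched again */
    return true;
}

/* My reading of the quoted text: a null return must come with EINVAL
 * (ptr was not null); a non-null return is the other permitted branch. */
static bool posix2024_permits(const struct realloc0_outcome *o)
{
    if (o->returned_null)
        return o->err == EINVAL;
    return true;
}
```

A NULL return with errno left at 0, which is the glibc behavior
described above, is the one combination this check rejects.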

> 
> Conformance to POSIX.1-2001 and POSIX.1-2008 is not clear.  While glibc
> conforms to the wording of these standards, these standards have the
> following header in the realloc(3) specification:
> 
> 	The functionality described on this reference page is aligned
> 	with the ISO C standard.  Any conflict between the requirements
> 	described here and the ISO C standard is unintentional.  This
> 	volume of IEEE Std 1003.1-2001 defers to the ISO C standard.
> 
> Which means that POSIX's permissive wording is unintentional, and the
> ISO C99 wording is the one that matters, so glibc is non-conforming.

The conflicts are unintentional only when <CX> shading is not
explicitly present.

> 
> (I didn't mention C23, since it's UB, so anything conforms.)
> 
> 
> Have a lovely day!
> Alex
> 
> -- 
> <https://www.alejandro-colomar.es/>

If you've managed to make it this far, congratulations.  We probably
still need to open bugs against POSIX to have POSIX-2024-TC1 improve
any ambiguous wording, taking into account whatever the C
committee may decide to do with Alejandro's proposals for post-C23
behaviors, and whether glibc is willing to make realloc(non_null, 0)
allocate in the same manner as malloc(0) rather than being a hidden
call to free().

I don't know if I answered all of your questions, or raised even more,
but you have your work cut out for you before declaring the man pages
good enough.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org
