Message-ID: <3igvwk75ofugax53xmbohjp7vbr27pkwivlw6apftaf7haocse@nsszgukhkxzu>
Date: Fri, 20 Jun 2025 11:30:59 -0500
From: Eric Blake <eblake@...hat.com>
To: Alejandro Colomar <alx@...nel.org>
Cc: Rich Felker <dalias@...c.org>, enh <enh@...gle.com>,
    Florian Weimer <fweimer@...hat.com>,
    Adhemerval Zanella Netto <adhemerval.zanella@...aro.org>,
    musl@...ts.openwall.com, libc-alpha@...rceware.org,
    Joseph Myers <josmyers@...hat.com>,
    наб <nabijaczleweli@...ijaczleweli.xyz>,
    Paul Eggert <eggert@...ucla.edu>,
    Robert Seacord <rcseacord@...il.com>,
    Bruno Haible <bruno@...sp.org>, bug-gnulib@....org,
    JeanHeyd Meneide <phdofthehouse@...il.com>,
    Thorsten Glaser <tg@...bsd.de>
Subject: Re: Re: BUG: realloc(p,0) should be consistent with malloc(0)

On Fri, Jun 20, 2025 at 01:37:58AM +0200, Alejandro Colomar wrote:
> Hey Eric!
>
> Thanks a lot for the detailed reply! Comments below.

Ditto.

> > Now, on to some code archeology. In today's glibc source code, I see
> > this telling comment in malloc/malloc.c, making it clear that glibc
> > folks are aware that realloc(non_null, 0) has two useful behaviors,
> > and that glibc picks the behavior that does NOT behave consistently
> > with malloc(0), because of back-compat guarantees:
> >
> > /*
> >   The REALLOC_ZERO_BYTES_FREES macro controls the behavior of realloc (p, 0)
> >   when p is nonnull. If the macro is nonzero, the realloc call returns NULL;
> >   otherwise, the call returns what malloc (0) would. In either case,
> >   p is freed. Glibc uses a nonzero REALLOC_ZERO_BYTES_FREES, which
> >   implements common historical practice.
> >
> >   ISO C17 says the realloc call has implementation-defined behavior,
> >   and it might not even free p.
> > */
>
> That comment is wrong. "common historical practice" is that realloc(3)
> is consistent with malloc(3). This is true since the days of Unix V7.

Careful.
"common historical practice" can just as easily be read along the
lines of "common historical practice of glibc, independent of other
implementations" (since glibc does not inherit a code base from other
implementations). And "historical" also doesn't give a time frame; if
the comment was only written at a time when glibc behavior had already
been in place for many years, that could be impetus for using
"historical", even if it disregards even more history pre-glibc from
other implementations.

> I don't know what they were referring to. Maybe the behavior introduced
> in SysVr2's -lmalloc which was later standardized in the SVID by AT&T?
> That was never common, since all existing default (-lc) realloc(3)
> implementations behaved as if realloc(p, 1). You had to use the
> -lmalloc library to get it to return NULL and free the object.

At the time I wrote my mail, I had not researched when glibc made the
change. Paul's additional research proving that glibc 1.0 behaves
differently than glibc 2.2, and that glibc 2.2 changed behavior
because of a wording change in the C99 draft (whether or not that
change was strictly necessary, and whether or not the actual C99
required what glibc claimed it required by making the behavior
change), throws more fuel onto the fire.

Checking the provenance of that comment in glibc, I've further
determined:

malloc/malloc.c was rewritten in Jan 2002 by Wolfram Gloger (committed
by Ulrich) to borrow ideas from Doug Lea's malloc-2.7.0.c, in commit
fa8d436c (glibc-2.3).

- REALLOC_ZERO_BYTES_FREES was defined to 1 at that time, with this
  comment:
/*
  REALLOC_ZERO_BYTES_FREES should be set if a call to
  realloc with zero bytes should be the same as a call to free.
  This is required by the C standard. Otherwise, since this malloc
  returns a unique pointer for malloc(0), so does realloc(p, 0).
*/
- given the timing, C99 would have been the current C standard

Before that rewrite, the feature switch macro was merely defined
(rather than set to 1) as of commit 7c2b945e (April 1999, glibc
2.1.1), the commit by Andreas that Paul already mentioned, and
directly in response to C99 drafting efforts (remember, C99 was not
formally adopted until May 2000); I am less certain without more
research whether the wording that Andreas was referring to at the time
the feature knob was switched made it into the approved C99 unchanged,
or if that was still undergoing debates in the C committee. The
comment at that time read:

/*
  REALLOC_ZERO_BYTES_FREES should be set if a call to
  realloc with zero bytes should be the same as a call to free.
  Some people think it should. Otherwise, since this malloc
  returns a unique pointer for malloc(0), so does realloc(p, 0).
*/

and Wolfram Gloger reworded the comment again in 431c33c0 (May 1999,
also glibc 2.1.1), as part of synchronizing with ptmalloc:

/*
  REALLOC_ZERO_BYTES_FREES should be set if a call to
  realloc with zero bytes should be the same as a call to free.
  The C standard requires this. Otherwise, since this malloc
  returns a unique pointer for malloc(0), so does realloc(p, 0).
*/

So, the switch from "Some people think it should" to "The C standard
requires this" was independent of actually flipping the knob, under
different authors, but both changes happened in close proximity, and
the list archives from that time are relevant to the thinking as to
why it was assumed C99 required the change.

Going even further back in time, the knob itself (although disabled)
appears to have been around since at least commit f65fd747 (Dec 1996,
importing glibc history into version control, glibc 2.0.4; glibc.git
lacks older history), where it was disabled with documentation of:

  REALLOC_ZERO_BYTES_FREES (default: NOT defined)
  Define this if you think that realloc(p, 0) should be equivalent
  to free(p).
  Otherwise, since malloc returns a unique pointer for malloc(0), so
  does realloc(p, 0).

so long before C99 was even close to final, glibc was aware of other
implementations having that behavior enough to offer it as a
compile-time knob even if not using it back then.

ChangeLog.1 at that time (now living in ChangeLog.old/ChangeLog.1)
mentions the existence of malloc/malloc.c in a change by Roland
McGrath in April 1992 (that's the oldest mention of the file within
accessible glibc sources, but no actual view of what the file looked
like at the time). The first mention of REALLOC_ZERO in
ChangeLog.old/ChangeLog.* is in ChangeLog.10, corresponding to changes
in 2000 (and already covered above in this email). I lack the
resources to see how glibc evolved before the commit history currently
tracked in git, which would be needed if we want to learn how glibc
1.0 behaved, or when that comment may have first been added.

Going in the other direction, Paul modified the comment, in commits
dff9e592 and 9f1bed18 (both in April 2021, glibc-2.34):

- first attempt
/* REALLOC_ZERO_BYTES_FREES controls the behavior of realloc (p, 0)
   when p is nonnull. If nonzero, realloc (p, 0) should free p and
   return NULL. Otherwise, realloc (p, 0) should do the equivalent
   of freeing p and returning what malloc (0) would return.

   ISO C17 says the behavior is implementation-defined here; glibc
   follows historical practice and defines it to be nonzero. */
- current wording
/*
  The REALLOC_ZERO_BYTES_FREES macro controls the behavior of realloc (p, 0)
  when p is nonnull. If the macro is nonzero, the realloc call returns NULL;
  otherwise, the call returns what malloc (0) would. In either case,
  p is freed. Glibc uses a nonzero REALLOC_ZERO_BYTES_FREES, which
  implements common historical practice.

  ISO C17 says the realloc call has implementation-defined behavior,
  and it might not even free p.
*/

In that light, 2021 is far enough removed from 2002 that Paul's use of
"historical" could easily be read in terms of glibc history
(independent of other implementations). And this corresponds to the
point in time after the Austin Group had pointed out to the C
committee that C99 wording and POSIX 2008 were possibly at odds, such
that POSIX was trying to get the C wording improved before C17 was out
so that future POSIX (now POSIX 2024) would be easier.

>
> See <https://nabijaczleweli.xyz/content/blogn_t/017-malloc0.html>.

Thank you for starting this. It will probably need some revisions
before it is ready for the C committee, but hopefully this thread
helps.

>
> > More reading: https://www.austingroupbugs.net/view.php?id=400 shows
> > where earlier POSIX missed that C90 to C99 changed what was permitted
> > (and apparently in a way to render glibc's implementation
> > non-conforming), and that's part of what drove the POSIX folks to ask
> > the C standard to improve the wording. POSIX 2024 is based on C17,
> > but Nick Stoughton was regularly communicating between both C and
> > POSIX groups on what wording(s) were being floated around, in order to
> > try and make it so that glibc would not have to change behavior, but
> > at the same time trying to make it possible for applications to be
> > able to make wise runtime decisions on how to use realloc that would
> > not leak memory or risk dereferencing a NULL pointer if not careful.
> >
> > https://sourceware.org/bugzilla/show_bug.cgi?id=12547 shows where
> > glibc has, in the past, refused to change behavior on the grounds that
> > the standards were buggy. If the standards are still buggy, the best
> > course of action is to open a bug against them.
>
> All standards since C89 have been buggy.
> If you are pedantic reading
> C89, the BSDs and all the historic implementations back to the original
> Unix V7 are non-conforming:
>
> <https://port70.net/~nsz/c/c89/c89-draft.html#4.10.3.4>
>
> Which says:
>
> | If size is zero and ptr is not a null pointer, the object it points to
> | is freed.
>
> It's not clear whether this means that the whole action of realloc(p,0)
> is to free(3) the pointer, or if it can also allocate a new object.
> Under the former interpretation, the standard is at odds with reality.
> Under the latter interpretation, I'd interpret it as saying that
> realloc(p,0) cannot fail (and thus must free(p)), which would be an
> interesting guarantee. I guess we'll never know what was the intended
> reading.

C89 also says (4.10.3): "If the size of the space requested is zero,
the behavior is implementation-defined; the value returned shall be
either a null pointer or a unique pointer."

Putting those two sentences together, I can make a very strong case
that an implementation where "realloc(p,0)" frees p, and then returns
NULL, and documents that it does so, complies (the old object pointed
to by p is freed, and the size being zero means that the new object
being NULL rather than a distinct pointer to non-dereferenceable
storage is what the implementation documented); and this is true
whether or not malloc(0) and realloc(p,0) differ on whether they
return NULL, as long as both of them document their behavior on zero
size.

Another observation on C89 - it has different wording in most places
about "if the space has been deallocated by a call to the free or
realloc function"; where free() documents that "The free function
causes the space pointed to by ptr to be deallocated, that is, made
available for further allocation." The phrase "is deallocated" is
thus precisely defined. However, the only use of the phrase "is
freed" in that document is the one sentence you quoted about realloc
with non-null pointer and zero size.
I can argue that as an undocumented term, "is freed" is NOT intended
to be synonymous with "is free()d", and is instead distinct in meaning
from "is deallocated" (ie. "is deallocated" is how a pointer can be
reused by a future malloc, and is no longer a distinct memory
location; but "is freed" could be defined as contents are no longer
referenceable but the pointer might still be allocated as a distinct
location in memory and still safe to pass to a later "free()").

But then, you might ask, why does the standard talk about space that
"has been deallocated by a call to the free or realloc function" if
"realloc(p,0)" is not deallocating? My answer: there is another case
where it is obvious that realloc does deallocation: namely, when
realloc(p,non-zero) returns a new pointer to storage that is
(presumably) larger than the contents of the old p - the old value of
p "was deallocated" and can now be reused by a future malloc.

With that definition in hand, I can now argue that whether
realloc(p,0) returns p (truncated and freed of its contents, but p is
still allocated), or returns a new non-NULL pointer (p was freed of
its contents AND deallocated in order to return the new pointer), an
implementation where realloc(p,0) returns a non-NULL pointer has
successfully freed p. Strenuous logic, perhaps, but we're already in
the weeds.

So, I think we can make arguments that ALL of the following can be
considered compliant under C89 rules (although the argument is easier
for some cases than others):

1. malloc(0) returns non-NULL, realloc(0,0) returns non-NULL,
   realloc(p,0) returns p [free(p) is still safe, but it is no longer
   safe to access contents of p]
2. malloc(0) returns non-NULL, realloc(0,0) returns non-NULL,
   realloc(p,0) returns non-NULL other than p [free(p) is unsafe]
3. malloc(0) returns NULL [0-sized objects are unsupported; presumably
   errno=EINVAL but C89 is silent on that], realloc(0,0) returns NULL,
   realloc(p,0) deallocates p and then returns NULL [presumably with
   errno set, at any rate free(p) is unsafe]
4. malloc(0) returns NULL, realloc(0,0) returns NULL, realloc(p,0)
   returns p unchanged [free(p) is still safe, but dereferencing its
   contents is no longer safe]
5. malloc(0) returns non-NULL, realloc(0,0) returns non-NULL,
   realloc(p,0) deallocates p and returns NULL [free(p) is unsafe]

Of those, it looks like glibc 2.1 would be case 1 (the same pointer is
returned, but truncated down to minimum size), glibc 2.2 to present
would be case 5 (the pointer is freed, the function returns NULL even
though it is inconsistent with malloc(0) being able to return
zero-sized objects), and other traditional implementations could be
either case 2 (a non-NULL pointer is returned because zero-sized
objects are always possible, but because it was distinct from p it
also met the rule about having freed p) or case 1 (the call "freed"
the contents of p, but did not deallocate it).

>
> C99 changed the specification, probably because of how ambiguous it was.
>
> glibc was also buggy, as it differed from every other Unix-like system.
> All Unix systems behaved as if free(p) and malloc(n). glibc is the only
> one that didn't follow this obvious consistency rule.

Maybe the intended wording was that "if realloc(p,s) returns a
non-NULL value distinct from p, then p was deallocated and the new
value obtained as if by malloc(s)". After all, the reason realloc()
exists is for the cases where realloc(p,s) can return p (ie. resized
in place, whether by truncating and optionally handing back an unused
tail to the system, or by expanding where the new tail was already
accessible in place even though it has unspecified contents); it's
only when the resize-in-place can't happen that realloc() must then
arrange to copy contents from the old pointer to the new.
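
To make the resize-in-place versus copy distinction concrete, here is
a minimal sketch of a toy allocator layered on top of malloc().
Everything in it is an assumption made for illustration: toy_hdr,
toy_alloc, toy_free and toy_realloc are hypothetical names, not code
from glibc, musl or any other libc, and the size/capacity header only
stands in for whatever bookkeeping a real allocator keeps. With this
particular design, toy_realloc(p, 0) happens to land in case 1 above:
the same pointer comes back, truncated to size zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical bookkeeping header placed in front of each toy object.
   The union keeps the user pointer suitably aligned for any type. */
typedef union toy_hdr {
    struct { size_t size; size_t cap; } s;  /* current size, usable capacity */
    max_align_t align;
} toy_hdr;

static void *toy_alloc(size_t n)
{
    if (n > SIZE_MAX - sizeof(toy_hdr))
        return NULL;                        /* request would overflow */
    toy_hdr *h = malloc(sizeof *h + n);
    if (h == NULL)
        return NULL;
    h->s.size = h->s.cap = n;
    return h + 1;                           /* user data follows the header */
}

static void toy_free(void *p)
{
    if (p != NULL)
        free((toy_hdr *)p - 1);
}

static void *toy_realloc(void *p, size_t n)
{
    if (p == NULL)
        return toy_alloc(n);                /* behaves like an allocation */

    toy_hdr *h = (toy_hdr *)p - 1;
    assert(h->s.size <= h->s.cap);          /* header invariant */

    if (n <= h->s.cap) {                    /* resize in place: same pointer */
        h->s.size = n;
        return p;
    }

    void *q = toy_alloc(n);                 /* cannot resize in place */
    if (q == NULL)
        return NULL;                        /* old object left untouched */
    memcpy(q, p, h->s.size);                /* copy the old contents */
    toy_free(p);                            /* only now deallocate the old object */
    return q;
}
```

The copy happens only on the path where a new pointer is returned, and
the old object is deallocated only after the new one exists, so a
failed growth leaves the caller's pointer valid.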
In fact, even though portable code must not expect specific contents
in the new pointer if the sequence free(p);malloc(s) happens to reuse
p, I could totally see how some (possibly-older) implementations of
malloc() have sufficient locking in place where it may be easier to
try and resize pointer p by free(p);malloc(s) and only if the resize
changed locations then do the copying - as long as the rest of the
application can't corrupt the contents in the old location before the
new location is finally returned to the user.

But even if it was the intended wording (or if that is the wording
that we hope to have in place in the future), unfortunately it is not
the actual wording.

>
> So, both are bogus.
>
> > Also relevant are these documents
> > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2438.htm
> > https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf
> >
> > > > > (like i said, i care greatly about actual shipping code. a standard is
> > > > > interesting for green-field stuff, but when it's at odds with reality
> > > > > it's often worse to try to adapt than just ignore the stupidity/report
> > > > > the bug and get it changed back.)
> > >
> > > It's ironic that the standard should have never said that, because prior
> > > to the existence of ANSI C and POSIX, all existing systems behaved like
> > > the current POSIX specification. It was a consequence of the horrible
> > > wording of the standards, that glibc was written so badly, by following
> > > a bogus specification, when it should have been made compatible with the
> > > existing systems.
> >
> > POSIX was originally released in 1988, before C90. glibc 1.0 came out
> > in 1992. I am not sure when glibc first cared about whether trending
> > towards POSIX compliance mattered, although I do know that in the
> > early days, Ulrich would very adamantly argue along the lines of
> > (paraphrased) "if the standards don't match common sense, then we
> > don't care about the standards".
>
> It seems Ulrich didn't follow that in this case. I don't know who wrote
> the original realloc(3) in glibc. Was it RMS? It would be interesting
> to know how they came up with that implementation. If anyone knows who
> wrote it and why, please CC them.
>
> I don't have a copy of POSIX.1-1988, nor of any other POSIX.1 before
> POSIX.1-2001. What do they say for realloc(3)?
>
> > > Thus, this is a historical bug in ISO C, POSIX, which at least has been
> > > finally fixed in POSIX.
> >
> > The fact that the wording has changed across multiple versions of C
> > and POSIX is indeed evidence that getting a specification that people
> > are happy with is difficult. What is harder is the decision of
> > whether the bug is in the standard (for not documenting reality) or in
> > the implementations (for not doing what the standard rightfully
> > requests), or even both. And "what people are happy with" differs on
> > who you ask - wording that permits disparate libc behavior is nicer to
> > the libraries (they don't have to change) but meaner to application
> > writers (the construct is not portable, so it is safer to avoid the
> > construct altogether rather than worry about which libraries have
> > which behaviors); whereas wording that locks down behavior is nicer to
> > applications (if I write this, it should work regardless of platform,
> > and if it doesn't, the standard exists as leverage to get libc fixed)
> > but meaner to libraries (forcing the library to version its symbols to
> > change behavior for newer standards while still providing ABI
> > stability guarantees to older apps that depend on the old behavior is
> > not cheap).
>
> As can be seen from the change in gnulib, the only possible issues from
> migrating from the current glibc behavior to the musl behavior is a few
> leaks in cases where the programmer calls realloc(p,0) ignoring the
> return value. Those leaks would leak 0 bytes plus the metadata.
>
> A solution for those leaks would be to add a diagnostic for calls to
> realloc(3) where the return value is unused. And even if those aren't
> fully solved, they're leaks of a few bytes. There's nothing that should
> cause real issues.
>
> But the glibc maintainers mentioned that they're investigating about it
> in distros, so I guess we'll eventually have the results of their
> investigation.
>
> > The sad fact of the matter is that _because_ there are so many
> > differences in opinions, the C23 action of making realloc(p,0)
> > undefined is probably the simplest course that could be agreed on
> > (don't ever do that in your code, because you can't guarantee the
> > results), but simultaneously annoying to end users (because it is
> > undefined, rather than implementation-defined or unspecified, a
> > compiler can "optimize" your code to do WHATEVER IT WANTS - which
> > really means you CANNOT ever reliably call realloc(p,0) if your
> > compiler is aiming for C23).
>
> Indeed. I think the move from C17 to C23 was good.
>
> The issue with C17 is that it is very similar to POSIX.1-2008, but since
> ISO C doesn't require that errno is set when the pointer is not freed,
> it is impossible to portably determine if the input pointer was freed
> after realloc(p,0). This is not an issue in POSIX.1, though, since it
> can and does require that errno is set if the input pointer is not
> freed.
>
> Because it was impossible to determine whether r(p,0) has freed p after
> returning NULL in C17, it was effectively UB. So, I consider C23 to be
> a minor change from C17, and one which clarifies that it is UB, because
> it already was before.
>
> POSIX.1 is not limited by this limitation of ISO C.
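
Since neither C17 nor C23 lets a portable program learn what
realloc(p,0) did, one way for application code to stay out of trouble
is to never pass a zero size at all. A minimal sketch, assuming a
hypothetical helper named realloc_nonzero (it is not an API of glibc,
musl or gnulib), which treats a zero-size request the way the
traditional implementations mentioned earlier in this thread did, as
if it were realloc(p, 1):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: never hands a zero size to realloc(), so the
   result does not depend on which zero-size behavior the libc picked.
   A NULL return then always means failure, with p (if any) still valid. */
static void *realloc_nonzero(void *p, size_t n)
{
    size_t request = (n != 0) ? n : 1;   /* treat 0 as a 1-byte request */
    assert(request != 0);                /* the whole point of the helper */
    return realloc(p, request);
}
```

The cost is at most one extra byte per "empty" object, and callers
still have to free() the returned pointer; in exchange, the
NULL-means-failure rule holds regardless of the libc in use.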
>
> > On top of that, the POSIX standard usually defers to a (fixed version)
> > of C, but does have the liberty to impose well-defined behavior even
> > where the corresponding C standard left things undefined (for example,
> > POSIX 2017 was able to demand that a POSIX system can cast function
> > pointers to and from void* in order to implement dlsym(), even though
> > C99 said that was undefined). Put another way, just because C23 has
> > changed realloc(p,0) to be undefined does NOT require a future version
> > of POSIX to do likewise when it finally moves to a newer C than C17.
> > But at the same time, POSIX is unlikely to make things strict if it
> > risks alienating existing implementations; if glibc changes behavior,
> > that would go a long way towards POSIX changing wording to be
> > stricter.
>
> Indeed; I think POSIX.1 doesn't need to make this undefined, and
> shouldn't.
>
> > > BTW, the same text is present in POSIX.1-2017. It was changed in a TC,
> > > following bug <https://www.austingroupbugs.net/view.php?id=400>.

That one mentions WG14 document N872 paragraph 19.c as the change from
C89 to C99; and it matches what I see in the final C99 document in
7.20.3.4:

19c. realloc rewrite (Meyers)

  The *realloc* function deallocates the old object pointed to by
  *ptr* and returns a pointer to a new object that has the size
  specified by *size*. The contents of the new object shall be the
  same as the old object before deallocation up to the lesser of the
  size of the old object and *size*. Any bytes in the new object
  beyond the size of the old object have indeterminate values.

  If *ptr* is a null pointer, the *realloc* function behaves like the
  *malloc* function for the specified size. Otherwise, if *ptr* does
  not match a pointer earlier returned by the *calloc*, *malloc*, or
  *realloc* function, or if the space has been deallocated by a call
  to the *free* or *realloc* function, the behavior is undefined.
  If memory for the new object cannot be allocated, the old object is
  not deallocated and its value is unchanged.

  Returns

  The *realloc* function returns a pointer to the new object, which
  may have the same value as *ptr*, or a null pointer if the new
  object could not be allocated.

C99 also has the disclaimer in 7.20.3, changed slightly in wording
from the disclaimer in C89, that: "If the size of the space requested
is zero, the behavior is implementation-defined: either a null pointer
is returned, or the behavior is as if the size were some nonzero
value, except that the returned pointer shall not be used to access an
object."

> > >
> > > The motivation, from what I can read there, seems to be that C99 already
> > > made POSIX.1 non-conforming, and this fix was intended to conform to
> > > C99.

Or worded differently in light of the above: C89 had a strong
requirement about p "is freed" after realloc(p,0), which was confusing
in itself, so C99 tried to change the wording to get rid of the
undefined phrase, and make it clear that the old object is deallocated
and the new object (even if it has the same value of p as the old
object) is allocated and has initial contents that match the old
object (whether the implementation can optimize by resizing in place
without copying is not observable from the caller's perspective, and
C99's mandate that realloc() always produces a new object on success,
even if the new pointer is the same as the old, makes lifetime
analysis elsewhere in the standard easier).

In fact, I'm almost inclined to say that it was C89's wording (and not
C99's) that was the reason that glibc flipped their default to having
realloc(p,0) return NULL (because it was C89's wording that made it
into POSIX); and it was C99's debate on newer wording that brought the
issue to light. And yet, here we are, STILL trying to get better
wording into both C and POSIX.

> > >
> > > Indeed, glibc is non-conforming to C99 too.
> > > Although, I don't like the
> > > wording from C99, either; it allows weird stuff: it allows an
> > > implementation where malloc(0) returns NULL and realloc(p,0) non-null
> > > (so, the opposite of glibc).
> > >
> > > C11 is essentially identical to C99 in that regard, so glibc is also
> > > non-conforming to C11.

Here, I'm inclined to argue the opposite: glibc IS compliant with C99
and C11, and the commit history in glibc shows that the change to have
realloc(p,0) return NULL was made at the time of C99 on the grounds of
a compliance argument, even if it might have been misguided. It was
C89, not C99, that explicitly required p to be freed; and C99 was
clarifying that the old object is deallocated before the new object
(if any) is returned, even if the pointer is the same. And again, it
stems back to the fact that C says it is implementation-defined
whether a size of 0 returns NULL or a distinct pointer, and has no
requirements on errno being set.

Presumably, as long as you are willing to set errno=0, call
realloc(p,0), and then on NULL check if errno==EINVAL (p is still
valid) or still unset (p was freed), then glibc's implementation
complies, even though it does not match historical behavior of either
the implementations where malloc(0) always fails (a zero-sized object
is not possible) or the implementations where realloc(p,0) always
returns non-NULL. POSIX then tried to add the rules to be able to
distinguish between NULL meaning success and being an EINVAL error
(since C99 didn't).

I'm out of time today to reply to anything later in your email.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
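
The errno-based check described above (set errno to 0, call
realloc(p,0), and inspect errno when NULL comes back) could be
sketched as follows. This is only an illustration under stated
assumptions: shrink_to_zero, the zero_result enum and the two model_*
stand-ins are hypothetical names, the stand-ins merely imitate two of
the behaviors discussed in this thread, and the check is only
meaningful where the implementation sets errno whenever it leaves the
old object alone (which POSIX requires but ISO C does not). Passing
the real realloc here is undefined behavior under C23.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

enum zero_result {
    ZR_NEW_OBJECT,  /* non-NULL returned; *pp updated to the new pointer */
    ZR_FREED,       /* NULL with errno untouched; the old object is gone */
    ZR_KEPT         /* NULL with errno set; the old object is still valid */
};

typedef void *(*realloc_fn)(void *, size_t);

/* Hypothetical helper: ask `re` to resize *pp to zero bytes and report
   which outcome was observed, keeping *pp consistent with it. */
static enum zero_result shrink_to_zero(realloc_fn re, void **pp)
{
    assert(re != NULL && pp != NULL);
    errno = 0;
    void *q = re(*pp, 0);
    if (q != NULL) {
        *pp = q;
        return ZR_NEW_OBJECT;
    }
    if (errno != 0)
        return ZR_KEPT;      /* e.g. EINVAL or ENOMEM: do not drop *pp */
    *pp = NULL;              /* freed: make sure nobody reuses it */
    return ZR_FREED;
}

/* Stand-in imitating "free p, return NULL, leave errno alone". */
static void *model_frees(void *p, size_t n)
{
    (void)n;
    int saved = errno;
    free(p);
    errno = saved;
    return NULL;
}

/* Stand-in imitating "zero-sized objects are unsupported". */
static void *model_rejects(void *p, size_t n)
{
    (void)p;
    (void)n;
    errno = EINVAL;
    return NULL;
}
```

Taking the reallocation function as a parameter keeps the decision
logic testable without depending on what the local libc does for a
zero size.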