Date: Fri, 15 Apr 2022 17:39:13 -0700 From: enh <enh@...gle.com> To: libc-coord@...ts.openwall.com Subject: Re: stdio_ext.h extensions for gnulib On Tue, Apr 12, 2022 at 9:00 PM Rich Felker <dalias@...c.org> wrote: > Early on in musl's history, we added a set of further extensions in > stdio_ext.h: > > size_t __freadahead(FILE *); > const char *__freadptr(FILE *, size_t *); > void __freadptrinc(FILE *, size_t); > void __fseterr(FILE *); > > The purpose of these functions was to provide a way for gnulib to do > the things they already insisted on doing, but without having access > to the FILE representation internals (which is how they implemented > these things for every other system at the time). > > The topic recently came up again via Toybox, where the author is not > using gnulib but was looking for some of the same functionality. I'm > told Bionic is on-board with adding these, apparently i shipped __fseterr() years ago (despite the fact that if you'd asked me last week, i'd have said "we only have the stuff glibc *and* musl both have"), and this week i was more convinced by __freadahead() than the other two. i was a bit concerned about how well the other two map to all extant stdio implementations, but it turns out i have a bit of a problem implementing __freadahead() too [and an existing bug or two i didn't previously know about]. it looks like musl is quite strict with ungetc() --- there's a fixed-size always-allocated unget buffer that's just before the actual buffer and only used if you try to ungetc at the start of the file? the BSD implementation in bionic instead has an "out of line" unget buffer that will grow arbitrarily large. which is problematic for messing with the read pointer unless i've misunderstood what those two functions actually do? i didn't find any documentation. for me it's a bit problematic for __freadahead() too because it turns out that POSIX's "The pushed-back bytes shall be returned by subsequent reads on that stream in the reverse order of their pushing" isn't exactly true for bionic. i think a subsequent getc() will return these characters, but i don't think a subsequent fread() would, for example. i've not seen any bug reports around this, so i guess the default "3 bytes is enough for anyone" buffer is actually enough in practice (or people who ungetc() only read via getc()). so i think my choices are either: 1. fix the arbitrary unget buffer, and have __freadahead() count characters in that, and give up on functions that touch the read pointer. 2. say "well, looks like no-one's using unlimited unget anyway" and use the musl "8 bytes is enough for anyone ... which means we can just have a slightly larger single buffer" trick, which also lets us implement the read pointer functions. i haven't had much time to look at this, so i'm also not sure how important the code to switch to a separate buffer if you ungetc() a _different_ byte to the one that was just returned. there's a comment there about sscanf() on const data ( https://android.googlesource.com/platform/bionic/+/master/libc/upstream-openbsd/lib/libc/stdio/ungetc.c#124 ). fwiw, i did run a quick test program on current macOS and glibc 2.33 and they both appear to allow unlimited amounts of unget. > and it might make sense to > coordinate this with glibc and other libc implementations/other > systems, or even propose these as some sort of standard. In the long > term, having these would allow systems to decouple gnulib from their > stdio internals, and eventually maybe even have the freedom to change > stdio internals if it ever makes sense to do so. > normally the demands of app compat would mean i probably wouldn't be able to change how the unget buffer works (just as much of the rest of stdio is effectively frozen), but given that the musl implementation lets you effectively pretend there's no such thing, and all reads are normal reads, i might be able to get away with that. (although it seems like sscanf() would have to change to _copy_, and i'm stuck if apps are actually relying on being able to unget arbitrary amounts of data, so i think my option #1 above is still the less implausible.) > Rich > Content of type "text/html" skipped
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.