Date: Fri, 14 Sep 2018 12:24:09 -0400 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Re: string-backed FILEs mess On Fri, Sep 14, 2018 at 11:52:27AM -0400, Rich Felker wrote: > On Wed, Sep 12, 2018 at 01:18:24PM -0400, Rich Felker wrote: > > On Wed, Sep 12, 2018 at 12:33:45PM -0400, Rich Felker wrote: > > > OK, I've been properly initializing the FILE rather than leaving it > > > uninitialized except for the important fields like the old code did. > > > Changing that, it's 1.44s with step 8, 1.60s with step 24. I also > > > confirmed that this version of the code is almost as fast as the > > > existing code with the memchr removed (just assuming it can read > > > ahead). > > > > Uhg, the source of the "almost" here makes me even more convinced the > > current code must go. Part of the reason it's not as fast was that I > > was still setting f.read=__string_read, which requires (this is on > > i386, 32-bit) setting up the GOT pointer. > > > > What was the old code doing? f.read was uninitialized. But the new > > code crashes in that case when hitting the end of the string. Why > > doesn't the old code crash? Because f.rend is set way past the end of > > the string and never reached. If it were reached: > > > > 1. The shgetc macro calls the __shgetc function. > > 2. The __shgetc function calls __uflow. > > 3. __uflow calls __toread. > > 4. __toread inspects uninitialized f->wpos/f->wbase fields and, > > depending on the values it sees, calls f->write, which is also > > uninitialized. > > 5. If it gets past that, next __uflow calls the uninitialized f->read. > > > > The fact that any of this works at all is a fragile shit-show, and > > completely depends on __intscan/__floatscan just using (via shgetc) a > > tiny undocumented subset of the FILE structure and a tiny undocumented > > subset of the stdio interfaces on it. > > > > Really the existing code is just a poor substitute for having an > > abstraction for windowed string iteration, using the stdio FILE > > structure in a way that also works with real FILEs. It's clever, but > > this kind of clever is not a good thing. > > > > I'm still not sure what the right way forward is, though. > > OK, a small breakthrough that makes this mess a lot simpler: > > The __intscan and __floatscan backends do not (and are not allowed to, > but this should be documented and isn't) call any stdio functions on > the fake FILE* passed to them except for the shgetc and shunget > macros, defined as: > > #define shgetc(f) (((f)->rpos < (f)->shend) ? *(f)->rpos++ : __shgetc(f)) > #define shunget(f) ((f)->shend ? (void)(f)->rpos-- : (void)0) > > If the < is merely replaced by !=, which is functionally equivalent, > then shend can be any type of sentinel pointer we want (e.g. (void*)-1 > or even just &localvar) to use the buffer as a string with no known > length, and we have a guarantee that __shgetc is never called. > > I think this -1+2-byte change is an acceptable resolution to the issue > for now. Uhg, nope, mistake: they also use shcnt/shlim, which perform arithmetic on f->shend. This fix might still be salvagable, but not without significant additional work removing the dependency on invalid arithmetic. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.