musl - stdio fixes & internals documentation

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150529013937.GA32093@brightrain.aerifal.cx>
Date: Thu, 28 May 2015 21:39:37 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: stdio fixes & internals documentation

I'm writing this up because I need to fix a bug in ungetc -- it's
failing when the target FILE is at eof rather than clearing the eof
status and succeeding, despite having logic in the source that was
intended to make this case work right. So I'm going to go ahead and
make some tentative documentation of stdio internals for my own use
and for others who want to review these changes. This stuff can
eventually be cleaned up to go into the musl manual... :-)

So far, just for stuff used in reading:

f->read() assumes reading is currently a valid operation (so __toread
was called successfully without a subsequent __towrite) and the read
buffer is empty (but the values of the pointers don't matter). Right
now it's responsible for setting EOF and error flags for the file.
This is probably a bad idea because the logic, which is stdio-global
and not specific to the FILE type, is repeated in different underlying
FILE type implementations' read functions.

__toread() assumes the FILE is in a valid state to begin reading, but
does not misbehave horribly even if it was in the middle of writing.
If the FILE's fopen mode does not permit reading, it sets error status
and returns EOF. If the file is at EOF, it also returns EOF. This
means the caller (such as ungetc) cannot distinguish EOF from "error
going into reading mode" without temporarily clearing and restoring
the error flag (which would be ugly and inefficient). On failure
(either eof status or error switching to read mode) the buffer
pointers are left unchanged; on success, they're all set to the
beginning of the buffer.

The first thing I want to do is fix the known bug in ungetc, and I
think the easiest way to do that is to make __toread set valid read
buffer pointers when it fails due to eof status. Then, instead of
ungetc checking the return value of __toread, it can instead call
__toread and then just check rpos. That is, instead of:

	if ((!f->rend && __toread(f)) || f->rpos <= f->buf - UNGET) {
		// error

it can instead do:

	if (!f->rend) __toread(f);
	if (f->rpos <= f->buf - UNGET) {
		// error

or perhaps:

	if (!f->rpos) __toread(f);
	if (!f->rpos || f->rpos <= f->buf - UNGET) {
		// error

I like the second version better because it does not assume that a
null pointer compares <= any valid pointer, which could be wrong if
pointer <= is implemented as a signed comparison. I changed the
conditional from !f->rend to !f->rpos to simultaneously optimize two
code paths:

1. If f->rpos is initially non-null, the compiler can skip re-checking
   that it's non-null in the second if.

2. If f->rpos is initially null, then after __toread it needs to be
   re-read, but only one field (rpos) needs to be loaded from memory
   instead of two (rend and rpos).

Note that if the file is not open in a mode that permits writing, then
f->rpos is initially a null pointer and __toread will never set it to
something non-null, nor will f->read (since it's not permitted to call
f->read without a successful __toread).

Once these fixes are taken care of I'd like to look at the EOF logic
in f->read() and moving it out to the callers (only __uflow and fread)
where we won't have to worry about bugs (which I think exist) in the
FILE-type-specific read functions.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.