Date: Fri, 29 May 2015 09:53:00 -0400 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Making stdio writes robust/recoverable under errors Currently, musl's stdio write operations take the liberty of leaving the current position and actual contents written rather unpredictable if a write error occurs. Originally the motivation was probably a mix of uncertainty about what to do and lack of code (or a desire to avoid complexity) for tracking what part of the buffer was unwritten if a write error occurred before the buffer was written out. But commit 58165923890865a6ac042fafce13f440ee986fd9, as part of adding cancellation support, added the code to track what part of the buffer remains unwritten, and therefore, from what I can see, avoiding data loss when stdio writes fail transiently (like EINTR, for instance) is simply a matter of removing line 31 in the error path of __stdio_write: f->wpos = f->wbase = f->wend = 0; so that the buffer is not thrown away. An attached test program demonstrates the issue. It assumes Linux 64k pipe buffers so it's not appropriate for inclusion in tests in its current form, but it gets the job done where it works. It attempts to send 128k over a pipe via write followed by a series of fwrite retries that are interrupted by signals that cause EINTR; thanks to the kernel's pipe buffer semantics, there's basically no timing/race aspect involved. With current musl, I get only 130560 across the pipe; with the above line removed, I get all 131072. I think this is a good change to make from the standpoint of robustness. Even if most errors are not recoverable, some may be; in fact ENOSPC could be handled meaningfully if the calling program knows it has temp files it could delete. What I am concerned about, though, are residual assumptions that, after f->write, the write buffer will be clear. With this change, fflush and similar operations would no longer discard the buffer on error, meaning we would need to check that callers don't need that property. As another example, fseek fails when the buffer can't be written out, but since the current buffer write throws away the buffer on failure, a second fseek call will succeed; this change would leave it failing. (As an aside, rewind already fails in this case and throws away the error; this is almost certainly wrong. It should probably attempt to flush and throw away the buffer on failure before seeking so that the seek always succeeds.) I don't want to make these changes right now at the end of a release cycle but I might promote them to important targets for the next cycle. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.