musl - stdio [de]merits discussion [Re: possible getopt stderr output changes]

Message-ID: <20141211175156.GY4574@brightrain.aerifal.cx>
Date: Thu, 11 Dec 2014 12:51:56 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: stdio [de]merits discussion [Re: possible getopt stderr
 output changes]

On Thu, Dec 11, 2014 at 04:40:15PM +0100, Laurent Bercot wrote:
> On 11/12/2014 07:44, Rich Felker wrote:
> >Is there a reason behind this? On my build, the printf core is ~6.5k
> >and the other parts of stdio you might be likely to pull in are under
> >2k. I'm happy to take your opinion into consideration but it would be
> >nice to have some rationale.
> 
>  6.5k, or even 8.5k, is not much in the grand scale of things, but it's
> about the ratio of useful pulled in code / total pulled in code, which
> I like to be as close as possible to 1. And stdio tanks that ratio,
> see below. The modest size of the printf code is a testimony to the
> efficiency of the musl implementation, not to the sanity of the
> interface.
> 
> 
> >Personally I find stdio a lot more reasonable than getopt.
> 
>  I dislike stdio for several reasons:
> 
>  - The formatting engine is certainly convenient, but it is basically

I like it because in all but the tiniest programs, you end up needing
this kind of functionality, and whenever somebody rolls their own,
it's inevitably 10x to 100x larger and uglier than musl's printf core.

> a runtime interpreter, which has to be entirely pulled in as soon as
> there's a format string, no matter how simple the formatting is.
> (Unless compilers perform specific static analysis on format strings
> to know which part of the interpreter they have to pull, but I doubt
> this is the case; gcc magically replaces printf(x) with puts(x) when
> x is devoid of format operations, and it is ugly enough as is.)
> That means I have to pull in the formatting code for floating point
> numbers, even if I only handle integers and strings; I have to pull in
> the code for edge cases of the specification, including the infamous
> "%n$" format, even if I never need it; I have to pull in varargs even
> if I only do very regular things with a fixed number of arguments.
> Most of the time I just want to print a string, a character, or an
> integer: being able to do this shouldn't add more than 2k to my
> executable, at most.

Of all that, the only thing contributing non-trivial size is floating
point support.
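
Laurent's alternative is to print only strings and integers without the printf core. That might look something like the sketch below. The names write_all, put_str and put_uint are hypothetical and are not part of musl. The code assumes POSIX write(2) and retries on short writes and EINTR. It says nothing about how small the resulting binary would be.

```c
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Write all len bytes of buf to fd, retrying on short writes and EINTR.
   Returns 0 on success, -1 on error (errno left as set by write). */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: print a NUL-terminated string, no formatting. */
int put_str(int fd, const char *s)
{
    return write_all(fd, s, strlen(s));
}

/* Hypothetical helper: print an unsigned integer in decimal.
   Digits are produced right to left into a local buffer. */
int put_uint(int fd, unsigned long n)
{
    char buf[3 * sizeof n];          /* ample room for every digit */
    char *p = buf + sizeof buf;
    do {
        *--p = (char)('0' + n % 10);
        n /= 10;
    } while (n != 0);
    return write_all(fd, p, (size_t)(buf + sizeof buf - p));
}
```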

>  - The FILE interface is not by any measure suited to reliable I/O.

This is certainly true.

>  When printf fails, there's no way to know how many bytes have been
> written to the descriptor.

For seekable files, ftello can tell you. Generally I agree with this
reasoning, that stdio is not the right tool for working in-place on
valuable files. But it's perfectly usable for producing new output in
cases where all write errors will simply result in failing the whole
"make a file" operation.

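The "make a file" pattern Rich describes can be sketched as follows. The sketch assumes that output first goes to a temporary name in the same directory and is then moved into place with rename(), which on POSIX systems replaces the target in one step. The function make_file and its parameters are hypothetical. Any error from fopen, a write, fflush or fclose fails the whole operation, and the temporary file is removed.

```c
#include <stdio.h>

/* Hypothetical helper: produce a brand-new file at `path` from `lines`.
   Any write error fails the whole operation: the temporary file is
   removed and `path` is left untouched. Returns 0 on success, -1 on error. */
int make_file(const char *path, const char *tmp_path,
              const char *const *lines, size_t nlines)
{
    FILE *f = fopen(tmp_path, "w");
    if (!f)
        return -1;

    /* Individual fprintf results are not checked one by one: the
       stream's error indicator is sticky, so ferror() below still
       sees any earlier write failure. */
    for (size_t i = 0; i < nlines; i++)
        fprintf(f, "%zu: %s\n", i + 1, lines[i]);

    /* Flush explicitly so that a failure on the final buffer is seen
       here, then close; any failure marks the whole operation failed. */
    int failed = fflush(f) == EOF || ferror(f);
    if (fclose(f) == EOF)
        failed = 1;

    if (failed || rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```
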
> Same with fclose: if it fails, and the
> buffer was not empty, there's no way to know if everything was written.

This is solved by fflush before fclose.
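
Here is a sketch of the fflush-before-fclose idiom, using a hypothetical helper named flush_then_close. The buffer is flushed first, so a failure is reported while the stream is still open. Once the flush succeeds, a later fclose failure can no longer hide unwritten buffered data.

```c
#include <stdio.h>

/* Hypothetical helper. Returns:
    0  all buffered data was written and the stream closed cleanly,
   -1  fflush failed: buffered data may be partly unwritten; the stream
       is still open so the caller can inspect it (e.g. with ftello on
       a seekable file) before calling fclose itself,
   -2  the buffer was already empty, so only the final close failed. */
int flush_then_close(FILE *f)
{
    if (fflush(f) == EOF)
        return -1;
    return fclose(f) == EOF ? -2 : 0;
}
```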

> Having the same structure for buffered (stdout) and unbuffered (stderr)
> output is unnecessarily confusing; and don't get me started on buffered
> input, the details of which users have exactly zero control over. FILE

Stdio read operations should not block unless more data is needed to
satisfy the actual request the application is making. If they do it's
an implementation bug. Of course it's not usable with select/poll
loops because you can't see if there's data already in the buffer. GNU
software (gnulib in particular) likes to ignore this problem by poking
at internals; we gave them an alternate solution with musl a couple
years back just to avoid this. :(
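
A pipe is enough to demonstrate the select/poll problem. The hypothetical test function poll_after_fgets writes two lines, reads one with fgets(), and then polls the descriptor. It assumes the common behavior of glibc and musl, where a fully buffered input stream reads all available bytes into its buffer. The C standard does not specify how much is read ahead. On such implementations poll() reports nothing readable, even though the second line is already sitting in the FILE buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical demonstration. Returns poll()'s result between the two
   fgets() calls (0 means "not readable"), or -1 on any setup error.
   `second` receives the second line, which comes from the FILE buffer. */
int poll_after_fgets(char *second, int size)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Both lines are in the pipe before stdio reads anything. */
    if (write(fds[1], "a\nb\n", 4) != 4) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    FILE *in = fdopen(fds[0], "r");
    if (!in) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    char first[16];
    int ready = -1;
    if (fgets(first, sizeof first, in)) {
        /* The write end stays open, so an empty pipe is simply
           "not readable" rather than hung up (POLLHUP). */
        struct pollfd p = { .fd = fds[0], .events = POLLIN };
        ready = poll(&p, 1, 0);             /* timeout 0: don't wait */
        if (!fgets(second, size, in))       /* served from the buffer */
            ready = -1;
    }

    fclose(in);      /* also closes fds[0] */
    close(fds[1]);
    return ready;
}
```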

> is totally unusable for asynchronous I/O, which is 99% of what I do;

For event-driven models, yes. For threaded models, it's quite usable
and IMO it simplifies code by a larger factor than the size it adds,
in cases where it's sufficient.

> it's just good enough to write error messages to stderr, where you don't
> need accurate reporting - in which case you can even do without stdio
> because stderr is unbuffered anyway.

The big thing it provides here is a standard point of synchronization
for error messages in multithreaded programs. Otherwise there would be
no lock for different library components to agree on to prevent
interleaved error output.
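
The synchronization point Rich mentions is the per-FILE lock, which POSIX exposes as flockfile()/funlockfile(). The minimal sketch below uses a hypothetical log_error() helper that takes the stream as a parameter: stderr in real use, a temporary file in a test. The lock is recursive for the owning thread, so the ordinary stdio calls inside still work. Other threads writing to the same FILE wait until the whole message has been written.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical logging helper: emits "component: message (code N)\n"
   as one unit. flockfile() takes the same lock each stdio call on `out`
   takes internally, so other threads cannot interleave output here. */
void log_error(FILE *out, const char *component, const char *msg, int code)
{
    flockfile(out);
    fputs(component, out);
    fputs(": ", out);
    fputs(msg, out);
    fprintf(out, " (code %d)\n", code);
    funlockfile(out);
}
```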

>  stdio, like a lot of today's standards, is only there because it's
> historical, and interface designers didn't know better at the time.
> It being a widely used and established standard doesn't mean that
> it's a good standard, by far.

Yes and no. There are some things that could have been done better,
and some backwards-compatible additions that could be made to make it
a lot more useful, but I think stdio still largely succeeds in freeing
the programmer from having to spend lots of effort on IO code, for a
large class of useful programs (certainly not all, though!).

> >So this doesn't sound like much
> >of a win over just doing the current multiple-write() approach.
> 
>  Since it mostly happens in the interactive case, avoiding multiple
> writes is essentially an artistic consideration. I was just interested
> in learning why you hadn't suggested manual buffering.
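
Laurent asks about manual buffering, which could look like this: the pieces of a getopt-style diagnostic are copied into a fixed local buffer and emitted with one write(). The function name opt_error, the 256-byte limit and the "<prog><msg><opt>\n" layout are assumptions for illustration. They are not musl's actual getopt code.

```c
#include <string.h>
#include <unistd.h>

/* Append up to len bytes of s at buf[*pos], never going past cap. */
static void append(char *buf, size_t cap, size_t *pos,
                   const char *s, size_t len)
{
    if (len > cap - *pos)
        len = cap - *pos;            /* truncate rather than overflow */
    memcpy(buf + *pos, s, len);
    *pos += len;
}

/* Hypothetical helper: build "<prog><msg><opt>\n" locally and emit it
   with a single write() instead of one write() per piece. Two bytes
   are reserved for the option character and the newline. */
ssize_t opt_error(int fd, const char *prog, const char *msg, char opt)
{
    char buf[256];
    size_t pos = 0;
    append(buf, sizeof buf - 2, &pos, prog, strlen(prog));
    append(buf, sizeof buf - 2, &pos, msg, strlen(msg));
    buf[pos++] = opt;
    buf[pos++] = '\n';
    return write(fd, buf, pos);
}
```

Because the message stays under PIPE_BUF (at least 512 bytes per POSIX), a write to a pipe is also not interleaved with other writers.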

I agree that it's essentially an artistic consideration.

Rich