musl - Re: possible getopt stderr output changes

Message-ID: <5489BADF.8040604@skarnet.org>
Date: Thu, 11 Dec 2014 16:40:15 +0100
From: Laurent Bercot <ska-dietlibc@...rnet.org>
To: musl@...ts.openwall.com
Subject: Re: possible getopt stderr output changes

On 11/12/2014 07:44, Rich Felker wrote:
> Is there a reason behind this? On my build, the printf core is ~6.5k
> and the other parts of stdio you might be likely to pull in are under
> 2k. I'm happy to take your opinion into consideration but it would be
> nice to have some rationale.

  6.5k, or even 8.5k, is not much in the grand scale of things, but what
I care about is the ratio of useful pulled-in code to total pulled-in
code, which I like to be as close as possible to 1. And stdio tanks that
ratio; see below. The modest size of the printf code is a testimony to the
efficiency of the musl implementation, not to the sanity of the
interface.


> Personally I find stdio a lot more reasonable than getopt.

  I dislike stdio for several reasons:

  - The formatting engine is certainly convenient, but it is basically
a runtime interpreter, which has to be entirely pulled in as soon as
there's a format string, no matter how simple the formatting is.
(Unless compilers perform specific static analysis on format strings
to know which part of the interpreter they have to pull, but I doubt
this is the case; gcc magically replaces printf(x) with puts(x) when
x is devoid of format operations, and it is ugly enough as is.)
That means I have to pull in the formatting code for floating point
numbers, even if I only handle integers and strings; I have to pull in
the code for edge cases of the specification, including the infamous
"%n$" format, even if I never need it; I have to pull in varargs even
if I only do very regular things with a fixed number of arguments.
Most of the time I just want to print a string, a character, or an
integer: being able to do this shouldn't add more than 2k to my
executable, at most.

  - The FILE interface is not by any measure suited to reliable I/O.
When printf fails, there's no way to know how many bytes have been
written to the descriptor. Same with fclose: if it fails, and the
buffer was not empty, there's no way to know if everything was written.
Having the same structure for buffered (stdout) and unbuffered (stderr)
output is unnecessarily confusing; and don't get me started on buffered
input, the details of which users have exactly zero control over. FILE
is totally unusable for asynchronous I/O, which is 99% of what I do;
it's just good enough to write error messages to stderr, where you don't
need accurate reporting - in which case you can even do without stdio
because stderr is unbuffered anyway.
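
  To make the two points above concrete, here is a minimal sketch of
printing an integer without stdio while still knowing exactly how many
bytes reached the descriptor. The names fmt_ulong and write_all are
hypothetical helpers made up for this illustration; they are not part of
musl or of any standard library.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write the decimal form of u into buf and return
   the number of characters produced. No NUL terminator is added.
   buf must hold at least 3 * sizeof(unsigned long) bytes. */
size_t fmt_ulong(char *buf, unsigned long u)
{
    char tmp[3 * sizeof u];   /* 3 decimal digits per byte is always enough */
    size_t n = 0, i;

    do {                      /* digits come out least significant first */
        tmp[n++] = (char)('0' + u % 10);
        u /= 10;
    } while (u);
    for (i = 0; i < n; i++)   /* reverse them into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    return n;
}

/* Hypothetical helper: try to write all len bytes of s to fd.
   Returns the number of bytes actually written. If that is less than
   len, the write stopped early and errno describes the failure
   (unless write() itself returned 0). */
size_t write_all(int fd, const char *s, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t r = write(fd, s + done, len - done);
        if (r < 0) {
            if (errno == EINTR) continue;   /* interrupted, just retry */
            break;
        }
        if (r == 0) break;                  /* no progress, give up */
        done += (size_t)r;
    }
    return done;
}
```

  A caller would format into a small stack buffer, for instance
char b[3 * sizeof(unsigned long) + 1], append a newline, and pass the
result to write_all(2, b, n). Neither varargs nor the printf core is
involved, and a short return value says how much output was lost.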

  stdio, like a lot of today's standards, is only there because it's
historical, and interface designers didn't know better at the time.
It being a widely used and established standard doesn't mean that
it's a good standard, by far.


> [getopt]
> has ugly global state, including possibly hidden internal state with
> no standard way to reset it. It works well enough for most things
> (because you can pretend the global state is a sort of main-local
> state), but it's a problem if you want to handle multiple virtual
> command lines in the same process

  I agree, it's ugly; but global state is a known problem and it's
easy to fix. It's already been fixed for pwd/grp/netdb, for localtime,
and a lot of other interfaces; it's only a matter of time before some
kind of getopt_r() is standardized.
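
  As a rough idea of what such an interface could look like, here is a
sketch of a short-option parser that keeps all of its state in a
caller-supplied structure. Everything in it (struct optstate, the name
getopt_sketch_r, the field names) is an assumption made up for this
example. It is not an existing or proposed standard interface, and it
does not reproduce every detail of POSIX getopt: for one, it always
returns ':' for a missing argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-parse state; initialize as { 1, 0, 0, NULL }. */
struct optstate {
    int ind;          /* index of the next argv element to examine */
    int pos;          /* offset inside a clustered argument such as -ab */
    int opt;          /* offending option character after '?' or ':' */
    const char *arg;  /* argument of the option just returned, if any */
};

/* Hypothetical reentrant parser for short options only.
   Returns the option character, '?' for an unknown option, ':' for a
   missing argument, or -1 when there are no more options. */
int getopt_sketch_r(int argc, char *const *argv, const char *opts,
                    struct optstate *st)
{
    const char *a, *p;
    int c;

    st->arg = NULL;
    if (st->ind >= argc) return -1;
    a = argv[st->ind];
    if (!st->pos) {
        if (a[0] != '-' || !a[1]) return -1;   /* operand, or a lone "-" */
        if (a[1] == '-' && !a[2]) {            /* "--" ends the options */
            st->ind++;
            return -1;
        }
        st->pos = 1;
    }
    c = (unsigned char)a[st->pos++];
    p = (c == ':') ? NULL : strchr(opts, c);
    if (!a[st->pos]) {                         /* this argument is used up */
        st->ind++;
        st->pos = 0;
    }
    if (!p) {
        st->opt = c;
        return '?';
    }
    if (p[1] == ':') {
        if (st->pos) {                         /* attached form: -ovalue */
            st->arg = a + st->pos;
            st->ind++;
            st->pos = 0;
        } else if (st->ind < argc) {           /* separate form: -o value */
            st->arg = argv[st->ind++];
        } else {
            st->opt = c;
            return ':';
        }
    }
    return c;
}
```

  Since nothing lives in globals, two struct optstate objects can walk
two different command lines in the same process, or the same one twice,
without disturbing each other.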


> For proper reporting of errors with long options (note: currently this
> is not done right), at least one component of the message, the option
> name, has unbounded size, so there's no simple way to generate the
> whole message in a buffer.

  Ah, long options. I have no idea how feasible it is to keep getopt and
getopt_long as separated as possible, but I wouldn't mind at all if
getopt_long (but not getopt) relied on stdio. Because programs using
getopt_long are likely to already be using stdio anyway, and this is
probably GNU so no one cares about code size. :)


> So this doesn't sound like much
> of a win over just doing the current multiple-write() approach.

  Since it mostly happens in the interactive case, avoiding multiple
writes is essentially an artistic consideration. I was just interested
in learning why you hadn't suggested manual buffering.
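
  For reference, manual buffering here could look like the sketch below:
the message pieces are copied into a small fixed buffer, which is
flushed only when it fills up or when the message is complete. An
option name of unbounded length then costs at most a few extra writes
instead of requiring an unbounded buffer. The struct msgbuf type, the
msgbuf_* names and the 256-byte size are all assumptions chosen for
this illustration, not musl code.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define MSGBUF_SIZE 256   /* arbitrary size picked for this sketch */

/* Hypothetical buffer; initialize as { fd, 0, {0} }. */
struct msgbuf {
    int fd;
    size_t len;
    char data[MSGBUF_SIZE];
};

/* Write out whatever is pending. Returns 0 on success, -1 on error.
   For brevity this sketch treats any failed write, EINTR included,
   as an error. */
int msgbuf_flush(struct msgbuf *b)
{
    size_t off = 0;

    while (off < b->len) {
        ssize_t r = write(b->fd, b->data + off, b->len - off);
        if (r <= 0) return -1;
        off += (size_t)r;
    }
    b->len = 0;
    return 0;
}

/* Append a string of any length, flushing each time the buffer fills. */
int msgbuf_puts(struct msgbuf *b, const char *s)
{
    size_t n = strlen(s);

    while (n) {
        size_t room = MSGBUF_SIZE - b->len;
        size_t k = n < room ? n : room;

        memcpy(b->data + b->len, s, k);
        b->len += k;
        s += k;
        n -= k;
        if (b->len == MSGBUF_SIZE && msgbuf_flush(b) < 0) return -1;
    }
    return 0;
}
```

  An error report would then be a handful of msgbuf_puts calls (program
name, fixed text, option name, newline) followed by one msgbuf_flush,
which in the common case results in a single write to stderr.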

-- 
  Laurent