Date: Tue, 29 Aug 2017 09:07:57 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: getopt() not exposing __optpos - shell needs it

On Tue, Aug 29, 2017 at 02:47:13PM +0200, Denys Vlasenko wrote:
> On Tue, Aug 29, 2017 at 2:20 PM, Rich Felker <dalias@...c.org> wrote:
> >> >> When I try to do that (use getopt() to implement "getopts"), it hits a snag.
> >> >> Unlike normal getopt() usage in C programs, where it is called in a loop
> >> >> with the same argv[] array until parsing is finished,
> >> >> when it is used from "getopts", each successive call will (usually) have
> >> >> the same argv[] CONTENTS, but not the ADDRESSES.
> >> >> (The reason is in how shell works: it re-creates command arguments just before
> >> >> running a command, since there can be variable substitution, globbing, etc).
> >> >
> >> > First, some background out of the spec to establish what is supposed
> >> > to work and what's not:
> >> >
> >> >     If the application sets OPTIND to the value 1, a new set of
> >> >     parameters can be used: either the current positional parameters
> >> >     or new arg values. Any other attempt to invoke getopts multiple
> >> >     times in a single shell execution environment with parameters
> >> >     (positional parameters or arg operands) that are not the same in
> >> >     all invocations, or with an OPTIND value modified to be a value
> >> >     other than 1, produces unspecified results.
> >> >
> >> > What this means is that, when you use getopts(1), you need to either
> >> > use the exact same arguments (as you said, *string contents*, not
> >> > likely to be the same argv[] pointers) or reset it with OPTIND=1.
> >> >
> >> > It seems to me that the easiest, fully-portable fix is just the
> >> > obvious quadratic-time solution: on each run of getopts(1), reset
> >> > getopt(3) to the start and call it ++N times.
> >>
> >> This has several problems:
> >> It prints multiple messages "invalid option -q"
> >> when there are options which are not in optstring.
> >
> > opterr=0;
> >
> > Either leave it 0 and always do your own error printing, or set it
> > nonzero just before the last call (for the current option) so that
> > only that one prints an error.
> >
> >> It mangles optarg if an option without argument follows
> >> an option with an argument.
> >
> > Maybe I'm missing what you're trying to say, but all the state is
> > clobbered; I don't see how optarg is a problem specifically. You can
> > clear or set it to a sentinel value before the relevant call if you're
> > trying to determine if the call set it. Across other calls (not the
> > one for the current option) I don't see why it matters at all what
> > happens to it.
> 
> Yes, this can be done.
> 
> It gets increasingly ugly, though.
> 
> With these amounts of massaging around libc API design breakage,

Yes, the getopt API is horribly broken. It's all global state, with a
tiny portion of that state internal/inaccessible. It doesn't follow
that the solution is adding a new extension every time an application
hits an obstacle from the brokenness. The right direction for fixing
it on the libc side would be the introduction (with consensus across
important implementations) of a getopt_r API or similar with no
global/internal state.
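
To make "no global/internal state" concrete, here is a rough sketch of
what such an interface could look like. No getopt_r exists in POSIX or
in musl; the names getopt_state and getopt_r_sketch are hypothetical,
and the parser below only covers basic POSIX-style short options. The
point is that the position inside a clustered argument (pos) lives in
a caller-owned struct, so a caller can resume with a different argv[]
that has the same contents.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical caller-owned state; start with ind = 1 and pos = 0. */
struct getopt_state {
    int ind;          /* index of the next argv element to examine */
    int pos;          /* offset inside a cluster like "-abc"; 0 = not inside one */
    int opt;          /* last option character examined */
    const char *arg;  /* argument of the last option, or NULL */
};

/* Hypothetical reentrant parser: same return values as getopt(),
 * but all state is read from and written to *st. */
static int getopt_r_sketch(int argc, char *const argv[],
                           const char *optstring, struct getopt_state *st)
{
    assert(st != NULL && optstring != NULL);
    st->arg = NULL;
    if (st->ind < 1 || st->ind >= argc || argv[st->ind] == NULL)
        return -1;

    const char *cur = argv[st->ind];
    if (st->pos == 0) {
        if (cur[0] != '-' || cur[1] == '\0')
            return -1;                 /* operand or lone "-" */
        if (cur[1] == '-' && cur[2] == '\0') {
            st->ind++;                 /* "--" ends option parsing */
            return -1;
        }
        st->pos = 1;
    }

    int c = (unsigned char)cur[st->pos++];
    int at_end = (cur[st->pos] == '\0');
    const char *spec = (c == ':') ? NULL : strchr(optstring, c);
    st->opt = c;

    if (spec == NULL || spec[1] != ':') {
        /* Unknown option, or a known option without an argument. */
        if (at_end) { st->ind++; st->pos = 0; }
        return spec ? c : '?';
    }

    /* Option takes an argument: rest of this word, or the next word. */
    if (!at_end) {
        st->arg = cur + st->pos;
        st->ind++;
    } else if (st->ind + 1 < argc) {
        st->arg = argv[st->ind + 1];
        st->ind += 2;
    } else {
        st->ind++;
        st->pos = 0;
        return optstring[0] == ':' ? ':' : '?';
    }
    st->pos = 0;
    return c;
}
```

With an interface of this shape, a getopts builtin would only need to
keep one struct getopt_state per shell execution environment and pass
it in on every invocation, whatever addresses the rebuilt argv[] has.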

> "getopts" builtin code in hush is almost as big as simply
> reimplementing getopt(): ~500 versus ~750 bytes on x86.
> If I factor out ash getopts implementation and use it in both shells,
> I can probably even decrease code size.

I really don't think adding a single store to optarg is going to have
relevant effects on the size, so we're back to just talking about the
cost of the obvious quadratic solutions I discussed above. Yes, it may
turn out that just implementing getopts(1) without getopt(3) can be
smaller -- seems very likely if you can drop getopt(3) entirely from
the link, but less so if other code in busybox uses getopt(3) anyway
or if it's dynamic-linked and thus not included in the binary anyway.
I don't know whether this makes sense for you; it's your call. One
hidden cost is that getopt does have a number of nasty corner cases
that have already been considered (or found as bugs and fixed) in
musl, but I don't know if any of them carry over to corner cases in
getopts(1); if not they're probably not relevant to you.

Rich
