Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Wed, 20 Feb 2013 22:18:27 -0800
From: Isaac Dunham <idunham@...abit.com>
To: musl@...ts.openwall.com
Subject: Re: strcasestr.c

On Wed, 20 Feb 2013 20:03:28 -0500
Rich Felker <dalias@...ifal.cx> wrote:

> 
> Yes, it seems to have been an early mistake I made getting wget to
> work. Unfortunately, I think it would be bad policy to remove it now
> since that would break existing dynamic binaries using it, but one
> could make an argument that breaking them is "right" since they were
> already broken (not behaving as intended)...
> 
> > >Since strcasestr is nonstandard and not clearly specified,
> > 
> > it's so non-standard that even nobody uses it.
> > i looked up the usage of the function in codesearch.debian.net, and
> > the only *user* (from all ~20K debian packages) of the function is
> > gnu wget.
> 
> Are you sure this search was correct? IIRC there were more...

A quick check here indicates that busybox, mutt, git, midnight commander, sylpheed, foomatic-rip, elinks, and a couple libraries use it.
Busycox uses it in grep and for checking passwords (see libbb/obscure.c).
 
> I think leaving it as-is is the worst case. It's impossible to detect
> without runtime checks that it's incorrect, so it might encourage
> configure scripts to add runtime checks for broken strcasestr that
> break cross compiling. Or it might lead to programs just assuming it's
> correct, then breaking.
> 
> > in any case it doesnt make sense to put much work and especially
> > much code into it.
> > if it's gonna be implemented "correctly" at all, it should be as
> > slim as possible, in the order of 3-5 LOC.
> 
> As much as I appreciate Todd's interest in contributing a 2way-based
> version, I tend to agree. Not only would adopting the 2way code be a
> fairly large code addition for a never-used feature, but it would also
> bind us to the choice to do ASCII-only case mapping or drop
> performance drastically in the future if we want to change that
> decision.
> 
> My leaning right now would be to write the naive strstr loop using
> strcasecmp instead of strcmp (or an inline loop) for the inner loop.
> This will cause strcasestr to have the exact same case-folding
> semantics as strcasecmp, whatever those are in the future. This is
> best for consistency. Unfortunately, it's very bad from a performance
> standpoint, but I don't know of any code using this function for
> high-performance use.

Busybox implements their own version using this approach.
 
> The other somewhat reasonable option would be removing the function,
> which would expose breakage in programs that were already using the
> broken version in musl. I'm mildly against this, but I'd be interested
> in hearing arguments either way.

Were the claimed frequency correct, I would want it gone. As it stands, I think that a small but slow version is justifiable. A large one isn't.
-- 
Isaac Dunham <idunham@...abit.com>

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.