Date: Sat, 31 Mar 2018 10:01:57 -0400
From: Rich Felker <dalias@...c.org>
To: Florian Weimer <fw@...eb.enyo.de>
Cc: William Pitcock <nenolod@...eferenced.org>, musl@...ts.openwall.com
Subject: Re: [PATCH] resolver: only exit the search path loop there
 are a positive number of results given

On Sat, Mar 31, 2018 at 12:42:17PM +0200, Florian Weimer wrote:
> * William Pitcock:
> 
> > A local proxy isn't going to be workable, because most people are
> > going to just say "but Debian or Fedora doesn't require this," and
> > then just go use a glibc distribution.
> 
> Some parts of the glibc behavior are clearly wrong and not even
> internally consistent.  Rich is right that for correctness, you can
> only proceed on the search path if you have received a successful
> reply.  However, making changes in this area is difficult, both due to
> the current state of the glibc code, and existing deployments
> depending on corner cases which are not well-understood.

The behavior of path search on failures is a separate issue from the
behavior on "NODATA" so we can probably stick to the latter for now.

> I'm not entirely convinced that using different search path domains
> for different address families is necessarily wrong.

It breaks the completely reasonable application expectation that the
results produced by AF_INET and AF_INET6 queries are subsets of the
results produced by AF_UNSPEC. The proper application idiom is to use
AF_UNSPEC (or no hints) and respect the order the results are returned
in, in order to honor RFC 3484/gai.conf or any other means by which
getaddrinfo determines which order results should be tried in. It's
(IMO at least) utterly wrong to try to merge results from different
search domains, but I can see applications trying both queries
separately when they encounter the inconsistency...
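
To make the idiom concrete, here is a minimal sketch in C, assuming a
TCP client. The helper names resolve_unspec and connect_in_order are
made up for illustration; only getaddrinfo, socket, connect, close and
freeaddrinfo are real interfaces.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: issue a single AF_UNSPEC query so that IPv4 and
 * IPv6 results come from the same lookup, already sorted by getaddrinfo.
 * Returns 0 on success or an EAI_* error code. */
int resolve_unspec(const char *host, const char *service,
                   struct addrinfo **res)
{
    struct addrinfo hints;

    assert(host && service && res);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* do not split into AF_INET/AF_INET6 */
    hints.ai_socktype = SOCK_STREAM;  /* assumption: a TCP client */
    return getaddrinfo(host, service, &hints, res);
}

/* Hypothetical helper: try each address strictly in the order returned,
 * without re-sorting or merging. Returns a connected fd, or -1 if every
 * address failed. */
int connect_in_order(const struct addrinfo *res)
{
    for (const struct addrinfo *ai = res; ai; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
            return fd;
        close(fd);
    }
    return -1;
}
```

A caller would run resolve_unspec once, pass the list to
connect_in_order, and then call freeaddrinfo on it. There is no second
per-family query, so the subset expectation never comes into play.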

> Historically,
> the NODATA/NXDOMAIN signaling has been really inconsistent anyway, and
> I suspect it still is for some users.

Do you have a reference for this? AFAIK it was very consistent in all
historical implementations. It's also documented (in RFC-????...I
forget where but I looked it up during this).

> What Cloudflare is doing appears to be some kind of protection against
> NSEC-based zone enumeration, and that requires synthesizing NODATA
> response.  They are unlikely to change that, and they won't be the
> only ones doing this.

Thanks for the explanation.

> > Kubernetes imposes a default search path with the cluster domain last, so:
> > 
> >   - local.prod.svc.whatever
> >   - prod.svc.whatever
> >   - svc.whatever
> >   - yourdomain.com
> 
> Do you have a source for that?
> 
> Considering that glibc had for a long time a hard limit at six
> entries, I find that approach rather surprising.  This leaves just
> three domains in the end user's context.  That's not going to be
> sufficient for many users.  Anyway …

k8s isn't software you install as a package on your user system. It's
cloud/container stuff, where it wouldn't make sense to add more search
domains beyond the ones for your application.

> > The cloudflare issue is that they send SUCCESS code with 0 replies,
> > which causes musl to error when it hits the yourdomain.com.
> 
> Is the long search path really the problem here?  Isn't it ndots:5?
> It means that queries destined to the general DNS tree hit the
> organizational tree first, where the search stops due to the NODATA
> response.  So you never get the expected response from the public
> tree.
> 
> Is this what's happening?

Yes. ndots>1 is utterly awful -- it greatly increases latency of every
lookup, and has failure modes like what we're seeing now -- but the
k8s folks designed stuff around it. Based on conversations when musl
added search domains, I think there are people on the k8s side that
realize this was a bad design choice and want to fix it, but that
probably won't be easy to roll out to everyone and I have no idea if
it's really going to happen.

FWIW, if ndots<=1 and there is only one search domain, the
NODATA/NXDOMAIN issue does not make any difference to the results
(assuming no TLDs with top-level A/AAAA records :). But if ndots>1 or
there are at least 2 search domains, the result does change. In the
former case, global lookups get broken; in the latter, subsequent
search domains get missed.
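
The two cases can be sketched with a toy model in C. This is not musl's
or glibc's resolver code: search_lookup, toy_dns and the
nodata_continues flag are invented for illustration, the candidate
order (search domains first, then the bare name, only when the name has
fewer than ndots dots) is a simplification, and internal.example is a
made-up domain. toy_dns stands in for a server that answers NODATA for
every name under yourdomain.com.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum rcode { ANSWER, NODATA, NXDOMAIN };

typedef enum rcode (*lookup_fn)(const char *fqdn);

static int count_dots(const char *s)
{
    int n = 0;
    for (; *s; s++)
        if (*s == '.')
            n++;
    return n;
}

static int ends_with(const char *s, const char *suffix)
{
    size_t ls = strlen(s), lx = strlen(suffix);
    return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Toy DNS: two names exist; everything under yourdomain.com yields a
 * synthesized NODATA; all other names are NXDOMAIN. */
enum rcode toy_dns(const char *fqdn)
{
    if (strcmp(fqdn, "example.org") == 0 ||
        strcmp(fqdn, "db.internal.example") == 0)
        return ANSWER;
    if (ends_with(fqdn, ".yourdomain.com"))
        return NODATA;
    return NXDOMAIN;
}

/* Walk the search path. With nodata_continues == 0, NODATA ends the
 * search (the strict behavior); with 1, it is treated like NXDOMAIN.
 * The last candidate tried is left in out. */
enum rcode search_lookup(const char *name, int ndots,
                         const char *const *domains, size_t ndomains,
                         int nodata_continues, lookup_fn lookup,
                         char *out, size_t outlen)
{
    assert(name && lookup && out && outlen > 0);
    if (count_dots(name) < ndots) {
        for (size_t i = 0; i < ndomains; i++) {
            snprintf(out, outlen, "%s.%s", name, domains[i]);
            enum rcode rc = lookup(out);
            if (rc == ANSWER || (rc == NODATA && !nodata_continues))
                return rc;
        }
    }
    snprintf(out, outlen, "%s", name);  /* finally, the name as given */
    return lookup(out);
}
```

With the four-entry k8s path quoted above and ndots=5, looking up
example.org in strict mode stops at example.org.yourdomain.com with
NODATA and never reaches the global name; with ndots=1 the name is
tried as given and resolves. With ndots=1 and the search path
{yourdomain.com, internal.example}, looking up db stops at
db.yourdomain.com and misses db.internal.example.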

Rich
