Date: Sat, 18 Apr 2020 20:03:47 -0400
From: Rich Felker <>
To: Florian Weimer <>
Subject: Re: TCP support in the stub resolver (was: Re: Outgoing DANE
 not working)

On Sat, Apr 18, 2020 at 07:14:24PM +0200, Florian Weimer wrote:
> * Rich Felker:
> > On Fri, Apr 17, 2020 at 11:22:34AM +0200, Florian Weimer wrote:
> >> >> > However it's not clear how "fallback to tcp" logic should interact
> >> >> > with such concurrent requests -- switch to tcp for everything and
> >> >> > just one nameserver as soon as we get any TC response?
> >> >> 
> >> >> It's TCP for this query only, not all subsequent queries.  It makes
> >> >> sense to query the name server that provided the TC response: It
> >> >> reduces latency because that server is more likely to have the large
> >> >> response in its cache.
> >> >
> >> > I'm not talking about future queries but other unfinished queries that
> >> > are part of the same operation (presently just concurrent A and AAAA
> >> > lookups).
> >> 
> >> If the second response has TC set (but not the first), you can keep
> >> the first response.  Re-querying both over TCP increases the
> >> likelihood that you get a response from the same cluster node (so more
> >> consistency), but you won't get that over UDP, ever, so I don't think
> >> it matters.
> >> 
> >> If the first response has TC set, you have an open TCP connection you
> >> could use for the second query as well.  Pipelining of DNS requests
> >> has compatibility issues because there is no application-layer
> >> connection teardown (an equivalent to HTTP's Connection: close).  If
> >> the server closes the connection after sending the response to the
> >> first query, without reading the second, this is a TCP data loss
> >> event, which results in an RST segment and potentially, loss of the
> >> response to the first query.  Ideally, a client would wait for the
> >> second UDP response and the TCP response to arrive.  If the second UDP
> >> response is TC as well, the TCP query should be delayed until the
> >> first TCP response came back.
> > Indeed it sounds like one TCP connection would be needed per request,
> > so switchover would just be per-request if done.
> No, you can reuse the connection for the second query (in most cases).
> However, for maximum robustness, you should not send the second query
> until the first response has arrived (no pipelining).  You may still
> need a new connection for the second query if the TCP stream ends
> without a response, though.

That's why you need one per request -- so you can make them
concurrently (can't assume pipelining).
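
For reference, DNS over TCP prefixes each message with a two-byte,
big-endian length field (RFC 1035, section 4.2.2). With one connection
per request, each stream carries exactly one framed query and one framed
response, so nothing depends on the server handling pipelining. A
minimal sketch of the framing step follows; the helper name
dns_tcp_frame is hypothetical and is not an existing musl function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of musl): copy a DNS message into `out`,
 * preceded by the two-byte big-endian length used for DNS over TCP.
 * Returns the total number of bytes written, or 0 if the message is too
 * large for a 16-bit length or the output buffer is too small. */
size_t dns_tcp_frame(const unsigned char *msg, size_t msglen,
                     unsigned char *out, size_t outsize)
{
    if (msglen > 0xffff || outsize < msglen + 2)
        return 0;
    out[0] = (unsigned char)(msglen >> 8);   /* high byte of length */
    out[1] = (unsigned char)(msglen & 0xff); /* low byte of length */
    memcpy(out + 2, msg, msglen);
    return msglen + 2;
}
```

The caller would write the framed buffer to a freshly connected TCP
socket and then read two length bytes followed by that many bytes of
response.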

> > My leaning is probably not to do fallback at all (complex logic,
> > potential for unexpected slowness, not needed by vast majority of
> > users) and just add TCP support with option use-vc for users who
> > really want complete replies. All of this would be contingent anyway
> > on making internal mechanisms able to handle variable result size
> > rather than fixed-size 512 bytes so it's not happening right away.
> > Doing it carelessly would create possibly dangerous bugs.
> I still think it's wrong.  The protocol says that you must perform TCP
> fallback.  If you don't, it's rather confusing for the libresolv
> interfaces.

There's a clause I'd have to look up again, but it explicitly says
(roughly; I'm paraphrasing from memory) that you have the option not
to in settings where it wouldn't be appropriate to do so or where
you're happy with the truncated responses. The reason my leaning is to
make it require explicit configuration to use TCP is that the vast
majority of musl users seem happy with what it's doing now, which
*was* intentional; it'd be nice not to change that without explicit
user intent to do so. Also, making TCP available only in TCP-only
(use-vc) mode would perform badly with remote nameservers, which would
strongly encourage users who want large responses (which are almost
certainly things that do need DNSSEC validation) to set up a proper
local validating nameserver.

Of course all of this has prerequisite core changes that'd need to be
made before it could be done, so nothing's going to happen either way
in the short term.

> > I'm still also somewhat of the opinion that users who want a resolver
> > library (res_* API) with lots of features should just link BIND's, but
> > it would be nice not to have to do that.
> You could drop the res_* interfaces from musl.  They are mostly needed
> for non-address queries, and those are the ones that tend to be larger
> than 512 bytes.

They're sufficient for pretty much everything that actually matters,
and very convenient to have. Removing them seems like it has no
advantages. If someone *really* wants more functionality they can link
BIND's libresolv, or we can evaluate adding the functionality they're

> Then it might be possible that no one will notice the missing TCP
> fallback.

Really almost no one has noticed it so far, and the places where it
was noticed were buggy (IIRC Google or Cloudflare) nameservers that
were sending an empty response on truncation rather than a properly
truncated response, which seems to have since been fixed. (And in this
case the fallback would have been a major performance hit, so it was
nice that it was caught and fixed instead).
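
To illustrate the distinction between the two kinds of truncated
response, here is a rough sketch that inspects the fixed 12-byte DNS
header: the TC flag is bit 0x02 of the third header byte, and ANCOUNT
is the 16-bit big-endian field at bytes 6 and 7. The function names
are hypothetical and are not taken from musl; this only shows how a
client could tell an empty truncated reply from one that still carries
some answer records.

```c
#include <stddef.h>

/* Hypothetical helpers (not part of musl). `r` points to a raw DNS
 * response of `len` bytes. */

/* Nonzero if the response has a full header and the TC bit set. */
int dns_is_truncated(const unsigned char *r, size_t len)
{
    return len >= 12 && (r[2] & 0x02) != 0;
}

/* Number of answer records claimed by the header, or -1 if too short. */
int dns_answer_count(const unsigned char *r, size_t len)
{
    if (len < 12)
        return -1;
    return (r[6] << 8) | r[7];
}

/* Nonzero for the problematic case: TC set and no answers at all. */
int dns_truncated_empty(const unsigned char *r, size_t len)
{
    return dns_is_truncated(r, len) && dns_answer_count(r, len) == 0;
}
```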

