Date: Tue, 15 Sep 2015 09:23:44 -0700
From: Andy Shinn <andys@...yshinn.as>
To: musl@...ts.openwall.com
Subject: Re: resolv.conf ordering

Hi Jameel,

Also on the same subject (since you have specifically pointed out Consul)
is a thread I started at http://www.openwall.com/lists/musl/2015/09/04/3
which may interest you. I actually maintain the official Docker Alpine
Linux image at https://github.com/gliderlabs/docker-alpine and we are
tracking information in a similar thread at
https://github.com/gliderlabs/docker-alpine/issues/8.

I'd also be interested in the conversation that took place on IRC (I'm in
the channel but must have missed it). Rich, are you able to give dates /
times that I might be able to go back in my own IRC client to check out
what was discussed?

-Andy

On Tue, Sep 15, 2015 at 5:56 AM, Jameel Al-Aziz <me@...aziz.net> wrote:

> Thanks for the response!
>
> I would love to know more about the conversation on IRC.
>
> I almost feel like there are valid arguments on both sides. In a
> distributed environment, where machines and services come and go, it's
> pretty difficult to guarantee consistent records both reliably and quickly.
>
> While I was able to semi-solve my problem by enabling recursors through
> Consul DNS, I realized that I have a chicken and egg problem. The caveat
> here is this is particular to docker and some of the decisions they've made.
>
> The basic issue is that I have some containers that need to be run with
> "--net=host" and some that do not. The "--net=host" containers
> effectively copy over the host's resolv.conf. In order to make sure
> everything can be resolved, I need to guarantee that Consul is set up as
> early as possible. However, in the case that the setup process needs DNS,
> you run into a problem. I could do some clever hackery to use the default
> host DNS and overwrite the host's /etc/resolv.conf after setting up Consul
> DNS, but that's not the greatest solution. This problem can also occur with
> bridged-networking containers if you choose to specify the "dynamic" DNS
> server as a default dns option to the docker daemon.
>
> Put in simpler terms, we need normal DNS resolution while
> bootstrapping, then as services register themselves, we need dependent
> services to be able to look up the newly registered entries. Effectively,
> the consistency is delayed at best.
>
> The other issue here is that having recursion enabled just feels wrong and
> insecure. Sure, this is all behind a VPC, but I like to err on the side of
> caution.
>
> I am probably wrong here, but it seems that the musl logic is only valid
> when all nameservers are consistent. However, with dynamic service
> registration, that consistency comes at the cost of speed.
>
> The behavior we would ideally want is as you mentioned:
> "Assuming no _conflicting_ positive responses, it would need to do
> something like forward positive responses as soon as it has at least
> one positive response from upstream, but only forward negative
> responses once it has a negative response from _all_ upstream sources."
>
> I'm almost certain we can accomplish what we want by having dnsmasq or
> some other DNS proxy/cache try Consul DNS first and then fall back upstream
> for non-authoritative domains. The proxy has to be available very early on,
> which is entirely doable in our scenario. However, it does add another
> layer of indirection, which is just another potential failure point.
>
> All that being said, I definitely understand why the decision was made,
> it would just be nice to have an option to enable the "robust" logic! :)
>
> On Mon, Sep 14, 2015 at 9:43 PM Rich Felker <dalias@...c.org> wrote:
>
>> On Tue, Sep 15, 2015 at 03:25:20AM +0000, Jameel Al-Aziz wrote:
>> > I'm sure this has been brought up before, but just thought I'd reach out
>> > and see if there's a solution.
>> >
>> > I use musl on Alpine via Docker. I encountered issues today where DNS
>> > wasn't resolving the way we expect in our images. I finally managed to
>> > trace it down to musl's resolver (
>> > http://wiki.musl-libc.org/wiki/Functional_differences_from_glibc#Name_Resolver_.2F_DNS
>> > ).
>> >
>> > We configure resolv.conf with three DNS servers: Consul DNS, AWS VPC DNS,
>> > Google DNS. It turns out that the AWS VPC DNS is the fastest to respond and
>> > therefore causes results to fail even though they can be served via Consul
>> > DNS. Putting aside that the musl resolver logic breaks convention (which
>> > many people rely on), it seems that in this case it is more unpredictable
>> > than simply following the order.
>> >
>> > The host DNS is Consul, and while we could just set up Consul with
>> > recursors, we run the risk of failing to resolve anything if Consul fails.
>> > Setting up a local caching DNS is also overkill (we're in Docker
>> > containers).
>> >
>> > Is there no way to force musl to follow the order of nameservers in
>> > resolv.conf? Or even if not, to allow musl to accept the first successful
>> > response instead of failing on the first response? It seems to me that we
>> > have to give up reliability for predictability, which is not what this
>> > feature was intended to do from my understanding.
>> >
>> > Any help on this matter would be greatly appreciated!
>>
>> Someone else raised this question on our IRC channel a week or two
>> ago, and in short, the answer is no. Basically this setup does not
>> make sense, even if you do have a resolver (glibc) that does do
>> ordered fallback:
>>
>> - If you expect to sometimes need the second or third nameserver for
>>   queries the first cannot answer, then you're going to have terrible
>>   performance (multi-second delay before falling back to the second
>>   one).
>>
>> - Unless all the nameservers agree on the records they're serving (in
>>   which case you wouldn't care about order), your query results will
>>   be unstable/inconsistent when the first server fails to respond. The
>>   typical result is that you will wrongly get NxDomain instead of a
>>   failed/timed-out query.
>>
>> The second issue is really the motivation for what musl is doing: musl
>> is assuming that all the nameservers have consistent records, because
>> if they didn't, actual positive/negative results would be affected by
>> transient failures rather than transient failures being reported to
>> the calling program. This is a serious class of robustness (and
>> possibly security, since DoS can translate into false results)
>> failure.
>>
>> If you really need to union inconsistent records from multiple
>> nameservers, the right way to do this is with a dns proxy/cache.
>> Assuming no _conflicting_ positive responses, it would need to do
>> something like forward positive responses as soon as it has at least
>> one positive response from upstream, but only forward negative
>> responses once it has a negative response from _all_ upstream sources.
>> Of course these are the constraints to do it "right"/robustly. If all
>> you want is something that works at least as well as glibc is working
>> for you now, dnsmasq is probably sufficient.
>>
>> The conversation about all this on IRC was actually quite interesting.
>> We have a no-public-logging policy so there are no logs posted
>> anywhere, but if you're interested in more of what was discussed I
>> could try to summarize it or see if the people involved would be ok
>> with sharing a log excerpt.
>>
>> Rich
>>
>
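
The proxy policy Rich describes above (forward a positive response as soon
as one upstream gives it, but forward a negative response only once every
upstream has answered negatively) could be sketched roughly as follows.
This is only an illustration under stated assumptions: the names Kind,
Reply, and merge_replies are hypothetical, each DNS reply is reduced to a
(kind, data) pair, and no real network queries are made.

```python
from enum import Enum
from typing import Iterable, Optional, Tuple


class Kind(Enum):
    """Simplified outcome of one upstream query (hypothetical model)."""
    POSITIVE = "positive"  # upstream returned records
    NEGATIVE = "negative"  # upstream said the name does not exist
    FAILED = "failed"      # timeout or server failure


# (kind, data): data holds the answer for POSITIVE replies, else None.
Reply = Tuple[Kind, Optional[str]]


def merge_replies(replies: Iterable[Reply], upstream_count: int) -> Reply:
    """Combine upstream replies, consumed in arrival order.

    Assumes no conflicting positive responses, as in the quoted text.
    """
    negatives = 0
    for kind, data in replies:
        if kind is Kind.POSITIVE:
            # One positive answer is enough; forward it immediately.
            return (Kind.POSITIVE, data)
        if kind is Kind.NEGATIVE:
            negatives += 1
    if negatives == upstream_count:
        # Every upstream agreed the name does not exist.
        return (Kind.NEGATIVE, None)
    # At least one upstream never gave a usable answer, so report a
    # failure rather than a possibly false negative.
    return (Kind.FAILED, None)
```

With two upstreams, a fast negative followed by a positive yields the
positive answer, while a negative plus a timeout is reported as a failure
instead of a negative result. That is the property that keeps transient
failures from turning into false results.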
