Date: Fri, 24 Jun 2022 21:56:56 -0400
From: Rich Felker <dalias@...c.org>
To: Markus Geiger <markus.geiger@...lsen.com>
Cc: musl@...ts.openwall.com
Subject: Re: [BUG] Non-FQDN domain resolving failure on musl-1.2.x

On Fri, Jun 24, 2022 at 07:14:10PM +0200, Markus Geiger wrote:
> Sorry: not Amazon DNS – 10.204.109.209 is a BIND server in our network
> that we've set up to work with our global VPN/DNS.
> 
> BUT the strange thing is that the domain lookup works with musl-1.1.24,
> while some musl-1.2.x versions just quit with an error.
> 
> A comparison of the docker runs with `sudo tcpdump -v -i docker0 udp port
> 53 or tcp port 53` did not bring up any differences, except that the list
> of A records returned is in a different order (which I think is completely
> normal). The order of requests is the same.
> 
> tcpdump from working version:
> >   bind-us-east-1a.XXXXXXXXXXXXXX.domain > 172.17.0.3.45501: 18685 9/13/8 slack.com. A 3.95.117.96, slack.com. A 34.231.24.224, slack.com. A 54.163.235.119, slack.com. A 54.147.59.169, slack.com. A 34.193.255.5, slack.com. A 34.204.109.226, slack.com. A 34.225.62.185, slack.com. A 34.203.97.10, slack.com. A 54.92.199.186 (510)
> 
> tcpdump from non-working version:
> >   bind-us-east-1a.XXXXXXXXXXXXXX.domain > 172.17.0.3.59951: 49211 9/13/8 slack.com. A 34.225.62.185, slack.com. A 54.163.235.119, slack.com. A 34.231.24.224, slack.com. A 54.147.59.169, slack.com. A 34.193.255.5, slack.com. A 34.204.109.226, slack.com. A 54.92.199.186, slack.com. A 3.95.117.96, slack.com. A 34.203.97.10 (510)
> 
> Complete log:
> 
>     172.17.0.3.59951 > bind-us-east-1a.XXXXXXXXXXXXXXXXXXXXXXXXXx.domain: 49211+ A? slack.com. (27)
> 18:56:19.990087 IP (tos 0x0, ttl 64, id 10210, offset 0, flags [DF], proto UDP (17), length 55)
>     172.17.0.3.59951 > bind-us-east-1a.XXXXXXXXXXXXXXXXXXXXXXXXXx.domain: 49334+ AAAA? slack.com. (27)
> 18:56:20.154990 IP (tos 0x0, ttl 250, id 17825, offset 0, flags [none], proto UDP (17), length 538)
>     bind-us-east-1a.XXXXXXXXXXXXXXXXXXXXXXXXXx.domain > 172.17.0.3.59951: 49211 9/13/8 slack.com. A 34.225.62.185, slack.com. A 54.163.235.119, slack.com. A 34.231.24.224, slack.com. A 54.147.59.169, slack.com. A 34.193.255.5, slack.com. A 34.204.109.226, slack.com. A 54.92.199.186, slack.com. A 3.95.117.96, slack.com. A 34.203.97.10 (510)
> 18:56:20.241377 IP (tos 0x0, ttl 250, id 17846, offset 0, flags [none], proto UDP (17), length 55)
>     bind-us-east-1a.XXXXXXXXXXXXXXXXXXXXXXXXXx.domain > 172.17.0.3.59951: 49334 ServFail 0/0/0 (27)
> 18:56:20.241501 IP (tos 0x0, ttl 64, id 10233, offset 0, flags [DF], proto UDP (17), length 55)

Here's your problem -- the server is returning ServFail rather than an
answer for some of the queries: in the log above, the AAAA query (id
49334) gets "ServFail 0/0/0" while the A query (id 49211) gets a normal
answer. This makes musl's resolver keep retrying for an answer. In an
older version, there may have been a bug whereby, after the retries timed
out, the fact that one query had failed was sometimes overlooked. This
logic was improved between the versions you tested, as part of ensuring
DNSSEC integrity. In any case, you just need to find the cause of the
ServFail (maybe a hack someone put in place to try to suppress use of
IPv6?) and fix it.

Rich
