Date: Fri, 24 Jun 2022 10:59:36 -0400
From: Rich Felker <dalias@...c.org>
To: Markus Geiger <markus.geiger@...lsen.com>
Cc: musl@...ts.openwall.com
Subject: Re: [BUG] Non-FQDN domain resolving failure on musl-1.2.x

On Fri, Jun 24, 2022 at 12:28:24PM +0200, Markus Geiger wrote:
> Hej!
> 
> First, I love MUSL (and alpine linux). Great project!
> 
> We encountered a bug in our CI pipeline using alpine images in conjunction
> with AWS DNS servers - and it seems to be related to MUSL:
> 
> $ curl -fsSL https://slack.com
> curl: (6) Could not resolve host: slack.com
> 
> Usually that should return some HTML. It seems to affect only non-FQDN
> domains. As a workaround we now use the full FQDN api.slack.com. But there is a
> bug in resolution! It seems that if an AAAA record is queried over an IPv4
> IP/DNS and no record is returned, the overall resolution of the
> domain fails.

That's not non-FQDN. Non-FQDN would be "api" as short for
api.slack.com. slack.com is just the apex of a zone, but there's
nothing special about that for resolving; it's likely just a
difference in the records for it vs api, or something fishy the
recursive nameserver you're using is doing...

> *DEBUG LOG*
> 
> We tried several alpine images and musl libs on an EC2 host with docker and
> AWS DNS exclusively:
> 
>    - alpine 3.12 with musl-1.1.24-r10 is last known to work
>    - alpine 3.13 with musl-1.2.2-r1 starts failing (something introduced in
>      musl-1.2 ?)
>    - current alpine 3.16 with current musl-1.2.3-r0 still fails
> 
> alpine 3.12 with musl-1.1.24-r10 is last known to work (see string
> “success”)
> 
> docker run -it --rm --dns=10.204.109.209 alpine:3.12 ash -c 'apk add
> curl bind-tools;set -x;curl -fsSL https://slack.com 1>/dev/null &&
> echo success;host -4 -AAAA slack.com;apk list | grep musl'
> fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
> fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
> (1/21) Installing fstrm (0.6.0-r1)
> (2/21) Installing krb5-conf (1.0-r2)
> (3/21) Installing libcom_err (1.45.6-r0)
> (4/21) Installing keyutils-libs (1.6.1-r1)
> (5/21) Installing libverto (0.3.1-r1)
> (6/21) Installing krb5-libs (1.18.5-r0)
> (7/21) Installing json-c (0.14-r1)
> (8/21) Installing libgcc (9.3.0-r2)
> (9/21) Installing libstdc++ (9.3.0-r2)
> (10/21) Installing libprotobuf (3.12.2-r0)
> (11/21) Installing libprotoc (3.12.2-r0)
> (12/21) Installing protobuf-c (1.3.3-r1)
> (13/21) Installing libuv (1.38.1-r0)
> (14/21) Installing xz-libs (5.2.5-r1)
> (15/21) Installing libxml2 (2.9.14-r0)
> (16/21) Installing bind-libs (9.16.27-r1)
> (17/21) Installing bind-tools (9.16.27-r1)
> (18/21) Installing ca-certificates (20211220-r0)
> (19/21) Installing nghttp2-libs (1.41.0-r0)
> (20/21) Installing libcurl (7.79.1-r1)
> (21/21) Installing curl (7.79.1-r1)
> Executing busybox-1.31.1-r22.trigger
> Executing ca-certificates-20211220-r0.trigger
> OK: 20 MiB in 35 packages
> + curl -fsSL https://slack.com
> + echo success
> success
> + host -4 -AAAA slack.com
             ^^^^

This does not request AAAA. It (-A repeated redundantly 4 times)
requests ANY, which is deprecated, so the output is not terribly
helpful in figuring out what's going on.
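
For reference, a minimal sketch of asking for each record type
explicitly with host's -t option, so that A and AAAA answers can be
compared one at a time. The helper name query_records is hypothetical,
and slack.com is just the name from the report above.

```shell
#!/bin/sh
# query_records is a hypothetical helper name, not part of bind-tools.
query_records() {
    name=$1
    for type in A AAAA; do
        echo "== $type $name"
        # -4 sends the query over IPv4; -t selects the record type explicitly.
        host -4 -t "$type" "$name" || echo "(host exited non-zero for $type)"
    done
}
```

Calling `query_records slack.com` inside the failing container should
show whether the A and AAAA answers differ.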

Can you provide a tcpdump of port 53 traffic when curl makes the query,
and/or a full strace of the curl execution? This would show which wrong
responses from the nameserver are causing curl to fail.
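
One possible way to collect both at once is sketched below. It assumes
tcpdump and strace are installed in the container and that it runs with
enough privilege to capture packets. The function name trace_curl and
the output file names dns.pcap and curl.strace are made up for
illustration.

```shell
#!/bin/sh
# Sketch only: trace_curl, dns.pcap and curl.strace are hypothetical names.
trace_curl() {
    url=$1

    # Capture DNS traffic on all interfaces (usually needs root).
    # -n disables reverse lookups, which would add noise to the capture.
    tcpdump -n -i any -w dns.pcap port 53 &
    cap_pid=$!
    sleep 1   # give tcpdump a moment to start capturing

    # -f follows child processes and threads, -s 256 prints longer
    # strings (useful for DNS packets), -o writes the trace to a file.
    status=0
    strace -f -s 256 -o curl.strace curl -fsSL "$url" >/dev/null || status=$?

    sleep 1   # let any late replies arrive before stopping the capture
    kill "$cap_pid" 2>/dev/null || true
    wait "$cap_pid" 2>/dev/null || true
    return "$status"
}
```

After `trace_curl https://slack.com`, the capture can be read back with
`tcpdump -n -r dns.pcap`.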
