Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aeTfPpPXFJWqju1w@pie>
Date: Sun, 19 Apr 2026 13:57:18 +0000
From: Yao Zi <me@...ao.cc>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] dns: Avoid division-by-zero when zero attempts is
 specified in resolv.conf

On Sat, Apr 18, 2026 at 09:02:53PM -0400, Rich Felker wrote:
> On Sat, Apr 18, 2026 at 10:28:13PM +0000, Yao Zi wrote:
> > DNS query retry interval is calculated through timeout / attempts in
> > __res_msend_rc(), both are loaded from resolv.conf in
> > __get_resolv_conf(), while value of attempts isn't checked. This would
> > trigger an undefined behavior if attempts is set to zero in
> > configuration, causing misfunction or termination with SIGFPE.
> > 
> > Gracefully handle it by returning early, with errno set to ECONNREFUSED
> > to make res_send(), which invokes __res_msend_rc() and inherits its
> > errno, fail with the same errno as glibc in this case.
> 
> I don't think errno is meaningful after any of the res_* functions. At
> least it's not documented as being meaningful. According to the docs,
> any failure reason for this functions is shown in h_errno.

Yes, I had a brief look among Linux and FreeBSD's manpages, and none of
them mentions errno.

> I thought the glibc behavior you observed is probably just a stray
> errno (it's always valid for libc functions to clobber it unless
> documented otherwise as long as they don't set it to zero), but it
> looks like it is intentional. I'm not sure why they're doing that.
> Moreover, the comment where it seems to be set indicates it's also
> supposed to cover the "no nameservers configured" case, but it looks
> like they already covered that earlier with ESRCH.
> 
> Since we don't set errno elsewhere and since it's not documented, I
> don't think there's any good reason to do it here. We generally do not
> try to match undocumented glibc behaviors.

This makes sense, I'll wait for some time and change it in v2.

> > I have to admit this bug is found by auditing musl source code with
> > Large Language Model's help. However, LLM is only used to seek for
> > problems, and I've confirmed the issue by hand and reproduced it. This
> > fix, and the commit message are all written by myself.
> > 
> > I've viewed AI policies[1] proposed by Rich before auditing. Besides a
> > little worried about authorship problems since I did view AI-generated
> > analysis and an initial reproducer, I'd prefer to say this doesn't
> > violate it. It would be good if we could agree on a clear policy for
> > such AI usage for code auditing.
> 
> Thank you so much for your disclosure. I don't see any problem with
> what you've done here.
> 
> If you were making complex changes to code where the patch is more
> elaborate, there is a question of whether that's derivative of any
> LLM-suggested code you might have read. But this is a very simple
> change. In the future please just disclose as you've done here. If
> it's questionable we can always have someone rewrite a patch based on
> an understanding of the bug you found.

Really thanks for the confirmation.

> > Back to the technical part, this problem could be easily reproduced by
> > adding
> > 
> > 	options attempts:0
> > 
> > to resolv.conf (do not close the editor since many stuff might
> > segfault!), then basically every program recursively invoking
> > __res_msend_rc() through either gethostbyname*() or res_send() would
> > crash on x86_64 with SIGFPE.
> > 
> > It does sound weird to make zero attempt on DNS requests, but glibc
> > accepts so, and fails with errno = ECONNREFUSED, which seems an
> > unintended behavior. I chose to follow it in this patch, while
> > considering errno = EINVAL to be a better outcome.
> > 
> > It's worth noting that before the commit introduced this bug, musl
> > silently clamps attempts to be at least one, which makes sense but
> > introducing such an implicit behavior might be unexpected.
> 
> I think for now we can just return -1 without doing anything else, or
> restore the historical behavior of clamping it at 1. I think allowing
> 0 to intentionally fail all queries is probably a better behavior.

Okay. I prefer to return -1 here, too.

> This report has also raised the issue that we are not setting h_errno
> like we should for the res_* interfaces. res_query has logic to
> produce a value for h_errno, but the rest of them leave it untouched.
> I don't think it makes sense to mix fixing that with fixing the
> attempts:0 crash. We can figure out what to do about that separately.
> 
> Rich

Best regards,
Yao Zi

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.