Message-ID: <20220720015457.GC7074@brightrain.aerifal.cx>
Date: Tue, 19 Jul 2022 21:54:59 -0400
From: Rich Felker <dalias@...c.org>
To: "Nieminen, Jussi" <Jussi.Nieminen@...atrace.com>
Cc: "musl@...ts.openwall.com" <musl@...ts.openwall.com>
Subject: Re: Bug in getaddrinfo causing spurious returns with wrong error values
On Tue, Nov 23, 2021 at 02:47:49PM +0000, Nieminen, Jussi wrote:
> Hi,
>
> I'm a developer from the performance monitoring company Dynatrace, and I've been
> recently investigating curious problems in our customers' environments where a
> call to musl's getaddrinfo appears to spuriously return ENOENT when called from
> a node.js application that is being monitored with the Dynatrace agent.
>
> I managed to pinpoint the problem to the code that performs the AI_ADDRCONFIG
> check. If an address family that is not enabled on the host is specified, a call
> to "connect" in that code fails, the socket fd is closed, and the value of
> "errno" is then evaluated.
>
> The problem is that the call to "close" can change the value of errno, which
> will break the switch-case that follows it. Especially if aio is used (which is
> the case when the Dynatrace agent is included in the application), the call to
> close will end up setting errno to ENOENT by default (even without a failure)
> within the "aio_cancel" function if an aio operation is active. In such a case
> getaddrinfo will then incorrectly return EAI_SYSTEM with errno set to ENOENT.
>
> (After some error code translations within libuv, node.js will then print an
> error message claiming that getaddrinfo failed with ENOENT which is rather
> confusing.)
>
> Even if aio is not used, the code might fail whenever "close" gets interrupted
> and returns with errno set to EINTR. As the return value of close is not
> checked, the errno might thus "silently" change before getting evaluated with
> the assumption that it still contains the value set when "connect" failed.
>
> Below is a simple patch that should take care of this problem. Let me know if I
> can provide any more information or if there is anything else I can help with.
>
> Thanks,
> Jussi
>
>
> -------------------------------------------------------------------------------
> diff --git a/src/network/getaddrinfo.c b/src/network/getaddrinfo.c
> index efaab306..71809856 100644
> --- a/src/network/getaddrinfo.c
> +++ b/src/network/getaddrinfo.c
> @@ -16,6 +16,7 @@ int getaddrinfo(const char *restrict host, const char *restrict serv, const stru
> char canon[256], *outcanon;
> int nservs, naddrs, nais, canon_len, i, j, k;
> int family = AF_UNSPEC, flags = 0, proto = 0, socktype = 0;
> + int saved_errno = 0;
> struct aibuf *out;
>
> if (!host && !serv) return EAI_NONAME;
> @@ -66,11 +67,14 @@ int getaddrinfo(const char *restrict host, const char *restrict serv, const stru
> pthread_setcancelstate(
> PTHREAD_CANCEL_DISABLE, &cs);
> int r = connect(s, ta[i], tl[i]);
> + /* The call to "close" might change errno, especially if aio is in use;
> + * save the value set by "connect" for the later comparison. */
> + if (r < 0) saved_errno = errno;
> pthread_setcancelstate(cs, 0);
> close(s);
> if (!r) continue;
> }
> - switch (errno) {
> + switch (saved_errno) {
> case EADDRNOTAVAIL:
> case EAFNOSUPPORT:
> case EHOSTUNREACH:
> -------------------------------------------------------------------------------
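The failure mode described in the quoted report can be reproduced without sockets or aio. In this minimal sketch, fake_connect() and fake_close() are hypothetical stand-ins (not musl functions): the first fails with EAFNOSUPPORT, the second succeeds but leaves ENOENT in errno, imitating what aio_cancel() does inside close(). Only the errno cases visible in the quoted hunk are checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a failing connect(): reports an
 * address-family error through errno and returns -1. */
static int fake_connect(void)
{
	errno = EAFNOSUPPORT;
	return -1;
}

/* Hypothetical stand-in for close() while an aio operation is
 * active: the close succeeds, but errno has been overwritten
 * with ENOENT along the way. */
static int fake_close(void)
{
	errno = ENOENT;
	return 0;
}

/* The errno cases visible in the quoted hunk: the address family
 * is treated as unconfigured rather than as a system error. */
static bool family_unavailable(int err)
{
	switch (err) {
	case EADDRNOTAVAIL:
	case EAFNOSUPPORT:
	case EHOSTUNREACH:
		return true;
	default:
		return false;
	}
}

/* Original ordering: errno is inspected after close(). */
static bool unavailable_buggy(void)
{
	int r = fake_connect();
	fake_close();
	return r < 0 && family_unavailable(errno);
}

/* Patched ordering: errno is captured right after connect(). */
static bool unavailable_saved(void)
{
	int saved_errno = 0;
	int r = fake_connect();
	if (r < 0) saved_errno = errno;
	fake_close();
	return r < 0 && family_unavailable(saved_errno);
}

/* Usage: the same EAFNOSUPPORT failure is misclassified by the
 * original ordering and classified correctly by the patched one. */
static void demo(void)
{
	assert(!unavailable_buggy()); /* saw ENOENT, not EAFNOSUPPORT */
	assert(errno == ENOENT);
	assert(unavailable_saved());
}
```

In the buggy ordering the switch falls through to its default case, which in getaddrinfo means returning EAI_SYSTEM with errno set to ENOENT, matching the symptom in the report.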
A couple minor problems with the patch:
- The errno from socket() is not used if the failure was from
socket(), since saved_errno is only assigned after a failed connect().
I'm not sure yet whether that matters, but I think it may if IPv6
was disabled in a way that makes socket() fail.
- When EAI_SYSTEM is returned, the saved error is not restored into
errno, so the caller cannot get the cause of the error if it was
clobbered by close.
I'll work on a fixed version. I think the right thing to do is just
save/restore errno itself rather than switching on saved_errno.
Rich
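The save/restore approach suggested above can be sketched as follows. This is a minimal, hypothetical illustration, not the change that was eventually committed to musl: socket(), connect() and close() are replaced by fake_* stand-ins driven by invented fault-injection globals, the probe_result enum is invented, the pthread_setcancelstate() calls are omitted, and only the three errno cases visible in the quoted hunk are listed.

```c
#include <assert.h>
#include <errno.h>

/* Result of one AI_ADDRCONFIG-style probe (hypothetical names). */
enum probe_result { PROBE_OK, PROBE_UNAVAILABLE, PROBE_SYSTEM_ERROR };

/* Hypothetical fault-injection knobs; 0 means the call succeeds. */
static int fake_socket_err, fake_connect_err, fake_close_clobber;

static int fake_socket(void)
{
	if (fake_socket_err) { errno = fake_socket_err; return -1; }
	return 3; /* any non-negative descriptor */
}

static int fake_connect(int s)
{
	(void)s;
	if (fake_connect_err) { errno = fake_connect_err; return -1; }
	return 0;
}

/* Always succeeds, but may leave a stray value in errno, as close()
 * can when aio_cancel() runs inside it. */
static int fake_close(int s)
{
	(void)s;
	if (fake_close_clobber) errno = fake_close_clobber;
	return 0;
}

/* Save/restore errno around close() so that the switch below, and
 * any caller that receives PROBE_SYSTEM_ERROR, sees the error from
 * socket() or connect() rather than whatever close() left behind. */
static enum probe_result probe_family(void)
{
	int s = fake_socket();
	if (s >= 0) {
		int r = fake_connect(s);
		int saved_errno = errno;
		fake_close(s);
		if (!r) return PROBE_OK;
		errno = saved_errno;
	}
	/* Reached with errno set by socket() or by connect(). */
	switch (errno) {
	case EADDRNOTAVAIL:
	case EAFNOSUPPORT:
	case EHOSTUNREACH:
		return PROBE_UNAVAILABLE;
	default:
		return PROBE_SYSTEM_ERROR;
	}
}

/* Usage: exercise both problems raised in the review. */
static void demo(void)
{
	fake_close_clobber = ENOENT;

	fake_connect_err = EAFNOSUPPORT;   /* connect() fails */
	assert(probe_family() == PROBE_UNAVAILABLE);

	fake_connect_err = EACCES;         /* unexpected error */
	assert(probe_family() == PROBE_SYSTEM_ERROR);
	assert(errno == EACCES);           /* cause restored for caller */

	fake_connect_err = 0;
	fake_socket_err = EAFNOSUPPORT;    /* socket() fails */
	assert(probe_family() == PROBE_UNAVAILABLE);

	fake_socket_err = 0;
	fake_close_clobber = 0;
}
```

Because errno itself is restored rather than copied into a separate variable, the socket() failure path needs no special handling: when socket() fails, close() is never called and errno still holds socket()'s error when the switch runs.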