Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 18 Jul 2022 22:57:24 -0400
From: Rich Felker <dalias@...c.org>
To: "Nieminen, Jussi" <Jussi.Nieminen@...atrace.com>
Cc: "musl@...ts.openwall.com" <musl@...ts.openwall.com>
Subject: Re: Bug in getaddrinfo causing spurious returns with wrong
 error values

On Tue, Nov 23, 2021 at 10:05:37AM -0500, Rich Felker wrote:
> On Tue, Nov 23, 2021 at 02:47:49PM +0000, Nieminen, Jussi wrote:
> > Hi,
> > 
> > I'm a developer from the performance monitoring company Dynatrace, and I've been
> > recently investigating curious problems at our customers' environments where a
> > call to musl's getaddrinfo appears to spuriously return ENOENT when called from
> > a node.js application that is being monitored with the Dynatrace agent.
> > 
> > I managed to pinpoint the problem to the code that performs the AI_ADDRCONFIG
> > check. If an address family that is not enabled on the host is specified, a call
> > to "connect" in that code fails, the socket fd is closed, and the value of
> > "errno" is then evaluated.
> > 
> > The problem is that the call to "close" can change the value of errno, which
> > will break the switch-case that follows it. Especially if aio is used (which is
> > the case when the Dynatrace agent is included in the application), the call to
> > close will end up setting errno to ENOENT by default (even without a failure)
> > within the "aio_cancel" function if an aio operation is active. In such a case
> > getaddrinfo will then incorrectly return EAI_SYSTEM with errno set to ENOENT.
> > 
> > (After some error code translations within libuv, node.js will then print an
> > error message claiming that getaddrinfo failed with ENOENT which is rather
> > confusing.)
> 
> Indeed, this all makes sense.
> 
> > Even if aio is not used, the code might fail whenever "close" gets interrupted
> > and returns with errno set to EINTR. As the return value of close is not
> > checked, the errno might thus "silently" change before getting evaluated with
> > the assumption that it still contains the value set when "connect" failed.
> 
> close can't return with EINTR but can return with EINPROGRESS which
> would give the same result here, I think.

I was mistaken about this. EINPROGRESS was removed back in commit
6bea5dc69892cd9ff0c222474e7dd468c29dfa75 (2015) and is no longer
possible. However, as the original report stated, ENOENT comes from
__aio_close where errno is clobbered even on success.

On looking at this again, I'm thinking we might should rework the
stuff causing that to happen, but the patch submitted still looks
reasonable since close does not have any contract not to clobber
errno.

> > Below is a simple patch that should take care of this problem. Let me know if I
> > can provide any more information or if there is anything else I can help with.
> 
> I think this patch is probably okay. I wondered if close was in the
> set of functions POSIX-future intends to require not to touch errno on
> success, but it doesn't seem to be, and moreover the EINPROGRESS
> semantics would undermine that anyway. So saving errno before calling
> close is probably the right thing to do here.
> 
> > Thanks,
> > Jussi
> 
> Thanks for the clear analysis and patch!

If nobody objects/sees anything wrong, I'll go ahead and take this
now. Sorry for the long delay!

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.