Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date: Sat, 8 Oct 2022 09:07:10 +0200
From: Markus Wichmann <nullplan@....net>
To: musl@...ts.openwall.com
Cc: Uwe Kleine-König <uwe+openwrt@...ine-koenig.org>
Subject: Re: nslookup failures with coarse CLOCK_MONOTONIC

On Sat, Oct 08, 2022 at 02:37:30AM +0200, Uwe Kleine-König wrote:
> On 10/8/22 01:25, Rich Felker wrote:
> > On Sat, Oct 08, 2022 at 01:04:25AM +0200, Uwe Kleine-König wrote:
> > > To improve the situation I suggest something like:
> > >
> > > diff --git a/src/network/res_mkquery.c b/src/network/res_mkquery.c
> > > index 614bf7864b48..78b3095fe959 100644
> > > --- a/src/network/res_mkquery.c
> > > +++ b/src/network/res_mkquery.c
> > > @@ -11,6 +11,7 @@ int __res_mkquery(int op, const char *dname, int
> > > class, int type,
> > >          struct timespec ts;
> > >          size_t l = strnlen(dname, 255);
> > >          int n;
> > > +       static unsigned int querycnt;
> > >
> > >          if (l && dname[l-1]=='.') l--;
> > >          if (l && dname[l-1]=='.') return -1;
> > > @@ -34,6 +35,8 @@ int __res_mkquery(int op, const char *dname, int
> > > class, int type,
> > >
> > >          /* Make a reasonably unpredictable id */
> > >          clock_gettime(CLOCK_REALTIME, &ts);
> > > +       /* force a different ID if mkquery was called twice during
> > > the same tick */
> > > +       ts.tv_nsec += querycnt++;
> > >          id = ts.tv_nsec + ts.tv_nsec/65536UL & 0xffff;
> > >          q[0] = id/256;
> > >          q[1] = id;
> > >
> >
> > This isn't acceptable as-is because it introduces a data race.
>
> Huh. You mean if there is a race the pseudo random IDs are changed in a
> platform dependent way? Doesn't sound sooo bad to me, but probably I'm too
> naive here. Also res_mkquery is documented to be not thread-safe. I didn't
> check deeply, but I guess the counter should be moved to struct __res_state.
> (Assuming the manpage for res_mkquery also applies to musl.)

A data race and a race condition are two different things. A data race
is when the same memory location is accessed from different threads
unsynchronized at the same time, and at least one of those is writing,
and at least one of those is not atomic. Data races are undefined
behavior in C (and as such must be avoided).

Race conditions on the other hand are logic errors that occur when the
code is failing to take into account changes to globally visible state
made concurrently by other threads. If multiple threads increment the
same variable at the same time, you have a data race and a race
condition. If the threads are changed to use an atomic load and an
atomic store instead, then the data race is alleviated, but the race
condition remains.

Notably, if you have one thread spinning in a loop until a global flag
is set, and another thread setting that flag, you have a data race and
no race condition.

__res_mkquery() is used in __lookup_name(), which is used in
getaddrinfo(), which notably *is* defined as thread-safe, so the concern
stands. It could be remedied by simply replacing "querycnt++" with
"a_inc(&querycnt)". However, the patch is still not desirable for
reasons set out by the others.

>
> Despite your concerns, I updated the libc on my RE200 with the above patch
> now, and I cannot observe the failure any more.
>

Well, yes, it works around the bug in the original application. However,
nothing (in theory) stops an application from generating thousands of
queries at the same time and sending them simultaneously, and you are
bound to have at least a few ID collisions in there.

> Of course updating nslookup to use a better API is a nicer solution.
>

The API is fine, it just needs to be used correctly. Simple solution in
this case would be to always overwrite the ID of the second query with
one more than the ID of the original query, or its bitwise inverse or
something. Something that is always different. res_mkquery() cannot know
what other queries are outstanding at the time it creates the ID, only
the application can do that. See name_from_dns() for an example. We do
create multiple queries to the same server (potentially) and have to
avoid ID collision manually.

Ciao,
Markus

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.