Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sat, 8 Oct 2022 02:37:30 +0200
From: Uwe Kleine-König <uwe+openwrt@...ine-koenig.org>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: nslookup failures with coarse CLOCK_MONOTONIC

Hello Rich,

On 10/8/22 01:25, Rich Felker wrote:
> On Sat, Oct 08, 2022 at 01:04:25AM +0200, Uwe Kleine-König wrote:
>>
>> Musl does the following to create the 16 bit ID:
>>
>>           /* Make a reasonably unpredictable id */
>>           clock_gettime(CLOCK_REALTIME, &ts);
>>           id = ts.tv_nsec + ts.tv_nsec/65536UL & 0xffff;
>>           q[0] = id/256;
>>           q[1] = id;
>>
>> (from musl's src/network/res_mkquery.c) My hypothesis now is that
>> the monotonic clock has a resolution of 20 µs only. So if the two
>> res_mkquery() calls are called within the same 20 µs tick, the IDs
>> end up being identical. If they happen in two consecutive ticks, the
>> IDs have a delta of 20000 or 20001 which matches the four cases
>> observed above.
>>
>> To improve the situation I suggest something like:
>>
>> diff --git a/src/network/res_mkquery.c b/src/network/res_mkquery.c
>> index 614bf7864b48..78b3095fe959 100644
>> --- a/src/network/res_mkquery.c
>> +++ b/src/network/res_mkquery.c
>> @@ -11,6 +11,7 @@ int __res_mkquery(int op, const char *dname, int
>> class, int type,
>>          struct timespec ts;
>>          size_t l = strnlen(dname, 255);
>>          int n;
>> +       static unsigned int querycnt;
>>
>>          if (l && dname[l-1]=='.') l--;
>>          if (l && dname[l-1]=='.') return -1;
>> @@ -34,6 +35,8 @@ int __res_mkquery(int op, const char *dname, int
>> class, int type,
>>
>>          /* Make a reasonably unpredictable id */
>>          clock_gettime(CLOCK_REALTIME, &ts);
>> +       /* force a different ID if mkquery was called twice during
>> the same tick */
>> +       ts.tv_nsec += querycnt++;
>>          id = ts.tv_nsec + ts.tv_nsec/65536UL & 0xffff;
>>          q[0] = id/256;
>>          q[1] = id;
>>
>> Would that make sense?
> 
> This isn't acceptable as-is because it introduces a data race.

Huh. You mean if there is a race the pseudo random IDs are changed in a 
platform dependent way? Doesn't sound sooo bad to me, but probably I'm 
too naive here. Also res_mkquery is documented to be not thread-safe. I 
didn't check deeply, but I guess the counter should be moved to struct 
__res_state. (Assuming the manpage for res_mkquery also applies to musl.)

Despite your concerns, I updated the libc on my RE200 with the above 
patch now, and I cannot observe the failure any more.

Of course updating nslookup to use a better API is a nicer solution.

> Which implementation of nslookup is this? Busybox? It would probably
> be useful to hear thoughts on it from their side.

Jo already replied here, nothing to add from my side.

BTW, glibc also generates the IDs from the monotonic clock only. It 
makes a bit bigger effort to scramble the bits in there, but if the time 
doesn't advance, it also generates identical IDs.

Best regards and thanks for your quick response
Uwe

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.