Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Fri, 3 Jun 2022 08:27:04 -0400
From: Rich Felker <dalias@...c.org>
To: Sascha Braun <sascha.braun.lpz@...glemail.com>
Cc: musl@...ts.openwall.com
Subject: Re: Problably Issue in name_from_dns // __res_msend_rc
 (lookup_name.c)

On Fri, Jun 03, 2022 at 12:28:40AM +0200, Sascha Braun wrote:
> Hi Rich,
> 
> I think I narrowed the thing down. Below is a dump of what happens in a
> 'normal' situation and what happened when the sporadic issue appeared.
> 
> Normal:
> begin lookup3...
> __syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
> ecea010000010000000000000377777706676f6f676c6503636f6d0000010001
> __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
> ecea010000010000000000000377777706676f6f676c6503636f6d0000010001
> __syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
> ecea010000010000000000000377777706676f6f676c6503636f6d0000010001
> __syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
> ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001
> __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
> ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001
> __syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
> ecea010000010000000000000377777706676f6f676c6503636f6d00001c0001
> __syscall_recv begin EP[0.0.0.0:0]
> __syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53]
> ecea818000010001000000000377777706676f6f676c6503636f6d0000010001c00c00010001000000390004d83ad4840000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> __syscall_recv begin EP[9.9.9.9:53]
> __syscall_recv'd_internal [020000350909090900000000000000009.9.9.9:53]
> ecea818000010001000000000377777706676f6f676c6503636f6d00001c0001c00c001c00010000003400102a0014504001080800000000000020040000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> connect:0200ffffd83ad4840000000000000000
> connect:0a00ffff000000002a00145040010808000000000000200400000000
> 
> So we did sent the bytes
> ecea010000010000000000000377777706676f6f676c6503636f6d0000010001 and the
> other sequence (hex encoded, 2 characters per byte) to each of the DNS and
> we received back one 'short' and one 'long' reply from 9.9.9.9.
> I guess the short one is IPv4, long one IPv6(?). That's the case with all
> successful lookups, i.e. the 99% ok ones - (at least) one short - (at
> least) one long.
> 
> Now the problematic one:
> 
> begin lookup1...
> __syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
> 94d90100000100000000000003777777037765620264650000010001
> __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
> 94d90100000100000000000003777777037765620264650000010001
> __syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
> 94d90100000100000000000003777777037765620264650000010001
> __syscall_send_internal 020000350808040400000000000000008.8.4.4:53]
> 94d901000001000000000000037777770377656202646500001c0001
> __syscall_send_internal 02000035d043dede0000000000000000208.67.222.222:53]
> 94d901000001000000000000037777770377656202646500001c0001
> __syscall_send_internal 020000350909090900000000000000009.9.9.9:53]
> 94d901000001000000000000037777770377656202646500001c0001
> 
> __syscall_recv begin EP[0.0.0.0:0]
> __syscall_recv'd_internal [020000000000000000000000000000009.9.9.9:53]
> 94d981800001000100010000037777770377656202646500001c0001c00c000500010000004e000f0377777708672d68612d776562c014c02c00060001000000340031036e733102706f0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093a800000003c000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> __syscall_recv begin EP[9.9.9.9:53]
> __syscall_recv'd_internal [020000350909090900000000000000008.8.4.4:53]
> 94d981800001000100010000037777770377656202646500001c0001c00c0005000100000114000f0377777708672d68612d776562c014c02c000600010000002c0031036e733102706f0675692d646e73c0140a686f73746d6173746572c04378860f1200002a3000000e1000093a800000003c000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
> getaddrinfo1: Address in use
> 
> We received two 'long' responses, one from 9.9.9.9; one from 8.8.4.4
> 
> All occurrences of the problem show this constellation - two 'long'
> responses received.
> As a note, of course my implementation of recv returns the correct number
> of bytes received. The zeros you see are only from the dump function, it's
> dumping the 512 byte buffer.
> 
> I hope this is helpful in some manner.
> 
> I came across this, I seems glibc had a similar issue (I did not look
> in-depth, just want to share the link)
> https://bugzilla.redhat.com/show_bug.cgi?id=1044628
> https://sourceware.org/legacy-ml/libc-alpha/2014-04/msg00321.html

OK, I found your problem. It's that the query ids for both the A and
AAAA are the same, probably because you have a low-resolution or
non-working clock_gettime. If the host environment does not provide a
way to get a high resolution clock, I think you should still apply a
monotonic increasing increment of the nanoseconds on each call where
the host environment's time did not increase so that the clock is
strictly monotonic. However musl's resolver should also deal with this
case since it's always possible to get identical query ids (with low
probability). We should just check if they're equal, and if so,
increment the second one. I'll write a patch to do this.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.