Date: Wed, 16 Feb 2022 13:44:35 -0500
From: Satadru Pramanik <satadru@...il.com>
To: Rich Felker <dalias@...ifal.cx>
Cc: musl@...ts.openwall.com
Subject: Re: Re: musl getaddr info breakage on older kernels

The only change to socket.c I'm seeing is the commit "use __socketcall to
simplify socket()"
<https://git.musl-libc.org/cgit/musl/commit/?id=7063c459e7dbd63c2c94e04413743abab5272001>,
so maybe it would make sense for me to try building with that commit reverted?

satadru

On Wed, Feb 16, 2022 at 1:37 PM Satadru Pramanik <satadru@...il.com> wrote:

>
>>
>> - Whether any network traffic occurs when it fails (in the real
>>   environment not a replicated one elsewhere).
>>
>>
> There is no network traffic in the real environment.
>
>
>> - Whether it fails or succeeds under strace (in the real
>>   environment not a replicated one elsewhere).
>>
> It succeeds under strace (in the real environment).
>
>
>
>> - Whether the real environment involves Docker or not.
>>
> The real environment does not involve Docker.
>
>
>
>> - What's in resolv.conf (in the real environment not a replicated one
>>   elsewhere) and what nameserver software (if known) is running on the
>>   nameserver(s) listed in there.
>>
> The nameserver is picked up from DHCP. The contents of the file are as
> follows:
> nameserver 192.168.0.1
> search lan.
> options single-request timeout:1 attempts:5
>
>
>> - Anything else that might be relevant.
>>
> The DNS server is dnsmasq running on a current OpenWrt device.
>
>
>> It's really hard to offer any productive advice when the problem is
>> unclear.
>>
> Apologies for the confusion. I'm really just trying to debug this
> getaddrinfo breakage on this older hardware. We use the Docker container
> setup to build packages for this hardware, and our frustration is that the
> software works perfectly in the Docker containers but not on the hardware.
>
>>> Any other suggestions on how to track down this issue?
>>
>> Rather than stepping through, I would put a single breakpoint at a
>> place you want to see whether execution reaches before running the
>> test program, then start it and see if the breakpoint fires or not.
>> Then remove the breakpoint, add a different one, and repeat. For
>> example, see if __res_msend is ever called, and if so, whether
>> particular lines of it are reached (or just put breakpoints on some of
>> the functions it calls, like socket, bind, recvfrom, poll, etc. to see
>> if they're called).
>>
>> It might also be useful to put a breakpoint on clock_gettime and then
>> 'finish' to see what it returns (in case the problem is something
>> time64-related).
>>
>>
> The only breakpoint that made execution succeed was the one on line 20
> (which invokes getaddrinfo): stepping through the __kernel_vsyscall call and
> then continuing is the only way it does not fail.
>
> With any later breakpoint, it still fails.
>
> I went through the other breakpoints as requested.
> clock_gettime did not fire:
>
> Breakpoint 1 at 0x5c2f7: file ../src_musl/compat/time32/clock_gettime32.c, line 9.
>
> __res_msend and setsockopt also did not fire.
> The ones that did fire were: socket, bind, recvfrom, poll, __res_msend_rc,
> memset, sendto, __get_resolv_conf, pthread_setcancelstate,
> __pthread_setcancelstate, __lookup_serv, __lookup_name, and memcpy.
>
> When I break on socket, step through the __kernel_vsyscall call that
> follows it, and then continue, the lookup succeeds.
>
> Is it possible that the socket is not waiting long enough for a response
> from __kernel_vsyscall? Has that changed?
> Breaking, stepping, and continuing at any of the other functions listed
> above still fails.
>
> The gdb log is attached.
>
> Regards,
>
> Satadru
>
>
