Date: Thu, 21 Apr 2016 21:29:52 +0200
From: Sebastian Gottschall <s.gottschall@...wrt.com>
To: musl@...ts.openwall.com
Subject: Re: recvmsg/sendmsg broken on mips64

On 21.04.2016 at 17:36, Rich Felker wrote:
> On Thu, Apr 21, 2016 at 09:22:16AM +0200, Sebastian Gottschall wrote:
>> On 21.04.2016 at 03:37, Rich Felker wrote:
>>> On Sun, Apr 10, 2016 at 10:35:22PM -0400, Rich Felker wrote:
>>>> On Mon, Apr 11, 2016 at 12:33:07AM +0200, Sebastian Gottschall wrote:
>>>>> On 11.04.2016 at 00:29, Rich Felker wrote:
>>>>>> On Mon, Apr 11, 2016 at 12:24:49AM +0200, Sebastian Gottschall wrote:
>>>>>>>> I think what nsz was asking for, and what I'd like to see, is a way to
>>>>>>>> reproduce the bug. I'm going to try building iproute2 for mips64 and
>>>>>>>> running it on a prebuilt kernel from Aboriginal Linux under
>>>>>>>> qemu-system-mips64, but I don't know what specific commands are needed
>>>>>>>> to hit the affected code path.
>>>>>>> any command, since everything is netlink based:
>>>>>>> ip add add 192.168.1.1/24 dev eth0
>>>>>>>
>>>>>>> you will see that nothing happens. ip will just return an error
>>>>>>> message (i already wrote this message in my first post to this
>>>>>>> mailing list).
>>>>>>> "EOF on netlink" is the error that is shown
>>>>>> OK, I'll try this.
>>>>>>
>>>>>>>>> it all results in the same failing recvmsg / sendmsg call. so
>>>>>>>>> yes, libnetlink.c does not work with musl on mips64 (it does work on
>>>>>>>>> x64 and everything else, just not mips64) unless the hack i offered
>>>>>>>>> is applied, which fixes everything.
>>>>>>>>> before you ask again for a problem description, just read again. it
>>>>>>>>> won't change the description if you ask again, and it just makes
>>>>>>>>> people on this list tired.
>>>>>>>> Both versions of the struct (musl's and your modified one that matches
>>>>>>>> the kernel) have the exact same layout, but due to having a member
>>>>>>>> with 64-bit type, yours has 8-byte alignment and musl's only has
>>>>>>>> 4-byte alignment. This means, at least:
>>>>>>>>
>>>>>>>> 1. When musl's sendmsg.c makes its copy to zero out the padding, the
>>>>>>>>     copy may not be correctly aligned for 64-bit writes, and the kernel
>>>>>>>>     faults or manually produces an error for this case, causing the
>>>>>>>>     whole operation to fail. However, I don't see where iproute2 is
>>>>>>>>     actually passing control messages to sendmsg, so while this is a
>>>>>>>>     problem, I don't think it's the cause. Maybe I'm missing the
>>>>>>>>     affected call point; this is why I'd like steps to reproduce the
>>>>>>>>     issue so I can see it.
>>>>>>>>
>>>>>>>> 2. iproute2's libnetlink.c's rtnl_listen function does not properly
>>>>>>>>     declare its cmsgbuf with the alignment of cmsghdr; it has type
>>>>>>>>     char[] so the compiler is free not to align it at all. This is
>>>>>>>>     presumably a bug in iproute2, but I can't find any good
>>>>>>>>     documentation (in the standards or Linux-specific) for how you're
>>>>>>>>     supposed to allocate this space, so maybe the kernel is able to
>>>>>>>>     handle aligning the buffer itself. I don't see any way the
>>>>>>>>     alignment of musl's cmsghdr type affects recvmsg though.
>>>>>>>>
>>>>>>>> Maybe there are other effects I'm missing? I'll follow up again once I
>>>>>>>> get a test build/run of iproute2 and let you know whether I can see
>>>>>>>> the problem.
>>>>>>> okay. if you need remote access to an Octeon system using musl (my
>>>>>>> fixed variant), just tell me.
>>>>>> That would be really helpful. Something's wrong with the userspace for
>>>>>> the Aboriginal mips64 binaries (SIGBUS in init) and debugging that
>>>>>> would be a big distraction.
>>>>>>
>>>>>> BTW do you have gdb and strace available?
>>>>> not on the system itself. i'm not sure if strace works on mips64.
>>>>> never tried it.
>>>>> but you're free to copy any binary to the /tmp dir. it has 2 GB of RAM,
>>>>> so there is enough space for static binaries if you want to play with them.
>>>>> i will send you the ssh details in a private email
>>>> I haven't been able to reproduce the error on your system. I've tried
>>>> building my own static-linked version of the "ip" utility with a
>>>> mips64-linux-musl softfloat compiler, and uploading my libc.so and
>>>> using it to run both your version of ip and a dynamic-linked one I
>>>> just built. They all work fine for adding/removing a 127.0.0.2 address
>>>> to the "lo" interface.
>>>>
>>>> Next I'm going to try to get a minimal testcase that tries to
>>>> intentionally misalign the control message buffers. I suspect I'm just
>>>> "getting lucky" and my buffer happens to be aligned the way the kernel
>>>> wants by chance.
>>> I've managed to track down the cause of the breakage. Somehow your
>>> iproute2 has been miscompiled. What I did was add debug logic to
>>> libc.so to print the contents of the msghdr struct passed in before
>>> fixups, after fixups, and after the syscall. The output I got was:
>>>
>>> msghdr: 0xffffd58e08 12 0xffffd58df8 1 0 0 0 0 0
>>> msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 0
>>> msghdr: 0xffffd58e08 12 0xffffd58df8 0 0 0 0 0 32
>>>
>>> The fields (including __pad1 and __pad2) are printed in order. So as
>>> you can see, ip passed in a structure with a 1 in __pad1 and a 0 in
>>> msg_iovlen. The source (libnetlink.c) stores 1 to msg_iovlen, so my
>>> guess is that somehow it ended up getting the wrong-endian version of
>>> the structure definition. You could confirm this by adding #error to
>>> the little-endian case in arch/mips64/bits/socket.h and recompiling. I
>>> suspect it's going to take some additional work to track down the
>>> cause, which is likely specific to something in your toolchain (it
>>> didn't happen for me when I built my own iproute2).
>> i tried that already before i contacted you. the #error in the
>> little-endian case never fires
> Was that when compiling musl or iproute2? The problem is in how
> iproute2 was built; your libc.so seems fine.
When compiling iproute2, for sure.
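
A minimal sketch of a check that could be built with the same toolchain and CFLAGS as iproute2, to see where a store of 1 to msg_iovlen actually lands in the struct that compilation unit sees. The helper names (iovlen_word_index, print_msghdr_layout) are made up for illustration; only struct msghdr and msg_iovlen come from the discussion above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

#define NWORDS (sizeof(struct msghdr) / sizeof(uint32_t))

/* Index of the 32-bit word that becomes non-zero when 1 is stored to
 * msg_iovlen in an otherwise zeroed msghdr, or -1 if none is found.
 * (Hypothetical helper, not part of musl or iproute2.) */
static int iovlen_word_index(void)
{
    struct msghdr msg;
    uint32_t words[NWORDS];
    size_t i;

    assert(sizeof(struct msghdr) % sizeof(uint32_t) == 0);

    memset(&msg, 0, sizeof msg);
    msg.msg_iovlen = 1;            /* the same store libnetlink.c performs */
    memcpy(words, &msg, sizeof words);

    for (i = 0; i < NWORDS; i++)
        if (words[i] != 0)
            return (int)i;
    return -1;
}

/* Print what this compilation unit believes about the msghdr layout. */
static void print_msghdr_layout(FILE *out)
{
    fprintf(out, "sizeof(struct msghdr) = %zu\n", sizeof(struct msghdr));
    fprintf(out, "offsetof(msg_iov)     = %zu\n",
            offsetof(struct msghdr, msg_iov));
    fprintf(out, "offsetof(msg_iovlen)  = %zu\n",
            offsetof(struct msghdr, msg_iovlen));
    fprintf(out, "offsetof(msg_control) = %zu\n",
            offsetof(struct msghdr, msg_control));
    fprintf(out, "32-bit word holding 1 = %d\n", iovlen_word_index());
}
```

Calling print_msghdr_layout(stdout) from a small test program, once built the way iproute2 is built and once the way libc.so is built, would show whether the two disagree about the position of msg_iovlen. The printed offsets can be compared by hand against the definition in arch/mips64/bits/socket.h.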
>
>> so your guess doesn't match reality. (i even tried it again right
>> now. all is fine. it only uses the big-endian case)
> If it's not the endian tests, I don't know what else would have caused
> this. I'll get a disassembly dump of the function to show you. Is
> there any way I can reproduce your exact toolchain to see if I can get
> the same miscompilation to happen?
I can provide you with a tarball of the toolchain, compiled for amd64
(it's plain OpenWrt gcc 5.3.0 using musl).
The iproute2 package in use is
http://svn.dd-wrt.com/browser/src/router/iproute2
That's the one used for all targets. It's not the newest, but it's the
one I'm using everywhere (working on x64, x32, little endian, big
endian, arm, mips, powerpc etc.).

If any of that would help, just tell me.
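
On the cmsgbuf alignment point raised earlier in the thread: a rough sketch, assuming the char[] buffer in libnetlink.c really can end up misaligned, of the usual union idiom for giving a control buffer the alignment of struct cmsghdr. The type and function names (aligned_cmsgbuf, attach_control, is_cmsg_aligned) and the buffer size are illustrative assumptions, not iproute2 code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Control buffer that carries the alignment of struct cmsghdr.
 * Room for a single int payload is an arbitrary choice for this sketch. */
union aligned_cmsgbuf {
    struct cmsghdr align;                  /* only here to force alignment */
    char buf[CMSG_SPACE(sizeof(int))];     /* the bytes handed to the kernel */
};

/* Zero the buffer and point the msghdr's control fields at it. */
static void attach_control(struct msghdr *msg, union aligned_cmsgbuf *cb)
{
    memset(cb, 0, sizeof *cb);
    msg->msg_control = cb->buf;
    msg->msg_controllen = sizeof cb->buf;
}

/* Non-zero if p is suitably aligned for a struct cmsghdr. */
static int is_cmsg_aligned(const void *p)
{
    return (uintptr_t)p % _Alignof(struct cmsghdr) == 0;
}
```

Because every member of a union starts at offset 0, buf inherits the alignment of the cmsghdr member, whatever alignment the libc in use gives that type. Whether this matters for the failure discussed here is not established; it only removes one variable.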

> Rich
>
