musl - Re: Re: Using direct socket syscalls on x86

Message-ID: <CALCETrVzugn9vUjQfHcPrHcRUJk+wgBtvhUz7U0=H2tWxZzAWg@mail.gmail.com>
Date: Mon, 27 Jul 2015 18:38:08 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Rich Felker <dalias@...c.org>
Cc: "musl@...ts.openwall.com" <musl@...ts.openwall.com>, 
	Alexander Larsson <alexander.larsson@...il.com>
Subject: Re: Re: Using direct socket syscalls on x86_32 where available?

On Mon, Jul 27, 2015 at 6:21 PM, Rich Felker <dalias@...c.org> wrote:
> On Mon, Jul 27, 2015 at 06:04:11PM -0700, Andy Lutomirski wrote:
>> On Mon, Jul 27, 2015 at 5:45 PM, Rich Felker <dalias@...c.org> wrote:
>> > On Mon, Jul 27, 2015 at 04:56:51PM -0700, Andy Lutomirski wrote:
>> >> On 07/26/2015 09:59 AM, Rich Felker wrote:
>> >> >On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
>> >> >>On x86_32, the only way to call socket(2), etc is using socketcall.
>> >> >>This is slated to change in Linux 4.3:
>> >> >>
>> >> >>https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
>> >> >>
>> >> >>If userspace adapts by preferring the direct syscalls when available,
>> >> >>it'll make it easier for seccomp to filter new userspace programs
>> >> >>(and, ideally, eventually disallow socketcall for sandbox-aware code).
>> >> >>
>> >> >>Would musl be willing to detect these syscalls and use them if available?
>> >> >>
>> >> >>(Code to do this probably shouldn't be committed until that change
>> >> >>lands in Linus' tree, just in case the syscall numbers change in the
>> >> >>meantime.)
>> >> >
>> >> >My preference would be not to do this, since it seems to be enlarging
>> >> >the code and pessimizing normal usage for the sake of a very special
>> >> >usage scenario. At the very least there would be at least one extra
>> >> >syscall to probe at first usage, and that probe could generate a
>> >> >termination on existing seccomp setups. :-p
>> >>
>> >> There will be some tiny performance benefit for newer kernels: it
>> >> avoids a silly indirection that has a switch statement along with six
>> >> stores into memory, validation of the userspace address, and then
>> >> six loads to pull the syscall args back out of memory.  It's not a
>> >> big deal, but the new syscalls really will be slightly faster.
>> >
>> > Unless you're going to try the new syscalls first and fallback on
>> > ENOSYS every time...
>> >
>> >> >So far we don't probe and
>> >> >store results for any fallbacks though; we just do the fallback on
>> >> >error every time. This is because all of the existing fallbacks are in
>> >> >places where we actually want new functionality a new syscall offers,
>> >> >and the old ones are not able to provide it precisely but require poor
>> >> >emulation, and in these cases it's expected that the user not be using
>> >> >old kernels that can't give correct semantics. But in the case of
>> >> >these socket calls there's no semantic difference or reason for us to
>> >> >be preferring the 'new' calls. It's just a duplicate API for the same
>> >> >thing.
>> >>
>> >> One way to implement it would be to favor the new syscalls but to
>> >> set some variable the first time one of them returns ENOSYS.  Once
>> >> that happens, either all of them could fall back to socketcall or
>> >> just that one syscall could.
>> >
>> > ...right, a global. Which requires a barrier to access it. A barrier
>> > costs a lot more than a few loads or a switch.
>>
>> Not on x86, and this is as x86-specific as it gets.  In fact, I bet
>
> Is x86 really the only arch that needs socketcall multiplexing? If so
> that makes transitioning more attractive. I thought at least a few
> others needed it too.
>

I'll try to figure out whether there are others and submit patches.

>> the totally untested code below is actually safe on pretty much any
>> architecture that has free C11-style relaxed loads (and this code
>> could even be switched to use actual C11 relaxed loads):
>>
>> volatile int socket_is_okay = true;
>>
>> if (socket_is_okay) {
>>     ret = socket(...);  /* direct syscall */
>>     if (ret != -ENOSYS) {
>>         if (ret < 0) {
>>             errno = -ret;
>>             return -1;
>>         }
>>         return ret;
>>     }
>>     /* Kernel lacks the direct syscall: remember that and fall through. */
>>     socket_is_okay = false;
>> }
>>
>> usual socketcall code here;
>
> This is probably workable with volatile there. Without volatile the
> x86 memory model does not help you; the compiler can make
> transformations that would make it unsafe even if the machine code you
> expected the compiler to generate would be safe. But I still don't
> like hacks like this. It's a big mess to keep it from getting used on
> non-x86 where it would be invalid/unsafe.

Why's it unsafe on non-x86?  I think it's safe if all those volatile
accesses are replaced with standard C11 relaxed accesses.  The only
thing that code requires for correctness is that a relaxed read never
returns a result that never was nor will be written.
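
To make the relaxed-access variant concrete, here is a minimal sketch. It is
as untested against a real kernel as the code above. The names
fake_direct_socket, fake_socketcall_socket, kernel_has_direct and my_socket
are hypothetical stand-ins, not musl or kernel interfaces; a real libc would
issue the actual syscalls in their place.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two kernel entry points, with counters
 * so the fallback behaviour can be observed. */
static int direct_calls, legacy_calls;
static bool kernel_has_direct = false;   /* pretend this is an old kernel */

static long fake_direct_socket(int domain, int type, int protocol)
{
    (void)domain; (void)type; (void)protocol;
    direct_calls++;
    return kernel_has_direct ? 3 : -ENOSYS;
}

static long fake_socketcall_socket(int domain, int type, int protocol)
{
    (void)domain; (void)type; (void)protocol;
    legacy_calls++;
    return 4;
}

/* Starts out true; flipped to false the first time ENOSYS is seen. */
static atomic_bool socket_is_okay = true;

int my_socket(int domain, int type, int protocol)
{
    long ret;

    /* Relaxed load: a stale "true" only costs one extra failed syscall. */
    if (atomic_load_explicit(&socket_is_okay, memory_order_relaxed)) {
        ret = fake_direct_socket(domain, type, protocol);
        if (ret != -ENOSYS)
            goto done;
        /* Relaxed store: no other data is published through this flag. */
        atomic_store_explicit(&socket_is_okay, false, memory_order_relaxed);
    }

    ret = fake_socketcall_socket(domain, type, protocol);

done:
    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return (int)ret;
}
```

The flag only ever goes from true to false, and no other memory is ordered
against it, so a thread that reads a stale value does nothing worse than
repeat the probe once.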

>
>> >> Or you could just avoid implementing it and see if anyone complains.
>> >> It's plausible that xdg-app might start requiring the new syscalls
>> >> (although it would presumably not kill you if you tried to use
>> >> socketcall).
>> >>
>> >> Alex, if glibc started using the new syscalls, would you want to
>> >> require them inside xdg-app?
>> >
>> > I don't see any reason to require them except forcing policy. And I
>> > don't see any reason for adding them to the kernel to begin with.
>> > While we would have been better off with proper syscalls for each one
>> > rather than this multiplexed mess if it had been done right from the
>> > beginning, having to support both is even worse than the existing
>> > multiplexed socketcall.
>>
>> Worse for libc implementations, certainly.  On the other hand, the
>> ability to cleanly limit address families and such is genuinely
>> useful, and deployed software does it on x86_64.  It's not really
>> possible with current kernels on x86_32, but, with these patches, it
>> becomes possible on x86_32 as long as libc implementations play along
>> and sandbox implementations are willing to force their payloads to use
>> new enough libc implementations.
>>
>> If I were porting something like Sandstorm to x86_32 and glibc
>> supported the new syscalls, this would be a no-brainer for me.  I'd
>> simply block socketcall entirely (returning -ENOSYS) in the container,
>> and anyone providing an app that wants to use sockets has to link
>> against new glibc.
>
> Doing that would create a hard dependency on latest glibc and latest
> kernel, which would be a show-stopper for use on Debian, etc. :-)

It only requires the payload to depend on the latest glibc, though,
and the payload might be a binary from elsewhere.
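
For illustration, a sandbox that blocks socketcall with -ENOSYS could use a
seccomp filter roughly like the sketch below. This is an assumption about how
one might write it, not code from Sandstorm or xdg-app. The names
I386_NR_SOCKETCALL, block_socketcall and install_block_socketcall are made up
for the example. 102 is the i386 syscall number for socketcall, hardcoded so
the sketch also builds on hosts whose headers do not define __NR_socketcall.

```c
#include <errno.h>
#include <stddef.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>

#define I386_NR_SOCKETCALL 102   /* socketcall(2) on i386 */

/* Fail socketcall with ENOSYS for i386 tasks; allow everything else.
 * A real sandbox would also restrict other architectures and syscalls. */
struct sock_filter block_socketcall[] = {
    /* [0] load the architecture of the calling task */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* [1] not i386: skip ahead to the final allow */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_I386, 0, 3),
    /* [2] load the syscall number */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [3] socketcall: fall through to the errno return, else allow */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, I386_NR_SOCKETCALL, 0, 1),
    /* [4] make the syscall fail with ENOSYS without killing the task */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (ENOSYS & SECCOMP_RET_DATA)),
    /* [5] everything else is allowed */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Install the filter on the calling thread; returns 0 on success. */
int install_block_socketcall(void)
{
    struct sock_fprog prog = {
        .len = sizeof(block_socketcall) / sizeof(block_socketcall[0]),
        .filter = block_socketcall,
    };

    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

Under a filter like this, a payload linked against an old libc sees ENOSYS
from every socket operation. A new libc that tries the direct syscalls first
never calls socketcall at all.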

--Andy