Date: Mon, 27 Jul 2015 18:04:11 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Rich Felker <dalias@...c.org>
Cc: "musl@...ts.openwall.com" <musl@...ts.openwall.com>, 
	Alexander Larsson <alexander.larsson@...il.com>
Subject: Re: Re: Using direct socket syscalls on x86_32 where available?

On Mon, Jul 27, 2015 at 5:45 PM, Rich Felker <dalias@...c.org> wrote:
> On Mon, Jul 27, 2015 at 04:56:51PM -0700, Andy Lutomirski wrote:
>> On 07/26/2015 09:59 AM, Rich Felker wrote:
>> >On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
>> >>On x86_32, the only way to call socket(2), etc is using socketcall.
>> >>This is slated to change in Linux 4.3:
>> >>
>> >>https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
>> >>
>> >>If userspace adapts by preferring the direct syscalls when available,
>> >>it'll make it easier for seccomp to filter new userspace programs
>> >>(and, ideally, eventually disallow socketcall for sandbox-aware code).
>> >>
>> >>Would musl be willing to detect these syscalls and use them if available?
>> >>
>> >>(Code to do this probably shouldn't be committed until that change
>> >>lands in Linus' tree, just in case the syscall numbers change in the
>> >>mean time.)
>> >
>> >My preference would be not to do this, since it seems to be enlarging
>> >the code and pessimizing normal usage for the sake of a very special
>> >usage scenario. At the very least there would be at least one extra
>> >syscall to probe at first usage, and that probe could generate a
>> >termination on existing seccomp setups. :-p
>>
>> There will be some tiny performance benefit for newer kernels: it
>> avoids a silly indirection that has a switch statement along with six
>> stores into memory, validation of the userspace address, and then
>> six loads to pull the syscall args back out of memory.  It's not a
>> big deal, but the new syscalls really will be slightly faster.
>
> Unless you're going to try the new syscalls first and fallback on
> ENOSYS every time...
>
>> >So far we don't probe and
>> >store results for any fallbacks though; we just do the fallback on
>> >error every time. This is because all of the existing fallbacks are in
>> >places where we actually want new functionality a new syscall offers,
>> >and the old ones are not able to provide it precisely but require poor
>> >emulation, and in these cases it's expected that the user not be using
>> >old kernels that can't give correct semantics. But in the case of
>> >these socket calls there's no semantic difference or reason for us to
>> >be preferring the 'new' calls. It's just a duplicate API for the same
>> >thing.
>>
>> One way to implement it would be to favor the new syscalls but to
>> set some variable the first time one of them returns ENOSYS.  Once
>> that happens, either all of them could fall back to socketcall or
>> just that one syscall could.
>
> ...right, a global. Which requires a barrier to access it. A barrier
> costs a lot more than a few loads or a switch.

Not on x86, and this is as x86-specific as it gets.  In fact, I bet
the totally untested code below is actually safe on pretty much any
architecture that has free C11-style relaxed loads (and this code
could even be switched to use actual C11 relaxed loads):

volatile int socket_is_okay = 1;

if (socket_is_okay) {
    ret = socket(...);
    if (ret == -ENOSYS) {
        /* old kernel: remember that and fall through to socketcall */
        socket_is_okay = 0;
    } else if (ret < 0) {
        errno = -ret;
        return -1;
    } else {
        return ret;
    }
}

usual socketcall code here;
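
FWIW, the same idea with an actual C11 relaxed atomic instead of the
volatile would look roughly like this.  Still untested, and
direct_socket_syscall() / socketcall_socket() are just stand-ins for
whatever the real internal syscall wrappers end up being called:

#include <errno.h>
#include <stdatomic.h>

static atomic_int socket_is_okay = 1;

int socket(int domain, int type, int protocol)
{
    if (atomic_load_explicit(&socket_is_okay, memory_order_relaxed)) {
        /* stand-in for the direct socket(2) syscall */
        long ret = direct_socket_syscall(domain, type, protocol);
        if (ret != -ENOSYS) {
            if (ret < 0) {
                errno = -ret;
                return -1;
            }
            return ret;
        }
        /* old kernel: remember that and fall back to socketcall */
        atomic_store_explicit(&socket_is_okay, 0, memory_order_relaxed);
    }
    /* stand-in for the existing socketcall-based path */
    return socketcall_socket(domain, type, protocol);
}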

>
>> Or you could just avoid implementing it and see if anyone complains.
>> It's plausible that xdg-app might start requiring the new syscalls
>> (although it would presumably not kill you if it tried to use
>> socketcall).
>>
>> Alex, if glibc started using the new syscalls, would you want to
>> require them inside xdg-app?
>
> I don't see any reason to require them except forcing policy. And I
> don't see any reason for adding them to the kernel to begin with.
> While we would have been better off with proper syscalls for each one
> rather than this multiplexed mess if it had been done right from the
> beginning, having to support both is even worse than the existing
> multiplexed socketcall.

Worse for libc implementations, certainly.  On the other hand, the
ability to cleanly limit address families and such is genuinely
useful, and deployed software does it on x86_64.  It's not really
possible with current kernels on x86_32, but, with these patches, it
becomes possible on x86_32 as long as libc implementations play along
and sandbox implementations are willing to force their payloads to use
new enough libc implementations.
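
Concretely, with the direct syscall a sandbox on x86_32 could install
something like the (untested) filter below, which lets socket(2)
through only for AF_UNIX and AF_INET and fails everything else with
EAFNOSUPPORT, something that just isn't expressible when everything
funnels through socketcall.  NR_socket_direct is a placeholder; the
real number is whatever ends up in Linus' tree:

#include <stddef.h>
#include <errno.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* placeholder for the direct socket(2) syscall number on x86_32 */
#define NR_socket_direct 359

static int limit_socket_families(void)
{
    struct sock_filter filter[] = {
        /* only filter native x86_32 syscalls */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AUDIT_ARCH_I386, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),

        /* anything other than the direct socket(2) syscall passes */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NR_socket_direct, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),

        /* socket(2): allow only AF_UNIX and AF_INET */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_UNIX, 2, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, AF_INET, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EAFNOSUPPORT & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = { .len = sizeof filter / sizeof filter[0],
                               .filter = filter };

    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}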

If I were porting something like Sandstorm to x86_32 and glibc
supported the new syscalls, this would be a no-brainer for me.  I'd
simply block socketcall entirely (returning -ENOSYS) in the container,
and anyone providing an app that wants to use sockets has to link
against new glibc.
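
With libseccomp, blocking socketcall that way is only a couple of lines
(untested sketch; allow-by-default here just to keep the example short,
a real sandbox would start from a much tighter whitelist):

#include <errno.h>
#include <seccomp.h>

static int block_socketcall(void)
{
    /* allow-by-default only to keep the example short */
    scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ALLOW);
    if (!ctx)
        return -1;
    /* make the multiplexed socketcall(2) look unimplemented */
    if (seccomp_rule_add(ctx, SCMP_ACT_ERRNO(ENOSYS), SCMP_SYS(socketcall), 0) ||
        seccomp_load(ctx)) {
        seccomp_release(ctx);
        return -1;
    }
    seccomp_release(ctx);
    return 0;
}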

Keep in mind that socket(2) with unrestricted address family is a big
attack surface and is historically full of nasty vulnerabilities.

--Andy
