Date: Mon, 27 Jul 2015 21:21:54 -0400
From: Rich Felker <dalias@...c.org>
To: Andy Lutomirski <luto@...capital.net>
Cc: "musl@...ts.openwall.com" <musl@...ts.openwall.com>,
	Alexander Larsson <alexander.larsson@...il.com>
Subject: Re: Re: Using direct socket syscalls on x86_32 where
 available?

On Mon, Jul 27, 2015 at 06:04:11PM -0700, Andy Lutomirski wrote:
> On Mon, Jul 27, 2015 at 5:45 PM, Rich Felker <dalias@...c.org> wrote:
> > On Mon, Jul 27, 2015 at 04:56:51PM -0700, Andy Lutomirski wrote:
> >> On 07/26/2015 09:59 AM, Rich Felker wrote:
> >> >On Sat, Jul 25, 2015 at 10:54:28AM -0700, Andy Lutomirski wrote:
> >> >>On x86_32, the only way to call socket(2), etc is using socketcall.
> >> >>This is slated to change in Linux 4.3:
> >> >>
> >> >>https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?h=x86/asm&id=9dea5dc921b5f4045a18c63eb92e84dc274d17eb
> >> >>
> >> >>If userspace adapts by preferring the direct syscalls when available,
> >> >>it'll make it easier for seccomp to filter new userspace programs
> >> >>(and, ideally, eventually disallow socketcall for sandbox-aware code).
> >> >>
> >> >>Would musl be willing to detect these syscalls and use them if available?
> >> >>
> >> >>(Code to do this probably shouldn't be committed until that change
> >> >>lands in Linus' tree, just in case the syscall numbers change in the
> >> >>meantime.)
> >> >
> >> >My preference would be not to do this, since it seems to be enlarging
> >> >the code and pessimizing normal usage for the sake of a very special
> >> >usage scenario. At the very least there would be one extra
> >> >syscall to probe at first usage, and that probe could generate a
> >> >termination on existing seccomp setups. :-p
> >>
> >> There will be some tiny performance benefit for newer kernels: it
> >> avoids a silly indirection that has a switch statement along with six
> >> stores into memory, validation of the userspace address, and then
> >> six loads to pull the syscall args back out of memory.  It's not a
> >> big deal, but the new syscalls really will be slightly faster.
> >
> > Unless you're going to try the new syscalls first and fall back on
> > ENOSYS every time...
> >
> >> >So far we don't probe and
> >> >store results for any fallbacks though; we just do the fallback on
> >> >error every time. This is because all of the existing fallbacks are in
> >> >places where we actually want new functionality a new syscall offers,
> >> >and the old ones are not able to provide it precisely but require poor
> >> >emulation, and in these cases it's expected that the user not be using
> >> >old kernels that can't give correct semantics. But in the case of
> >> >these socket calls there's no semantic difference or reason for us to
> >> >be preferring the 'new' calls. It's just a duplicate API for the same
> >> >thing.
> >>
> >> One way to implement it would be to favor the new syscalls but to
> >> set some variable the first time one of them returns ENOSYS.  Once
> >> that happens, either all of them could fall back to socketcall or
> >> just that one syscall could.
> >
> > ...right, a global. Which requires a barrier to access it. A barrier
> > costs a lot more than a few loads or a switch.
> 
> Not on x86, and this is as x86-specific as it gets.  In fact, I bet

Is x86 really the only arch that needs socketcall multiplexing? If so
that makes transitioning more attractive. I thought at least a few
others needed it too.

> the totally untested code below is actually safe on pretty much any
> architecture that has free C11-style relaxed loads (and this code
> could even be switched to use actual C11 relaxed loads):
> 
> volatile int socket_is_okay = true;
> 
> if (socket_is_okay) {
>     ret = socket(...);  /* raw syscall: returns -errno on failure */
>     if (ret >= 0)
>         return ret;
>     if (ret != -ENOSYS) {
>         errno = -ret;
>         return -1;
>     }
>     /* Direct syscall is missing: remember that and fall through. */
>     socket_is_okay = false;
> }
> 
> usual socketcall code here;

This is probably workable with volatile there. Without volatile the
x86 memory model does not help you; the compiler can make
transformations that would make it unsafe even if the machine code you
expected the compiler to generate would be safe. But I still don't
like hacks like this. It's a big mess to keep it from getting used on
non-x86 where it would be invalid/unsafe.
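
A minimal sketch of the C11 relaxed-load variant mentioned above, with the
flag written as an atomic_int rather than a volatile int so that concurrent
access is well-defined in the C11 memory model. The names
socket_with_fallback, raw_call_fn, direct_ok and the two fake_* stubs are
hypothetical and come from neither musl nor the kernel. The function
pointers stand in for the direct syscall and the socketcall path, and both
are assumed to return a negated errno value on failure.

```c
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a raw kernel entry point: returns a
 * non-negative result on success or a negated errno value on failure. */
typedef long (*raw_call_fn)(int domain, int type, int protocol);

/* 1 until the direct syscall has reported ENOSYS once. */
static atomic_int direct_ok = 1;

int socket_with_fallback(raw_call_fn direct, raw_call_fn legacy,
                         int domain, int type, int protocol)
{
    long ret = -ENOSYS;

    if (atomic_load_explicit(&direct_ok, memory_order_relaxed)) {
        ret = direct(domain, type, protocol);
        /* Racing threads may each store 0 here; the store is idempotent,
         * so no ordering stronger than relaxed is needed for the flag. */
        if (ret == -ENOSYS)
            atomic_store_explicit(&direct_ok, 0, memory_order_relaxed);
    }
    if (ret == -ENOSYS)
        ret = legacy(domain, type, protocol);

    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return (int)ret;
}

/* Demonstration stubs (hypothetical): an "old kernel" whose direct syscall
 * is missing, and a legacy multiplexed path that always returns fd 3. */
static int direct_calls, legacy_calls;

static long fake_direct_enosys(int d, int t, int p)
{
    (void)d; (void)t; (void)p;
    direct_calls++;
    return -ENOSYS;
}

static long fake_legacy(int d, int t, int p)
{
    (void)d; (void)t; (void)p;
    legacy_calls++;
    return 3;
}
```

With these stubs, the first call pays for one failed probe of the direct
path and every later call goes straight to the legacy path. Whether a
relaxed load is cheap enough on every target, and how to keep such code
off architectures where it is not, is the open question in this thread.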

> >> Or you could just avoid implementing it and see if anyone complains.
> >> It's plausible that xdg-app might start requiring the new syscalls
> >> (although it would presumably not kill you if you tried to use
> >> socketcall).
> >>
> >> Alex, if glibc started using the new syscalls, would you want to
> >> require them inside xdg-app?
> >
> > I don't see any reason to require them except forcing policy. And I
> > don't see any reason for adding them to the kernel to begin with.
> > While we would have been better off with proper syscalls for each one
> > rather than this multiplexed mess if it had been done right from the
> > beginning, having to support both is even worse than the existing
> > multiplexed socketcall.
> 
> Worse for libc implementations, certainly.  On the other hand, the
> ability to cleanly limit address families and such is genuinely
> useful, and deployed software does it on x86_64.  It's not really
> possible with current kernels on x86_32, but, with these patches, it
> becomes possible on x86_32 as long as libc implementations play along
> and sandbox implementations are willing to force their payloads to use
> new enough libc implementations.
> 
> If I were porting something like Sandstorm to x86_32 and glibc
> supported the new syscalls, this would be a no-brainer for me.  I'd
> simply block socketcall entirely (returning -ENOSYS) in the container,
> and anyone providing an app that wants to use sockets has to link
> against new glibc.

Doing that would create a hard dependency on latest glibc and latest
kernel, which would be a show-stopper for use on Debian, etc. :-)

> Keep in mind that socket(2) with unrestricted address family is a big
> attack surface and is historically full of nasty vulnerabilities.

Yes, but this is largely the fault of distros for enabling all sorts
of ridiculous address families that nobody needs. If you just enable
inet4/6 and unix, it's not such a problem.

Anyway if x86 really is the only arch where this is needed, or if any
other stragglers are also going to be updated alongside x86, I'm open
to considering supporting the new syscalls. We just need to figure out
a reasonable way to do it.

Rich
