Message-ID: <9dd123f1-0053-4f92-83bd-b443a78cfeb5@gmail.com>
Date: Sun, 24 Aug 2025 08:11:30 -0400
From: Demi Marie Obenour <demiobenour@...il.com>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com, libc-alpha@...rceware.org
Subject: Re: Running code on all other threads (for sandboxing)

On 8/23/25 22:18, Rich Felker wrote:
> On Fri, Aug 22, 2025 at 09:34:55PM -0400, Demi Marie Obenour wrote:
>> There are cases where it is highly desirable for a process
>> to start out with full user rights (or at least close to them),
>> initialize, and then drop these privileges using Linux kernel
>> features like seccomp. Unfortunately, this breaks if the
>> process uses third-party libraries that create threads during
>> initialization. In particular, Mesa can do this, and there is
>> no realistic alternative to it as Mesa is ~2 million lines of
>> GPU compiler and driver code. Loading Mesa later is undesirable
>> as it prevents removing all filesystem access.
>>
>> There are two ways to fix this problem:
>>
>> 1. Fix the problem in the Linux kernel.
>> 2. Work around it in userspace, as is already done for setuid()
>>    and friends.
>>
>> For the second, it should be sufficient to provide a function
>> that runs a caller-provided function on each thread, while
>> ensuring that the process is atomic with respect to other
>> threads in the process. This function only needs to make
>> system calls and crashes the process if there is an error.
>> If the function uses anything that isn't a syscall or
>> compiler builtin, it gets to keep both pieces.
>>
>> Is this something that would make sense to implement? I know
>> that this problem has been an issue for Chromium on Linux.
>
> I'm not sure what the right solution to this specific problem is, but
> I don't think exposing a "run arbitrary code in each thread" as a
> public API is a good choice.
> Such code would run in a context which is
> worse/more-restrictive even than "async signal" context, making it
> really difficult to define any reasonable class of "what you're
> allowed to do here". I know you said "syscalls", but even that
> requires defining what you mean by syscalls (raw via asm? via
> syscall()? any function that's "traditionally just a syscall"?) and
> further specifying which syscalls are actually allowed (any which
> break the __synccall context assumptions would need to be forbidden).

I think just seccomp() and compiler-inserted calls to functions like memcpy(). memcpy() should only depend on a valid stack (which *is* guaranteed unless I am greatly mistaken) and seccomp() is just a wrapper around syscall().

> I think there are potentially semi-portable solutions to your problem
> that don't require such a big hammer as arbitrary __synccall.
>
> One that comes to mind is installing a SECCOMP_RET_USER_NOTIF or
> SECCOMP_RET_TRAP filter before loading Mesa. This could allow the
> filesystem access to load Mesa libraries only until you set a flag
> that loading has finished, then cause filesystem access syscalls to
> fail once the flag has been set.

Would this involve emulating all the filesystem syscalls? The problem is that the flag would need to be set in a way that it can’t be unset.

> Another approach is doing what I'd call "manual __synccall" with your
> own signal, which is better than exposing actual __synccall because
> the application code does not run in an invalid-libc context, but this
> would only work if Mesa's hidden threads don't mask signals. A library
> creating its own threads behind the scenes *should* be masking all
> signals, so this probably doesn't work. Even if Mesa botched it, you
> wouldn't want to preclude them fixing it.

Also, from my reading of past mailing list posts, this is inherently racy against thread creation.

> There is probably also a way to do this with ptrace, which blocked
> signals wouldn't interfere with, but that gets really nasty really
> quick.
>
> Unfortunately there don't seem to be any ways to inject new seccomp
> filters into another task (even a thread of your own process)
> directly. This is what Linux really should be offering here.

Actually, it already supports this (SECCOMP_FILTER_FLAG_TSYNC). I don't think this is supported for Landlock, though.

-- 
Sincerely,
Demi Marie Obenour (she/her/hers)

Download attachment "OpenPGP_0xB288B55FFF9C22C1.asc" of type "application/pgp-keys" (7141 bytes)
Download attachment "OpenPGP_signature.asc" of type "application/pgp-signature" (834 bytes)
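
The SECCOMP_FILTER_FLAG_TSYNC point in the reply above can be illustrated with a minimal C sketch. It assumes Linux 4.14 or later (for SECCOMP_RET_KILL_PROCESS) on x86-64 or AArch64, and the helper names install_tsync_filter(), worker() and tsync_demo() are hypothetical. Denying getppid() is only an easy-to-observe stand-in for a real sandbox policy, which would normally be an allowlist and would also have to account for the x32 ABI on x86-64. The worker thread is created before the filter is installed, playing the role of a thread started behind the scenes by a library such as Mesa.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#if defined(__x86_64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define NATIVE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only handles x86-64 and AArch64"
#endif

/* Hypothetical helper: deny getppid() with EPERM in every thread of the
 * process, including threads that already exist. Returns 0 or -1. */
static int install_tsync_filter(void)
{
    struct sock_filter insns[] = {
        /* Kill the process if the syscall ABI is not the expected one. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, NATIVE_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL_PROCESS),
        /* getppid() fails with EPERM; every other syscall is allowed. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getppid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof insns / sizeof insns[0],
        .filter = insns,
    };

    /* Installing a filter without CAP_SYS_ADMIN requires no_new_privs. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    /* With TSYNC, seccomp() returns 0 on success, -1 on error, or the TID
     * of a thread whose filter tree could not be synchronized. */
    return syscall(SYS_seccomp, SECCOMP_SET_MODE_FILTER,
                   SECCOMP_FILTER_FLAG_TSYNC, &prog) == 0 ? 0 : -1;
}

struct demo_state {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int go;      /* set once filter installation has been attempted */
    long result; /* worker's getppid() return value */
    int err;     /* worker's errno after getppid() */
};

/* Stands in for a thread a library created before the sandbox existed. */
static void *worker(void *arg)
{
    struct demo_state *s = arg;

    pthread_mutex_lock(&s->lock);
    while (!s->go)
        pthread_cond_wait(&s->cond, &s->lock);
    pthread_mutex_unlock(&s->lock);
    errno = 0;
    s->result = syscall(SYS_getppid);
    s->err = errno;
    return NULL;
}

/* Returns 1 if the pre-existing thread is covered by the filter, 0 if it
 * is not, and -1 if setup failed. */
static int tsync_demo(void)
{
    struct demo_state s = {
        .lock = PTHREAD_MUTEX_INITIALIZER,
        .cond = PTHREAD_COND_INITIALIZER,
    };
    pthread_t t;
    int rc;

    if (pthread_create(&t, NULL, worker, &s) != 0)
        return -1;
    rc = install_tsync_filter();
    pthread_mutex_lock(&s.lock);
    s.go = 1;
    pthread_cond_signal(&s.cond);
    pthread_mutex_unlock(&s.lock);
    pthread_join(t, NULL);
    if (rc != 0)
        return -1;
    return s.result == -1 && s.err == EPERM;
}
```

Called from main(), tsync_demo() is expected to return 1 where seccomp is available, and afterwards getppid() also fails with EPERM in the calling thread, since TSYNC gives every thread the same filter tree. TSYNC fails as a whole, rather than applying partially, if some thread has already attached filters of its own that diverge from the caller's; and, as noted in the reply, it does nothing for Landlock. On glibc older than 2.34, link with -pthread.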