Message-ID: <CAC_pWX0Ekr_iOWfdNP07RsCrY87Tj-adg+6KJPta8u0oXNNu2w@mail.gmail.com>
Date: Tue, 25 Nov 2025 22:17:12 -0500
From: Arjun Ramesh <arjunr2@...rew.cmu.edu>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: [Patch Request] Name-bound syscalls within musl
On Tue, Nov 25, 2025 at 8:44 PM Rich Felker <dalias@...c.org> wrote:
> On Tue, Nov 25, 2025 at 12:29:08PM -0500, Arjun Ramesh wrote:
> > Hi everyone,
> >
> > I am currently working on a research project using musl that uses
> > name-bound syscalls. Skimming the codebase, nearly all references to
> > "syscall" invocations within musl use the SYS_* defines. These numbers
> > differ across ISAs and are susceptible to type-safety
> > bugs, providing virtually no type-checking on their arguments. Syscalls
> > bound statically by name will make the codebase much less prone to mistakes
> > and cleaner, allowing type-checked syscall arguments and also cutting down
> > on the amount of ISA-specific code surface. Below is an example of the
> > patch I'm suggesting for a single syscall:
> >
> > ```
> > diff --git a/src/fcntl/open.c b/src/fcntl/open.c
> > index 4c3c8275..ff5f7973 100644
> > --- a/src/fcntl/open.c
> > +++ b/src/fcntl/open.c
> > @@ -15,7 +15,7 @@ int open(const char *filename, int flags, ...)
> >
> >  	int fd = __sys_open_cp(filename, flags, mode);
> >  	if (fd>=0 && (flags & O_CLOEXEC))
> > -		__syscall(SYS_fcntl, fd, F_SETFD, FD_CLOEXEC);
> > +		__syscall_SYS_fcntl(fd, F_SETFD, FD_CLOEXEC);
> >
> >  	return __syscall_ret(fd);
> >  }
> > ```
> >
> > "__syscall_SYS_fcntl" can be defined with a unified static type signature
> > across ISAs. Given the highly structured nature of this patch, it could
> > mostly be accomplished with a simple `sed` command across the entire
> > project, with no impact on functionality. Would the community be open to a
> > patch of this nature?
>
> This kind of invasive change will not be accepted upstream. It just
> moves the logic for the type signatures to a different location, and
> has no concrete benefit. Unnecessary churn like this has a high cost,
> as it prevents users from backporting security and other bugfix
> patches to older or forked versions of the codebase they may be using,
> and requires users who read the commit log as a basis for trusting
> changes to wade through the churn to determine that all the changes it
> made were as-described and correct.
>
> Where there have historically been issues with syscall signatures, it
> has almost entirely been a consequence of weird things the kernel did
> that we didn't know about, and encoding our assumption about the
> signatures at a different abstraction layer would not have made the
> assumptions that mismatched what the kernel was doing any more
> correct. If someone does want to check signatures, this could probably
> be done mechanically on the codebase as-is, extracting information
> from the kernel and checking callpoints against it.
>
> For your research project, I think it's entirely possible to do the
> name-binding just by the choice of how you define the __syscallN
> macros in syscall_arch.h and the SYS_* macros, so that they expand to
> "name bindings". If you're targeting a system where these name-bound
> syscalls are not actual syscalls but callable functions, I think you
> could even do some magic in the macros to make the type checking
> happen like this.
>
> Rich
>
Thanks for the comments.
This makes sense, and macro magic within syscall_arch.h can certainly work.
Looking through the codebase, most call sites fortunately use SYS_* macros
exclusively, which allows this sort of magic to work. However, there are
still a couple of spots that might need patching, where a variable is used
as the syscall number. These will likely have to go through a different
macro, one that expands to a giant switch over all possible syscalls in
order to name-bind them. So far I have found only two places where this
happens, which is a good thing (both appear to exist only for generic
syscall-by-number invocation):
* src/misc/syscall.c
* src/thread/__syscall_cp.c
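A minimal sketch of the two cases described above, written as a standalone file rather than as musl code. Every `nb_*` and `NB_*` name is hypothetical, the function bodies are stand-ins, and the two syscall numbers are only illustrative. Whether the same pasting trick survives musl's own macro layers depends on how `SYS_*` and `__syscallN` end up being defined in syscall_arch.h.

```c
#include <errno.h>

/* Illustrative syscall numbers; a real port would keep or redefine
 * musl's existing SYS_* macros instead. */
#define SYS_getpid 39
#define SYS_umask  95

/* Hypothetical name-bound entry points, each with its own typed signature.
 * The bodies are stand-ins so the sketch is self-contained. */
static long nb_SYS_getpid(void) { return 1234; }

static unsigned nb_mask = 022;
static long nb_SYS_umask(unsigned mask)
{
	unsigned old = nb_mask;
	nb_mask = mask & 0777;
	return old;
}

/* Constant call sites: `##` pastes the argument before it is
 * macro-expanded, so NB_CALL1(SYS_umask, m) becomes nb_SYS_umask(m)
 * and the compiler checks the argument against that prototype. */
#define NB_CALL0(name)    nb_##name()
#define NB_CALL1(name, a) nb_##name(a)

/* Variable call sites: the number is only known at run time, so a
 * switch maps it back to the matching name-bound function. */
static long nb_call_var(long n, long a)
{
	switch (n) {
	case SYS_getpid: return NB_CALL0(SYS_getpid);
	case SYS_umask:  return NB_CALL1(SYS_umask, (unsigned)a);
	default:         return -ENOSYS;
	}
}
```

In this sketch the constant path costs nothing at run time, while the switch only needs to exist for the handful of callers that pass a number in a variable.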
Given this, would you be open to minimal patches that route these
"variable"-numbered calls through a different macro? Perhaps something
like this in those spots:
```
diff --git a/src/misc/syscall.c b/src/misc/syscall.c
index 6f3ef656..72356346 100644
--- a/src/misc/syscall.c
+++ b/src/misc/syscall.c
@@ -17,5 +17,5 @@ long syscall(long n, ...)
 	e=va_arg(ap, syscall_arg_t);
 	f=va_arg(ap, syscall_arg_t);
 	va_end(ap);
-	return __syscall_ret(__syscall(n,a,b,c,d,e,f));
+	return __syscall_ret(__syscall_var(n,a,b,c,d,e,f));
 }
```
`__syscall_var` can default to `__syscall` on all existing platforms, so
nothing changes for them, while giving a port a hook for name-binding
these calls.
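One way the default could be expressed is an `#ifndef` fallback. The sketch below is standalone and uses `demo_syscall`, `demo_syscall_var`, and `demo_generic` as hypothetical stand-ins for `__syscall`, `__syscall_var`, and `syscall()`, so that it does not depend on musl's internal headers. The stand-in body just sums its arguments to make the routing observable.

```c
#include <stdarg.h>

/* Stand-in for __syscall: returns the sum of its arguments so the
 * routing can be observed without making a real system call. */
static long demo_syscall(long n, long a, long b, long c,
                         long d, long e, long f)
{
	return n + a + b + c + d + e + f;
}

/* A port that wants name-binding would define demo_syscall_var before
 * this point; everyone else falls back to the plain call. */
#ifndef demo_syscall_var
#define demo_syscall_var(...) demo_syscall(__VA_ARGS__)
#endif

/* Generic by-number entry point, shaped like the patched syscall().
 * Callers must pass exactly six long arguments after n. */
static long demo_generic(long n, ...)
{
	va_list ap;
	long a, b, c, d, e, f;
	va_start(ap, n);
	a = va_arg(ap, long);
	b = va_arg(ap, long);
	c = va_arg(ap, long);
	d = va_arg(ap, long);
	e = va_arg(ap, long);
	f = va_arg(ap, long);
	va_end(ap);
	return demo_syscall_var(n, a, b, c, d, e, f);
}
```

With the fallback in place, existing architectures compile to the same call as before, and only a port that defines the hook sees different behavior.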
Arjun