Date: Wed, 22 Jul 2020 16:14:04 +0200
From: Stefano Garzarella <sgarzare@...hat.com>
To: Daurnimator <quae@...rnimator.com>
Cc: Jens Axboe <axboe@...nel.dk>, Alexander Viro <viro@...iv.linux.org.uk>,
	Kernel Hardening <kernel-hardening@...ts.openwall.com>,
	Kees Cook <keescook@...omium.org>, Aleksa Sarai <asarai@...e.de>,
	Stefan Hajnoczi <stefanha@...hat.com>,
	Christian Brauner <christian.brauner@...ntu.com>,
	Sargun Dhillon <sargun@...gun.me>, Jann Horn <jannh@...gle.com>,
	io-uring <io-uring@...r.kernel.org>, linux-fsdevel@...r.kernel.org,
	Jeff Moyer <jmoyer@...hat.com>, linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC v2 2/3] io_uring: add IOURING_REGISTER_RESTRICTIONS
 opcode

On Wed, Jul 22, 2020 at 12:35:15PM +1000, Daurnimator wrote:
> On Wed, 22 Jul 2020 at 03:11, Jens Axboe <axboe@...nel.dk> wrote:
> >
> > On 7/21/20 4:40 AM, Stefano Garzarella wrote:
> > > On Thu, Jul 16, 2020 at 03:26:51PM -0600, Jens Axboe wrote:
> > >> On 7/16/20 6:48 AM, Stefano Garzarella wrote:
> > >>> diff --git a/include/uapi/linux/io_uring.h b/include/uapi/linux/io_uring.h
> > >>> index efc50bd0af34..0774d5382c65 100644
> > >>> --- a/include/uapi/linux/io_uring.h
> > >>> +++ b/include/uapi/linux/io_uring.h
> > >>> @@ -265,6 +265,7 @@ enum {
> > >>>     IORING_REGISTER_PROBE,
> > >>>     IORING_REGISTER_PERSONALITY,
> > >>>     IORING_UNREGISTER_PERSONALITY,
> > >>> +   IORING_REGISTER_RESTRICTIONS,
> > >>>
> > >>>     /* this goes last */
> > >>>     IORING_REGISTER_LAST
> > >>> @@ -293,4 +294,30 @@ struct io_uring_probe {
> > >>>     struct io_uring_probe_op ops[0];
> > >>>  };
> > >>>
> > >>> +struct io_uring_restriction {
> > >>> +   __u16 opcode;
> > >>> +   union {
> > >>> +           __u8 register_op; /* IORING_RESTRICTION_REGISTER_OP */
> > >>> +           __u8 sqe_op;      /* IORING_RESTRICTION_SQE_OP */
> > >>> +   };
> > >>> +   __u8 resv;
> > >>> +   __u32 resv2[3];
> > >>> +};
> > >>> +
> > >>> +/*
> > >>> + * io_uring_restriction->opcode values
> > >>> + */
> > >>> +enum {
> > >>> +   /* Allow an io_uring_register(2) opcode */
> > >>> +   IORING_RESTRICTION_REGISTER_OP,
> > >>> +
> > >>> +   /* Allow an sqe opcode */
> > >>> +   IORING_RESTRICTION_SQE_OP,
> > >>> +
> > >>> +   /* Only allow fixed files */
> > >>> +   IORING_RESTRICTION_FIXED_FILES_ONLY,
> > >>> +
> > >>> +   IORING_RESTRICTION_LAST
> > >>> +};
> > >>> +
> > >>
> > >> Not sure I totally love this API. Maybe it'd be cleaner to have separate
> > >> ops for this, instead of muxing it like this. One for registering op
> > >> code restrictions, and one for disallowing other parts (like fixed
> > >> files, etc).
> > >>
> > >> I think that would look a lot cleaner than the above.
> > >>
> > >
> > > > Talking with Stefan, an alternative, perhaps closer to your suggestion,
> > > > would be to remove the 'struct io_uring_restriction' and add the
> > > > following register ops:
> > >
> > >     /* Allow an sqe opcode */
> > >     IORING_REGISTER_RESTRICTION_SQE_OP
> > >
> > >     /* Allow an io_uring_register(2) opcode */
> > >     IORING_REGISTER_RESTRICTION_REG_OP
> > >
> > >     /* Register IORING_RESTRICTION_*  */
> > >     IORING_REGISTER_RESTRICTION_OP
> > >
> > >
> > >     enum {
> > >         /* Only allow fixed files */
> > >         IORING_RESTRICTION_FIXED_FILES_ONLY,
> > >
> > >         IORING_RESTRICTION_LAST
> > >     }
> > >
> > >
> > > > We could also enable the restrictions only when the rings are started,
> > > > to avoid registering the IORING_REGISTER_ENABLE_RINGS opcode. Once the
> > > > rings are started, the restrictions cannot be changed or disabled.
> >
> > My concerns are largely:
> >
> > 1) An API that's straightforward to use
> > 2) Something that'll work with future changes
> >
> > The "allow these opcodes" is straightforward, and ditto for the register
> > opcodes. The fixed file I guess is the odd one out. So if we need to
> > disallow things in the future, we'll need to add a new restriction
> > sub-op. Should this perhaps be "these flags must be set", and that could
> > easily be augmented with "these flags must not be set"?
> >
> > --
> > Jens Axboe
> >
> 
> This is starting to sound a lot like seccomp filtering.
> Perhaps we should go straight to adding a BPF hook that fires when
> reading off the submission queue?
> 

You're right. I e-mailed Kees Cook about that [1], and he agreed that
restrictions in io_uring should let us address some issues that are
difficult to handle with seccomp. For example:
- different restrictions for different io_uring instances in the same
  process
- limit SQEs to use only registered fds and buffers
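
To make the two points above concrete, here is a rough userspace sketch of
per-ring restrictions. It is not the code from the patch: the opcode numbers,
struct layout, and helper names (ring_restrictions, restrict_allow_op,
restrict_enable, sqe_permitted) are all hypothetical, and SQE_FLAG_FIXED_FILE
only stands in for the real fixed-file SQE flag. It assumes an allow-list
bitmap of SQE opcodes plus a "these flags must be set" mask, both frozen once
the ring is enabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical opcode numbers for illustration; not the kernel's values. */
enum { OP_NOP, OP_READ_FIXED, OP_WRITE_FIXED, OP_OPENAT, OP_LAST };

/* Stand-in for the real fixed-file SQE flag (assumption). */
#define SQE_FLAG_FIXED_FILE 0x1u

/* One instance per ring, so two rings in a process can differ. */
struct ring_restrictions {
	uint64_t allowed_sqe_ops; /* bit N set => SQE opcode N is allowed */
	uint8_t  required_flags;  /* flags every SQE must carry */
	bool     enabled;         /* once true, restrictions are frozen */
};

/* Minimal model of a submission queue entry. */
struct sqe {
	uint8_t opcode;
	uint8_t flags;
};

/* Add one opcode to the allow-list; refused after the ring is enabled. */
int restrict_allow_op(struct ring_restrictions *r, unsigned int op)
{
	if (r->enabled || op >= OP_LAST)
		return -1;
	r->allowed_sqe_ops |= UINT64_C(1) << op;
	return 0;
}

/* Freeze the restrictions; later changes are rejected. */
void restrict_enable(struct ring_restrictions *r)
{
	r->enabled = true;
}

/* Check an SQE against the restrictions of the ring it was submitted to. */
bool sqe_permitted(const struct ring_restrictions *r, const struct sqe *s)
{
	if (!r->enabled)
		return true; /* nothing enforced until the ring is enabled */
	if (s->opcode >= OP_LAST)
		return false;
	if (!(r->allowed_sqe_ops & (UINT64_C(1) << s->opcode)))
		return false;
	return (s->flags & r->required_flags) == r->required_flags;
}
```

With this model, one ring can allow only OP_READ_FIXED with
required_flags = SQE_FLAG_FIXED_FILE while another ring in the same process
stays unrestricted, which a process-wide seccomp filter cannot easily express.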

Maybe seccomp could take advantage of the restrictions to filter SQE opcodes.

Thanks,
Stefano

[1] https://lore.kernel.org/io-uring/202007160751.ED56C55@keescook/
