musl - Re: Project Proposal MTE Support

Message-ID: <20230720190244.GQ4163@brightrain.aerifal.cx>
Date: Thu, 20 Jul 2023 15:02:45 -0400
From: Rich Felker <dalias@...ifal.cx>
To: Stefan Jumarea <stefanjumarea02@...il.com>, musl@...ts.openwall.com,
	Razvan Deaconescu <razvand@...kraft.io>,
	Michalis Pappas <michalis@...kraft.io>
Subject: Re: Project Proposal MTE Support

On Thu, Jul 20, 2023 at 03:03:10PM +0200, Szabolcs Nagy wrote:
> * Rich Felker <dalias@...ifal.cx> [2023-07-19 21:51:22 -0400]:
> > On Wed, Jul 19, 2023 at 10:37:06AM +0300, Stefan Jumarea wrote:
> > > Hello all,
> > > 
> > > With this message, I would like to discuss the prospect of adding MTE support to
> > > the Musl memory allocator.
> > > 
> > > Starting with release 1.2.1 (August 2020, commit 503bd3976623)[1], Musl
> > > ships a new "malloc" implementation ("mallocng"), which addresses many of the
> > > intended malloc-hardening issues. However, further hardening can be implemented,
> > > including MTE (Memory Tagging Extension) support.
> > 
> > mallocng was designed with the idea of possible future MTE use in
> > mind. At the time, I seem to recall there were obstacles to being able
> > to use MTE, so it wasn't pursued. But it's definitely an interesting
> > and powerful direction, one of the few ISA-level hardening features
> > that's actually a hard access control boundary rather than loads of
> > complexity to make ROP slightly harder.
> 
> note: it is not quite "hard access control" because *(p+off)
> can still access anything when off is attacker controlled:
> the top bits of p are not protected so the tag can change and
> there are only 16 tags, so it can be guessed. (in principle
> the compiler could fix this by always masking "off" in ptr
> arithmetic, but that's not implemented and has costs.)

It's a hard access control on some things, like linear (sequential)
overflow. Obviously with only 16 (IIRC) tags there will be plenty of
things you can access with arbitrary offsetting even if it masks tag
bits. Still, this rules out large classes of exploitable bugs.
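
To make the two points above concrete, here is a toy software model of
the tag check. It is not real MTE code. It assumes a 4-bit tag in
pointer bits 59:56 and a 16-byte granule, as on AArch64; all names
(toy_mem, toy_set_tags, toy_access_ok, ptr_with_tag) are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_SHIFT 56                          /* tag lives in pointer bits 59:56 */
#define TAG_MASK  ((uint64_t)0xf << TAG_SHIFT)
#define GRANULE   16                          /* bytes covered by one memory tag */

/* Extract the 4-bit tag carried in the pointer's top bits. */
static unsigned ptr_tag(uint64_t p) { return (unsigned)((p & TAG_MASK) >> TAG_SHIFT); }

/* Address with the tag bits cleared. */
static uint64_t ptr_untag(uint64_t p) { return p & ~TAG_MASK; }

/* Same address, carrying the given tag. */
static uint64_t ptr_with_tag(uint64_t p, unsigned tag)
{
    return ptr_untag(p) | ((uint64_t)(tag & 0xf) << TAG_SHIFT);
}

/* Toy "memory": 64 granules (1024 bytes), one tag per granule. */
struct toy_mem { unsigned char tags[64]; };

/* Tag every granule in [addr, addr+len); addr and len are granule multiples. */
static void toy_set_tags(struct toy_mem *m, uint64_t addr, uint64_t len, unsigned tag)
{
    assert(addr % GRANULE == 0 && len % GRANULE == 0);
    assert(addr + len <= sizeof m->tags * GRANULE);
    for (uint64_t a = addr; a < addr + len; a += GRANULE)
        m->tags[a / GRANULE] = (unsigned char)(tag & 0xf);
}

/* An access passes only if the pointer tag equals the granule's tag. */
static int toy_access_ok(const struct toy_mem *m, uint64_t p)
{
    uint64_t a = ptr_untag(p);
    assert(a / GRANULE < sizeof m->tags);
    return m->tags[a / GRANULE] == ptr_tag(p);
}
```

In this model, with one slot at [0,32) tagged 3 and the next at [32,64)
tagged 7, a pointer tagged 3 that steps sequentially to offset 32 fails
the check. An attacker-chosen offset of 32 + (4 << 56) changes the
pointer's tag from 3 to 7 along the way and passes, which is the
arbitrary-offset case described in the quoted note.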

> i tried mallocng with mte at some point (and worked on the
> glibc mte malloc too).
> 
> one issue was the 4-byte in-band data (at end of prev slot), it
> should not be in the same mte granule (16 bytes) as user data
> otherwise it can be corrupted and more importantly there is

mallocng doesn't care if it can be corrupted; it's validated before
being used, rather than trusted. With MTE, though, we need a way to
access it bypassing the tag check. I don't think looking up the tag
and then using it would even suffice, since ...

> an access issue as the tag of that granule may change async.

...it could change asynchronously. Hopefully this isn't a blocker. Is
there a way to do a non-tag-checking load?

> another issue was that MADV_FREE clears the tags, so heap
> memory owned by the malloc implementation (meta data) must
> use 0 tag instead of a different dedicated tag.

Use of MADV_FREE has been dropped now so that's not a problem.

> note: tagging must be possible to turn off for a process
> because some sw assumes page size granule protection (does
> oob read access) or uses pointer top bits in some way.

I don't think we have any contract to support that usage.

> > > We are using Musl as the primary libc within Unikraft, a Unikernel Development Kit[2],
> > > and we support MTE on the low-level memory allocator. This however lacks a lot in
> > > terms of granularity, as the internal allocator has a page-size minimum allocation
> > > level, and tagging one page at a time still allows for memory safety violations
> > > within the area of one page.
> > > 
> > > Our goal is to have MTE protection implemented in a fine-grained allocator
> > > (i.e. Musl "malloc" implementation), that will successfully prevent memory safety
> > > violations.
> > > 
> > > Extended measurements will need to be done in order to provide a clear overview
> > > of the performance impact that using MTE will have, but the architecture
> > > provides ways to optimize the implementation for functions like "calloc" or
> > > large "malloc" blocks (instructions like "store allocation tag with zeroing",
> > > "stzg", "store allocation tag for blocks", "stgm"), along with an asynchronous
> > > way to check for a Tag Check Fault (e.g. on IRQs, on task / thread switches, etc.).
> 
> async tag check is not very useful for debugging nor for
> security (essentially the failure is delayed to the next
> syscall and the damage can be done by that time). there
> is asymmetric mode (sync read check, async write check)
> which may be useful but requires newer arch.

If async check also affected other cores, I think it would still be a
security boundary except for admitting compromised-state changes to
persistent shared memory/mapped files. But I doubt it does, and is
there any reason to prefer this kind of deferred checking anyway?

> > > A bit about myself, my name is Ștefan Jumărea, I am an undergraduate student in
> > > the final year at University Politehnica of Bucharest[3], and I’ve been part of the
> > > Unikraft OSS project[2] for almost two years.
> > > I would like to make this my diploma project (due June 2024).
> > > 
> > > Is it something of interest in the Musl community?
> > > Is it planned work, or is anyone else working on it? If not, I would like to
> > > start working on the project in the coming weeks.
> > > Do you have any comments, suggestions, or other things I should consider?
> > 
> > There is no immediate plan. Probably the first steps need to be
> > figuring out some abstractions needed, particularly a way for the
> > implementation to take tagged pointers from the caller and do the
> > arithmetic to access partly out-of-band data (like the group header)
> > with a different/zero tag. These should be able to collapse out to
> > no-ops on archs without MTE, as well as be defined in a manner to work
> > on other archs with comparable features (like the classic sparc prior
> > art for this, if we ever get the sparc port added, or any future archs
> > that add such a thing).
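
A minimal sketch of what such an abstraction might look like, under
stated assumptions: the tag sits in pointer bits 59:56 on AArch64, and
the build-time switch SKETCH_USE_MTE plus the helpers sketch_untag,
sketch_retag and sketch_meta_before are hypothetical names, not
anything in mallocng. Without the switch the mask is zero, so the
helpers reduce to plain pointer arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical switch: only with it set do pointers carry tag bits. */
#if defined(__aarch64__) && defined(SKETCH_USE_MTE)
#define SKETCH_TAG_MASK ((uintptr_t)0xf << 56)
#else
#define SKETCH_TAG_MASK ((uintptr_t)0)   /* helpers collapse to no-ops */
#endif

/* Drop the caller's tag, yielding a zero-tagged pointer. */
static inline void *sketch_untag(const void *p)
{
    return (void *)((uintptr_t)p & ~SKETCH_TAG_MASK);
}

/* Give addr the same tag that tag_src carries. */
static inline void *sketch_retag(const void *addr, const void *tag_src)
{
    uintptr_t a = (uintptr_t)addr & ~SKETCH_TAG_MASK;
    uintptr_t t = (uintptr_t)tag_src & SKETCH_TAG_MASK;
    return (void *)(a | t);
}

/* Zero-tagged pointer to data located `back` bytes before the user
 * pointer. The layout is illustrative only. */
static inline void *sketch_meta_before(const void *user_ptr, size_t back)
{
    assert(user_ptr != NULL);
    return (unsigned char *)sketch_untag(user_ptr) - back;
}
```

The idea is that allocator code would never do arithmetic on the
caller's pointer directly, only through helpers like these, so another
arch with a comparable feature could supply its own mask or
implementation.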
> 
> note that sparc adi is less practical as its granule is 64 bytes
> (cache line size) which blows up small allocations.

Yes, it would require increasing UNIT to 64 to use, which is largely
undesirable, but would be interesting to play with anyway.
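
A rough illustration of the blow-up, as a sketch only: it rounds a
request up to the tagging granule and ignores mallocng's real size
classes and per-slot overhead. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest multiple of `granule` that holds `n` bytes. */
static size_t granule_round_up(size_t n, size_t granule)
{
    assert(granule != 0);
    return (n + granule - 1) / granule * granule;
}

/* Bytes lost to rounding for a request of `n` bytes. */
static size_t granule_waste(size_t n, size_t granule)
{
    return granule_round_up(n, granule) - n;
}
```

For a 20-byte request this gives 32 bytes (12 wasted) with a 16-byte
granule, versus 64 bytes (44 wasted) with a 64-byte granule.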

Rich