musl - Re: aarch64 SME support issues

Message-ID: <20250709224732.GM288056@port70.net>
Date: Thu, 10 Jul 2025 00:47:32 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: aarch64 SME support issues

* Rich Felker <dalias@...c.org> [2025-07-09 15:02:35 -0400]:
> On Wed, Jul 09, 2025 at 02:45:54PM -0400, Rich Felker wrote:
> > On Wed, Jul 09, 2025 at 04:26:46PM +0200, Szabolcs Nagy wrote:
> > > > Do you have a recommendation/preference between masking it off or
> > > > dropping the __getauxval exposure for now?
> > > > 
> > > > I think I'd rather mask it off, since in the (unusual but plausible)
> > > > case where a static-only toolchain is built, I think the libgcc
> > > > configure test will see the hidden __getauxval and be able to use it
> > > > already.
> > > > 
> > > > And if we do masking, I think it makes sense to mask off all unknown
> > > > bits so this doesn't happen again in the future with the next new
> > > > thing, but I'm not sure. Does this sound reasonable? Are there any
> > > > cases where *hiding* a hwcap bit could result in malfunction?
> > > 
> > > ok i hadnt considered the __getauxval change, i think that
> > > is useful to go in: it will take time to safely update libgcc
> > > so better to add it sooner and potentially more widely useful
> > > than just for SME.
> > > 
> > > i think hiding a hwcap bit may lead to inconsistencies due
> > > to kernel behaving differently than what libc pretends,
> > > but i don't have a strong case, it likely can only affect
> > > hacky code. so likely no abi break for normal code.
> > 
> > Yes that's what I'd expect.
> > 
> > > e.g. kernel enables BTI on vdso (or static exe) and user code
> > > trying to indirect jump into the middle of a function after
> > > checking via the libc hwcap that bti is off.
> > > 
> > > or creating MTE tagged objects via mprotect + instructions
> > > based on cpuid and then passing them to a function that is
> > > only MTE safe when HWCAP_MTE is set.
> > 
> > Note that we don't need to mask off any caps we already know the
> > semantics for, only SME and possibly as-yet-unassigned ones we don't
> > know will be safe without libc support.

these were meant as examples, based on existing
hwcaps, of how masking a future unknown hwcap bit
may go wrong: cases where a difference between the
libc hwcap and the kernel/isa may be visible.

> > 
> > > or different part of atomics code trying to detect 128bit
> > > lse atomics support differently (hwcap vs cpuid).
> > > 
> > > note that HWCAP2 is all used up, and now the top 32 bits
> > > of HWCAP are getting allocated (used to be reserved when
> > > we thought ilp32 was a thing, now only the top 2 bits are
> > > kept for libc to use), musl does not have AT_HWCAP3 but
> > > user code may query that anyway as AT_* values are abi.
> > > not sure if you plan to deal with AT_HWCAP3 too.
> > > 
> > > i think masking HWCAP_SME* and top bits of AT_HWCAP
> > > above 1<<41 should be fine for now. presumably this
> > > can be undone if sme support is added.
> > 
> > Sounds good. Should we add and mask hwcap3 too?
> 
> Hmm, it looks like there are hwcap2 sme bits:
> 
> #define HWCAP2_SME		(1 << 23)
> #define HWCAP2_SME_I16I64	(1 << 24)
> #define HWCAP2_SME_F64F64	(1 << 25)
> #define HWCAP2_SME_I8I32	(1 << 26)
> #define HWCAP2_SME_F16F32	(1 << 27)
> #define HWCAP2_SME_B16F32	(1 << 28)
> #define HWCAP2_SME_F32F32	(1 << 29)
> #define HWCAP2_SME_FA64		(1 << 30)
> ...
> #define HWCAP2_SME2		(1UL << 37)
> #define HWCAP2_SME2P1		(1UL << 38)
> #define HWCAP2_SME_I16I32	(1UL << 39)
> #define HWCAP2_SME_BI32I32	(1UL << 40)
> #define HWCAP2_SME_B16B16	(1UL << 41)
> #define HWCAP2_SME_F16F16	(1UL << 42)
> ...
> #define HWCAP2_SME_LUTV2	(1UL << 57)
> #define HWCAP2_SME_F8F16	(1UL << 58)
> #define HWCAP2_SME_F8F32	(1UL << 59)
> #define HWCAP2_SME_SF8FMA	(1UL << 60)
> #define HWCAP2_SME_SF8DP4	(1UL << 61)
> #define HWCAP2_SME_SF8DP2	(1UL << 62)
> 
> Not clear if any others are SME-related.
> 
> In plain hwcap I see:
> 
> #define HWCAP_SME2P2		(1UL << 42)
> #define HWCAP_SME_SBITPERM	(1UL << 43)
> #define HWCAP_SME_AES		(1UL << 44)
> #define HWCAP_SME_SFEXPA	(1UL << 45)
> #define HWCAP_SME_STMOP		(1UL << 46)
> #define HWCAP_SME_SMOP4		(1UL << 47)
> 
> And no hwcap3 bits defined yet.
> 
> Should the above all be masked? Any I missed?

yeah i'd mask them all even if in principle
HWCAP2_SME should be enough. i don't think
any of the non-SME hwcaps imply HWCAP2_SME.
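
a minimal sketch of masking every listed bit, not an actual
patch. it covers only the HWCAP2_SME* bits quoted above (the
elided "..." entries are left out), and BITS,
HWCAP2_SME_LISTED and hide_sme_hwcap2 are made-up names for
illustration:

```c
#include <stdint.h>

/* Mask with bits lo..hi (inclusive) set; requires hi - lo < 63. */
#define BITS(lo, hi) (((UINT64_C(1) << ((hi) - (lo) + 1)) - 1) << (lo))

/* Only the HWCAP2_SME* bits quoted above: 23..30, 37..42, 57..62.
 * The elided ("...") entries are deliberately not included, since
 * their meaning is not given in this thread. */
#define HWCAP2_SME_LISTED (BITS(23, 30) | BITS(37, 42) | BITS(57, 62))

/* Hypothetical helper: the AT_HWCAP2 value a libc without SME
 * support might report, given the value passed by the kernel. */
static uint64_t hide_sme_hwcap2(uint64_t kernel_hwcap2)
{
	return kernel_hwcap2 & ~HWCAP2_SME_LISTED;
}
```

all non-SME bits pass through unchanged, so code checking
e.g. bit 31 or bit 43 of hwcap2 sees what the kernel reported.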

if we mask future bits then i think HWCAP3 should
be masked too. there are no bits defined yet, so
no existing kernel would pass it in auxv yet, but
once it is passed musl should return 0 for it.
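
roughly, as a sketch only (not musl code): filter_hwcap is a
hypothetical helper, and the X_AT_* numbers are the linux
auxv type values as i understand them, spelled out here
because older headers may lack AT_HWCAP3:

```c
#include <stdint.h>

/* Linux auxv type numbers, written out as an assumption so the
 * sketch does not depend on the installed headers. */
#define X_AT_HWCAP  16
#define X_AT_HWCAP2 26
#define X_AT_HWCAP3 29

/* Hypothetical filter applied to a value read from auxv before
 * it is handed to the caller. */
static uint64_t filter_hwcap(unsigned long type, uint64_t val)
{
	switch (type) {
	case X_AT_HWCAP:
		/* hide everything above 1<<41, i.e. bits 42..63 */
		return val & ((UINT64_C(1) << 42) - 1);
	case X_AT_HWCAP3:
		/* no bits are understood yet, so report none */
		return 0;
	default:
		/* other entries are passed through untouched here */
		return val;
	}
}
```

AT_HWCAP2 is passed through in this sketch; the SME bits
there would need their own explicit mask since they are not
contiguous at the top.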

i just fear that if ppl figure out that musl is
masking bits they will try to work around it by
using wacky cpu feature detection. so ideally
we don't keep masking forever (i can look into
adding sme support, but not right now).