Date: Sun, 16 Nov 2014 00:56:56 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Cc: Andy Lutomirski <luto@...capital.net>,
	Russell King - ARM Linux <linux@....linux.org.uk>,
	Szabolcs Nagy <nsz@...t70.net>, Kees Cook <keescook@...omium.org>,
	"linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>
Subject: ARM atomics overhaul for musl

One item on the agenda for this release cycle is overhauling the way
atomics are done on ARM. I'm cc'ing people who have been involved in
this discussion in the past in case anyone's not on the musl list and
has opinions about what should be done.

The current situation looks like the following:

Pre-v6: Hard-coded to use cas from kuser_helper page (0xffff0fc0)

v6: Hard-coded to use ldrex/strex with mcr-based barrier

v7+: Hard-coded to use ldrex/strex with dmb-based barrier

In the cases where ldrex/strex are used directly, they're still not
used optimally; all the non-cas primitives like atomic inc/dec are
built on top of cas and thus have more loop complexity and probably
more barriers than they should.
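
As a rough illustration of that layering (a sketch using C11
<stdatomic.h> as a stand-in, not musl's actual code; cas,
fetch_add_via_cas and fetch_add_direct are hypothetical names), compare
a fetch-add built on top of cas with one expressed as a single
primitive:

```c
#include <stdatomic.h>

/* Stand-in for a cas primitive: returns the value observed at *p.
 * On success that is `expected`; on failure it is the conflicting value. */
static int cas(atomic_int *p, int expected, int desired)
{
    atomic_compare_exchange_strong(p, &expected, desired);
    return expected;
}

/* Fetch-add layered on cas: an outer retry loop around whatever loop
 * and barriers the cas itself already contains. */
static int fetch_add_via_cas(atomic_int *p, int v)
{
    int old;
    do {
        old = atomic_load(p);
    } while (cas(p, old, old + v) != old);
    return old;
}

/* Fetch-add as one primitive. The assumption here is that the compiler
 * lowers this to a single load-exclusive/store-exclusive loop on targets
 * that have those instructions. */
static int fetch_add_direct(atomic_int *p, int v)
{
    return atomic_fetch_add(p, v);
}
```

Both functions return the previous value and leave the same result in
memory; the difference is only in how many loops and barriers end up
in the generated code.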

Aside from that, the only case among the above that's "right" already
is v7+. Hard-coding the mcr-based barrier on v6 is wrong because it's
deprecated (future models may not support the instruction, and
although the kernel could trap and emulate it this would be horribly
slow) and hard-coding kuser helper on pre-v6 is wrong because pre-v6
binaries might run on v6+ hardware and kernel where the kernel has
been built with the kuser_helper page removed for security.

My main goals for this overhaul are:

1. Make baseline (pre-v6) binaries truly universal so they run even
   on kernels with kuser_helper removed.

2. Make v7+ perform competitively. This means optimal code sequences
   for a_cas, a_swap, a_fetch_add, a_store, etc. rather than just
   doing everything with a_cas.

What's still not entirely clear is what to do with v6, and how goal #1
should be achieved. The options are basically:

A. Prefer using ldrex/strex and an appropriate barrier directly, but
   fall back to kuser_helper (assuming it's present) if the hwcap or
   similar does not indicate availability of atomics.

B. Prefer kuser_helper and only fall back to using atomics and an
   appropriate barrier directly if the kuser_helper page is missing.

Of these two approaches, A seems easier, because it's easier to know
that atomics are available (via HWCAP_TLS) than that kuser_helper is
(which requires some sort of probe for the mapping if we want to
support grsec kernels where the mapping is completely missing; if not,
we can just check the kuser version number at a fixed address).
However, neither is really very easy because it seems impossible to
detect whether the mcr-based barrier or the dmb-based barrier should
be used -- there's no hwcap flag to indicate support for the latter.
This also complicates what to do in builds for v6.
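
A minimal sketch of the selection in option A, assuming the hwcap word
is passed in by the caller (on Linux it could come from
getauxval(AT_HWCAP)). The enum, the function name and the
SKETCH_HWCAP_TLS macro are hypothetical; the bit value mirrors
HWCAP_TLS in the kernel's ARM hwcap header. Note that it deliberately
does not pick a barrier, since that is the part with no hwcap flag to
consult:

```c
/* Same bit as HWCAP_TLS on ARM Linux; renamed to avoid clashing with
 * the system header. */
#define SKETCH_HWCAP_TLS (1UL << 15)

enum atomics_path {
    PATH_LDREX_STREX,   /* use ldrex/strex directly */
    PATH_KUSER_HELPER   /* call cas in the kuser_helper page */
};

/* Option A: prefer direct atomics when the hwcap suggests they exist,
 * otherwise fall back to kuser_helper (assumed to be present). */
static enum atomics_path choose_path_option_a(unsigned long hwcap)
{
    if (hwcap & SKETCH_HWCAP_TLS)
        return PATH_LDREX_STREX;
    return PATH_KUSER_HELPER;
}
```

The check would run once at startup, with the result cached in a
function pointer or flag.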

Before proceeding, I think we need some sort of proposed way to detect
the availability of dmb. If there really is none, we probably need to
go with option B (prefer kuser_helper) for both pre-v6 and v6 (i.e.
only use atomics directly on v7+) and choose what to do when
kuser_helper is missing: either assume v7+ and use dmb, or assume that
the mcr barrier is still working and use it. I think I would lean
towards the latter.

Rich
