Message-ID: <20181010012620.GL17110@brightrain.aerifal.cx>
Date: Tue, 9 Oct 2018 21:26:20 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: TLSDESC register-preserving mess

I've run across a bit of a problem in how the TLSDESC calling
conventions work. In the case where the needed DTV slot is not yet
filled in for the calling thread, the dynamic TLSDESC function needs
to call into C code that obtains the memory that was previously
reserved for it, initializes it (involving memcpy/memset), and fills
in the DTV entry for it. This requires saving and restoring any
call-clobbered registers that might be used by C code.
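To make the slow path concrete, here is a minimal C sketch of the lazy
fill described above. It is not musl's actual code: struct tls_module,
struct thread_dtv, tls_slot_install and tls_slot_get are hypothetical
names, and the real fast path is written in asm. The point is only that
the slow path calls memcpy/memset, which is where call-clobbered (and
possibly vector) registers come into play.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one module's TLS image. */
struct tls_module {
    const void *image; /* initialized data to copy */
    size_t len;        /* number of bytes in image */
    size_t size;       /* total size; bytes past len are zero-filled */
};

/* Hypothetical per-thread DTV: one pointer per module, NULL until filled. */
struct thread_dtv {
    void **slots;
    size_t nslots;
};

/* Slow path: initialize previously reserved memory and record it in the
 * DTV. The memcpy/memset calls are ordinary C calls that may clobber any
 * call-clobbered register. */
static void *tls_slot_install(struct thread_dtv *dtv, size_t modid,
                              const struct tls_module *m, void *reserved)
{
    assert(modid < dtv->nslots && m->len <= m->size);
    memcpy(reserved, m->image, m->len);
    memset((char *)reserved + m->len, 0, m->size - m->len);
    dtv->slots[modid] = reserved;
    return reserved;
}

/* Fast path first; fall back to the slow path only for an empty slot. */
static void *tls_slot_get(struct thread_dtv *dtv, size_t modid,
                          const struct tls_module *m, void *reserved)
{
    void *p = dtv->slots[modid];
    return p ? p : tls_slot_install(dtv, modid, m, reserved);
}
```

In this sketch only the first access per thread and module reaches
tls_slot_install; later accesses return the cached pointer.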

Because the operation involves memcpy/memset, it's not just
theoretically possible but likely that vector registers could be used.
As written, the aarch64 and arm asm save and restore float/vector
registers around the call, but I don't think they're future-proof
against ISA extensions that add more such registers; if libc were
built to use such a future ISA level, the asm we have now would be
unsafe. The i386 and x86_64 tlsdesc asm do not presently do anything
to save float/vector registers, and doing so would involve lots of
hwcap mess to figure out which ones are present. I think it would also
fail to be future-proof. Fortunately, i386 and x86_64 both provide
non-vector asm implementations of memcpy and memset, making it less
likely that any vector registers would be used in these code paths,
but still not impossible. It's also a hidden constraint: things only
work because of the asm implementation details.

Unfortunately making a future-proof solution is really hard; this is a
consequence of the TLSDESC ABI and the way register file extensions
get done by CPU vendors.

One approach would be generating a fully-flattened version of
__tls_get_new for each arch that uses TLSDESC, via gcc -S, and
committing the output into the project as a source file.
Unfortunately, this involves atomics whose definitions vary by ISA
level on arm, so I think that makes it a no-go. Obviously it's also
really ugly.

Another approach is to depend on the compiler having flags that can be
used to build for a profile that only allows GPRs (no vector regs,
etc.), and building __tls_get_new as its own source file using these
flags. This is not the sort of tooling requirement I like, since it
abandons the principle of working with an arbitrary compiler with
minimal GNU C features.

The only approach I know that doesn't involve any tooling is having
the dynamic TLSDESC function raise a signal when it's missing the DTV
slot it needs. This delegates the responsibility for awareness of what
registers need saving to the kernel, which already must be aware in
order to perform context switching (you inherently can't run a binary
that uses new registers on an old kernel that's not aware of them).
This approach is nice in that it's entirely arch-agnostic, and works
for all present and future archs and ISA/register-file extensions. The
easy approach would just nab another SIGRTx as an
implementation-internal signal, so that all the asm would need to do
is a tkill syscall. Multiplexing on another signal should be possible
but makes for more complexity and I'm not sure there's any real
benefit.
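As a rough illustration of the signal idea, here is a minimal
Linux-only C sketch. It is an assumption-laden toy, not a proposed
implementation: dtv_slot, reserved_block, fill_slot_handler,
install_fill_handler and get_slot are hypothetical names, plain statics
stand in for the real per-thread DTV, and the caller picks the signal
number. In the real thing the tkill would be issued from the asm. The
kernel saves and restores the full register state around the handler,
so the slow-path C code can use any registers it likes.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Stand-ins for the reserved memory and the DTV entry (hypothetical). */
static char reserved_block[64];
static void *volatile dtv_slot;
static volatile sig_atomic_t fill_count;

/* Runs in signal context: initialize the memory and fill in the slot. */
static void fill_slot_handler(int sig)
{
    (void)sig;
    memset(reserved_block, 0, sizeof reserved_block);
    dtv_slot = reserved_block;
    fill_count++;
}

/* Install the handler on the chosen implementation-internal signal. */
static int install_fill_handler(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fill_slot_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, 0);
}

/* Fast path returns the slot; if it is missing, signal the calling
 * thread. An unblocked signal sent to oneself is handled before the
 * syscall returns to the caller, so the slot is filled by then. */
static void *get_slot(int sig)
{
    if (!dtv_slot) {
        pid_t tid = (pid_t)syscall(SYS_gettid);
        syscall(SYS_tkill, tid, sig);
        assert(dtv_slot); /* fails if the signal was blocked or unhandled */
    }
    return dtv_slot;
}
```

A caller would run install_fill_handler(SIGRTMIN + 1) once (the signal
number here is arbitrary) and then call get_slot(SIGRTMIN + 1); only the
first call takes the signal.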

My leaning is to go with the signal solution.

Rich
