Message-ID: <20181010023546.GM17110@brightrain.aerifal.cx>
Date: Tue, 9 Oct 2018 22:35:46 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: TLSDESC register-preserving mess

On Tue, Oct 09, 2018 at 09:26:20PM -0400, Rich Felker wrote:
> I've run across a bit of a problem in how the TLSDESC calling
> conventions work. In the case where the needed DTV slot is not yet
> filled in for the calling thread, the dynamic TLSDESC function needs
> to call into C code that obtains the memory that was previously
> reserved for it, initializes it (involving memcpy/memset), and fills
> in the DTV entry for it. This requires saving and restoring any
> call-clobbered registers that might be used by C code.
>
> Because the operation involves memcpy/memset, it's not just
> theoretically possible but likely that vector registers could be used.
> As written, the aarch64 and arm asm save and restore float/vector
> registers around the call, but I don't think they're future-proof
> against ISA extensions that add more such registers; if libc were
> built to use such a future ISA level, the asm we have now would be
> unsafe. The i386 and x86_64 tlsdesc asm do not presently do anything
> to save float/vector registers, and doing so would involve lots of
> hwcap mess to figure out which ones are present. I think it would also
> fail to be future-proof. Fortunately, i386 and x86_64 both provide
> non-vector asm implementations of memcpy and memset, making it less
> likely that any vector registers would be used in these code paths,
> but still not impossible. It's also a hidden constraint, that things
> only work because of the asm implementation details.
>
> Unfortunately making a future-proof solution is really hard; this is a
> consequence of the TLSDESC ABI and the way register file extensions
> get done by cpu vendors.
>
> One approach would be generating a fully-flattened version of
> __tls_get_new for each arch that uses TLSDESC, via gcc -S, and
> committing the output into the project as a source file.
> Unfortunately, this involves atomics whose definitions vary by ISA
> level on arm, so I think that makes it a no-go. Obviously it's also
> really ugly.
>
> Another approach is to depend on the compiler having flags that can be
> used to build for a profile that only allows GPRs (no vector regs,
> etc.), and building __tls_get_new as its own source file using these
> flags. This is not the sort of tooling requirement I like, since it
> abandons the principle of working with an arbitrary compiler with
> minimal GNU C features.
>
> The only approach I know that doesn't involve any tooling is having
> the dynamic TLSDESC function raise a signal when it's missing the DTV
> slot it needs. This delegates the responsibility for awareness of what
> registers need saving to the kernel, which already must be aware in
> order to perform context switching (you inherently can't run a binary
> that uses new registers on an old kernel that's not aware of them).
> This approach is nice in that it's entirely arch-agnostic, and works
> for all present and future archs and ISA/register-file extensions. The
> easy approach would just nab another SIGRTx as an
> implementation-internal signal, so that all the asm would need to do
> is a tkill syscall. Multiplexing on another signal should be possible
> but makes for more complexity and I'm not sure there's any real
> benefit.
>
> My leaning is to go with the signal solution.

An alternate approach being proposed on #musl that I might like
better is getting rid of __tls_get_new entirely, having the DTV for
all existing threads updated at dlopen time. This requires either a
__synccall with no failure path (which we don't have) or adding a
linked list of threads.
The non-__synccall approach also requires the
SYS_membarrier syscall (Linux 4.3) and emulation of it as a fallback
(which can be done via signals if you have a list of threads).

Aside from solving the tlsdesc clobber issue, what I like about this
approach is that it removes all branches from __tls_get_addr and the
dynamic tlsdesc function; they just *always succeed in the hot path*.
It also makes it easier to facilitate recovery of memory allocated for
dynamic TLS if we want to -- it no longer has to be a shared block
doled out to threads via a_fetch_add, so each thread could get its own
malloc and then be able to free it at exit.

Rich
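
To make the contrast between the two schemes concrete, here is a rough,
single-threaded sketch in C. It is only a model under stated
assumptions, not musl's code: struct thread, struct tls_index,
mod_size, install_slot, dlopen_install_all and the two lookup
functions are all hypothetical names, the DTV is a fixed-size array,
and the synchronization discussed above (__synccall or membarrier) is
left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified model; none of these are musl's real definitions. */
#define MAX_MODS 8

struct tls_index { size_t modid, offset; };

struct thread {
    struct thread *next;          /* linked list of all existing threads */
    uintptr_t dtv[MAX_MODS + 1];  /* dtv[0]: highest module id installed;
                                     dtv[i]: base of module i's TLS block */
};

static size_t mod_size[MAX_MODS + 1];  /* TLS size of each loaded module */

/* Give thread t its own zeroed block for module id and record it in the DTV. */
static void install_slot(struct thread *t, size_t id)
{
    assert(id >= 1 && id <= MAX_MODS && mod_size[id] > 0);
    void *mem = calloc(1, mod_size[id]);
    assert(mem != NULL);
    t->dtv[id] = (uintptr_t)mem;
    if (id > t->dtv[0]) t->dtv[0] = id;
}

/* Lazy scheme: the lookup itself may have to fill in missing slots. */
static void *tls_get_addr_lazy(struct thread *t, const struct tls_index *ti)
{
    if (ti->modid > t->dtv[0]) {
        /* Slow path, standing in for the call into C code (__tls_get_new). */
        for (size_t id = t->dtv[0] + 1; id <= ti->modid; id++)
            install_slot(t, id);
    }
    return (char *)t->dtv[ti->modid] + ti->offset;
}

/* Eager scheme, run once at dlopen time: update every existing thread. */
static void dlopen_install_all(struct thread *head, size_t id)
{
    for (struct thread *t = head; t; t = t->next)
        install_slot(t, id);
}

/* Eager scheme hot path: no branch, the slot is assumed always present. */
static void *tls_get_addr_eager(struct thread *t, const struct tls_index *ti)
{
    return (char *)t->dtv[ti->modid] + ti->offset;
}
```

In this model the eager lookup is a single load and add, so there is no
slow path that could clobber registers, and because each slot comes
from its own calloc it could in principle be freed when the thread
exits.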