Date: Thu, 31 May 2018 20:52:00 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: TLS issue on aarch64

On Fri, Jun 01, 2018 at 02:11:02AM +0200, Szabolcs Nagy wrote:
> * Szabolcs Nagy <nsz@...t70.net> [2018-05-29 08:33:17 +0200]:
> > * Rich Felker <dalias@...c.org> [2018-05-28 18:15:21 -0400]:
> > > On Mon, May 28, 2018 at 10:47:31PM +0200, Szabolcs Nagy wrote:
> > > > another issue with the patch is that if tp is aligned then pthread_t
> > > > may not get aligned:
> > > >
> > > > tp == self + sizeof(pthread_t) - reserved
> > >
> > > This expression does not look correct; it would have tp point
> > > somewhere inside of the pthread structure rather than just past the
> > > end of it.
> > >
> >
> > this is the current tp setup on
> > aarch64, arm and sh4, see TP_ADJ
> >
> > we can just make tp = self+sizeof(pthread_t)
> > but then there will be a small unused hole
> >
> > > Maybe your code is doing this to avoid wasted padding, but if so I
> > > think that's a bad idea. It violates the invariant that the pthread
> > > structure is at a constant offset from tp, which is needed for
> > > efficient pthread_self and for access to fixed slots at the end of it
> > > (like canary or dtv copy).
> > >
> > > > so sizeof(pthread_t) - reserved must be divisible with
> > > > gcd(alignment of tp, alignof(pthread_t)) to be able to make both
> > > > self and tp aligned.
> > > >
> > > > this is not an issue on current targets with current pthread_t,
> > > > but we may want to decouple internal struct pthread alignment
> > > > details and the abi reserved tls size, i.e. tp_adj could be like
> > > >
> > > > tp == alignup(self + sizeof(pthread_t) - reserved, alignof(pthread_t))
> > > >
> > > > or we add a static assert that reserved and alignof(pthread_t)
> > > > are not conflicting.
> > >
> > > Maybe I'm misunderstanding what "reserved" is, since you're talking
> > > about a static assert...?
> > >
> >
> > it is the abi reserved space after tp
> > (the bfd linker calls it TCB_SIZE)
>
> did some digging into the bfd linker code, i dump here my understanding:
>
> tls variant 1 has two sub variants: ppc (mips is the same) and
> aarch64 (arm, sh are the same just s/16/8/ for the reserved space)

This is very helpful.

> base: tls segment base address in the elf obj
> align: tls segment alignment requirement in the elf obj
> addr: address of a tls var in the tls segment of the elf obj
> tp: thread pointer at runtime
> ptr: address of tls var at runtime
>
> local-exec (off: offset from tp in the code):
> code: ptr = tp + off
>
> ppc: off = addr - (base + 0x7000)
> aarch64: off = addr - (base - alignup(16, align))
>
> initial-exec (got,add: REL_TPOFF got entry and value/addend):
> [...]
> local-dynamic (non-tlsdesc, got,add: REL_DTPOFF got entry and value, off: offset in the code):
> [...]
> general-dynamic (non-tlsdesc, got,add: REL_DTPOFF got entry and value):
> [...]

None but local-exec are really linker ABI, except kinda in the case of
static linking if the linker fails to perform relaxations. A more
precise way to say this would be that only TLS which is accessible via
local-exec model has any constraints imposed on it by the linker as to
where it lies relative to TP. For initial-exec, the address relative
to TP must be constant across all threads, but it's not set by the
linker (except in the case of failed relaxation, in which case it's
just the local-exec model with the offset stored out-of-band). For
other models, there are no constraints at all; it's entirely up to the
dynamic linker implementation how it arranges for things to work.

> ptr has correct alignment if the tls segment is mapped to an aligned address,
> i.e. ptr - (addr - base) must be aligned, working backwards from this
> the requirement is
>
> for local-exec to work:
> ppc: tp - 0x7000 must be aligned

This should be fine, since musl arranges for self+sizeof(struct
pthread) to be aligned, and that's exactly what tp-0x7000 is.

> aarch64: tp + alignup(16, align) must be aligned == tp must be aligned

OK, I see two possible solutions here:

1. tp==self+sizeof(struct pthread). In this case we'll waste some
   space (vs the current approach) when no extra alignment is needed,
   but it's simple and clean because all the alignments match up
   naturally.

2. tp==self+sizeof(struct pthread)-16 (or rather -reserved in
   general). This preserves the current memory usage, but requires
   complex new alignment logic since self will no longer be aligned
   mod tls_align when tls_align>reserved.

I pretty strongly prefer option 1.

In either case, the main_tls.offset/app.tls.offset value needs to
correctly reflect the offset of the TLS from TP, so it either needs to
be alignup(reserved,tls_align) or alignup(reserved,tls_align)-reserved
depending on option 1 or 2.

After that change is made, we need to make sure the storage needs
(libc.tls_size) are computed correctly and account for the extra space
due to the initial positive offset. No change is then needed in
__copy_tls. Changes to TP_ADJ and __pthread_self are needed to get
reserved out of them, and the value of reserved needs to be provided
somewhere else for computing main_tls.offset.

> for initial-exec to work:
> tp + *got - add must be aligned (i.e. *got has to be set up to meet
> the alignment requirement of the module, this does not seem to require
> realignment of tp so even runtime loading of initial-exec tls should
> be possible assuming there is enough space etc...)

There's never space so it's not even a question, but even if there
were, no, it can't be done because tp will not be aligned mod some
possibly-larger alignment than the alignment in effect at the time the
thread was created.
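
For reference, the arithmetic behind the two options above can be sketched in C. This is only an illustration, not musl code: the helper names local_exec_off and main_tls_offset are hypothetical, and the sketch assumes the stored offset is measured from self+sizeof(struct pthread), so that it equals the offset from tp under option 1 and is smaller by reserved under option 2. The local-exec formula is the aarch64-style one quoted earlier, with 16 generalized to reserved.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of a; a must be a nonzero power of two. */
static size_t alignup(size_t x, size_t a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Hypothetical helper: local-exec offset from tp in the aarch64-style
 * sub-variant, i.e. off = addr - (base - alignup(reserved, align)). */
static size_t local_exec_off(size_t addr, size_t base,
                             size_t reserved, size_t align)
{
	return (addr - base) + alignup(reserved, align);
}

/* Hypothetical helper: offset value for the main TLS image.
 *   option 1: tp == self + sizeof(struct pthread)
 *   option 2: tp == self + sizeof(struct pthread) - reserved
 * Assumption: the offset is measured from self + sizeof(struct pthread). */
static size_t main_tls_offset(size_t reserved, size_t tls_align, int option)
{
	size_t off = alignup(reserved, tls_align);
	return option == 1 ? off : off - reserved;
}
```

With reserved = 16 and tls_align = 64, main_tls_offset gives 64 for option 1 and 48 for option 2; when tls_align <= reserved the two values are 16 and 0, which is the case where option 1 spends the extra reserved bytes.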
> i can come up with a new patch, but it's not clear if on aarch64 we want
> to allow tp to point inside pthread_t or strictly point at the end (then
> the 16 reserved bytes are unused), note that in glibc the 16 byte after tp
> contains dtv and an unused ptr and it is expected that additional magic tls
> data the compiler needs to access (e.g. stackend for split stack support or
> stack canary) comes before tls (so the layout is pthread_t, some magic tls
> data, tp, 16 byte with dtv, tls), for us only the dtv is important to access
> with fixed offset from tp (even that's not necessary if we generate the load
> offset in the tlsdesc asm based on sizeof(pthread_t) etc)
>
> i haven't looked into how gdb tries to handle tls (without threaddb)
> there might be further abi requirements for gdb to work well.

I don't think gdb is working reasonably at all right now for accessing
TLS on musl. It really should just be enhanced to inject calls to
__tls_get_addr and/or dlsym.

Rich