Date: Fri, 1 Jun 2018 02:11:02 +0200
From: Szabolcs Nagy <nsz@...t70.net>
To: musl@...ts.openwall.com
Subject: Re: TLS issue on aarch64

* Szabolcs Nagy <nsz@...t70.net> [2018-05-29 08:33:17 +0200]:
> * Rich Felker <dalias@...c.org> [2018-05-28 18:15:21 -0400]:
> > On Mon, May 28, 2018 at 10:47:31PM +0200, Szabolcs Nagy wrote:
> > > another issue with the patch is that if tp is aligned then pthread_t
> > > may not get aligned:
> > >
> > > tp == self + sizeof(pthread_t) - reserved
> >
> > This expression does not look correct; it would have tp point
> > somewhere inside of the pthread structure rather than just past the
> > end of it.
> >
>
> this is the current tp setup on
> aarch64, arm and sh4, see TP_ADJ
>
> we can just make tp = self+sizeof(pthread_t)
> but then there will be a small unused hole
>
> > Maybe your code is doing this to avoid wasted padding, but if so I
> > think that's a bad idea. It violates the invariant that the pthread
> > structure is at a constant offset from tp, which is needed for
> > efficient pthread_self and for access to fixed slots at the end of it
> > (like canary or dtv copy).
> >
> > > so sizeof(pthread_t) - reserved must be divisible with
> > > gcd(alignment of tp, alignof(pthread_t)) to be able to make both
> > > self and tp aligned.
> > >
> > > this is not an issue on current targets with current pthread_t,
> > > but we may want to decouple internal struct pthread alignment
> > > details and the abi reserved tls size, i.e. tp_adj could be like
> > >
> > > tp == alignup(self + sizeof(pthread_t) - reserved, alignof(pthread_t))
> > >
> > > or we add a static assert that reserved and alignof(pthread_t)
> > > are not conflicting.
> >
> > Maybe I'm misunderstanding what "reserved" is, since you're talking
> > about a static assert...?
> >
>
> it is the abi reserved space after tp
> (the bfd linker calls it TCB_SIZE)

did some digging into the bfd linker code,
i dump here my understanding:

tls variant 1 has two sub variants: ppc (mips is the same) and
aarch64 (arm, sh are the same, just s/16/8/ for the reserved space)

  base:  tls segment base address in the elf obj
  align: tls segment alignment requirement in the elf obj
  addr:  address of a tls var in the tls segment of the elf obj
  tp:    thread pointer at runtime
  ptr:   address of the tls var at runtime

local-exec (off: offset from tp in the code):
  code:    ptr = tp + off
  ppc:     off = addr - (base + 0x7000)
  aarch64: off = addr - (base - alignup(16, align))

initial-exec (got, add: REL_TPOFF got entry and value/addend):
  code:    ptr = tp + *got
  ppc:     add = addr - base
  aarch64: add = addr - base
  (*got is libc internal, but it should be set up like
   *got = module_tls_base_ptr - tp + add)

local-dynamic (non-tlsdesc; got, add: REL_DTPOFF got entry and value,
off: offset in the code):
  code:    ptr = __tls_get_addr(got) + off
  ppc:     add = 0, off = addr - (base + 0x8000)
  aarch64: add = addr - base, off = 0
  (__tls_get_addr(got, modid) = dtv[modid] + *got + off + ...,
   libc internal, e.g. dtv[modid] could point at a fixed offset from
   the allocated tls data for the module)

general-dynamic (non-tlsdesc; got, add: REL_DTPOFF got entry and value):
  code:    ptr = __tls_get_addr(got) (= dtv[modid] + *got)
  ppc:     add = addr - base
  aarch64: add = addr - base

ptr has the correct alignment if the tls segment is mapped to an
aligned address, i.e. ptr - (addr - base) must be aligned; working
backwards from this, the requirements are:

for local-exec to work:
  ppc:     tp - 0x7000 must be aligned
  aarch64: tp + alignup(16, align) must be aligned == tp must be aligned

for initial-exec to work:
  tp + *got - add must be aligned (i.e.
*got has to be set up to meet the alignment requirement of the
module; this does not seem to require realignment of tp, so even
runtime loading of initial-exec tls should be possible, assuming
there is enough space etc...)

for local-dynamic to work:
  dtv[modid] must be aligned

does this make sense?

i can come up with a new patch, but it's not clear if on aarch64 we
want to allow tp to point inside pthread_t or strictly point at the
end (then the 16 reserved bytes are unused).

note that in glibc the 16 bytes after tp contain the dtv and an
unused ptr, and it is expected that additional magic tls data the
compiler needs to access (e.g. the stack end for split-stack support
or the stack canary) comes before tls (so the layout is pthread_t,
some magic tls data, tp, 16 bytes with the dtv, tls).  for us only
the dtv is important to access at a fixed offset from tp (and even
that is not necessary if we generate the load offset in the tlsdesc
asm based on sizeof(pthread_t) etc).

i haven't looked into how gdb tries to handle tls (without
thread_db); there might be further abi requirements for gdb to work
well.