Date: Fri, 1 Jun 2018 02:11:02 +0200
From: Szabolcs Nagy <>
Subject: Re: TLS issue on aarch64

* Szabolcs Nagy <> [2018-05-29 08:33:17 +0200]:
> * Rich Felker <> [2018-05-28 18:15:21 -0400]:
> > On Mon, May 28, 2018 at 10:47:31PM +0200, Szabolcs Nagy wrote:
> > > another issue with the patch is that if tp is aligned then pthread_t
> > > may not get aligned:
> > > 
> > > tp == self + sizeof(pthread_t) - reserved
> > 
> > This expression does not look correct; it would have tp point
> > somewhere inside of the pthread structure rather than just past the
> > end of it.
> > 
> this is the current tp setup on
> aarch64, arm and sh4, see TP_ADJ
> we can just make tp = self+sizeof(pthread_t)
> but then there will be a small unused hole
> > Maybe your code is doing this to avoid wasted padding, but if so I
> > think that's a bad idea. It violates the invariant that the pthread
> > structure is at a constant offset from tp, which is needed for
> > efficient pthread_self and for access to fixed slots at the end of it
> > (like canary or dtv copy).
> > 
> > > so sizeof(pthread_t) - reserved must be divisible by
> > > gcd(alignment of tp, alignof(pthread_t)) to be able to make both
> > > self and tp aligned.
> > > 
> > > this is not an issue on current targets with current pthread_t,
> > > but we may want to decouple internal struct pthread alignment
> > > details and the abi reserved tls size, i.e. tp_adj could be like
> > > 
> > > tp == alignup(self + sizeof(pthread_t) - reserved, alignof(pthread_t))
> > > 
> > > or we add a static assert that reserved and alignof(pthread_t)
> > > are not conflicting.
> > 
> > Maybe I'm misunderstanding what "reserved" is, since you're talking
> > about a static assert...?
> > 
> it is the abi reserved space after tp
> (the bfd linker calls it TCB_SIZE)

did some digging into the bfd linker code; here is a dump of my understanding:

tls variant 1 has two sub variants: ppc (mips is the same) and
aarch64 (arm and sh are the same, just s/16/8/ for the reserved space)

base: tls segment base address in the elf obj
align: tls segment alignment requirement in the elf obj
addr: address of a tls var in the tls segment of the elf obj
tp: thread pointer at runtime
ptr: address of tls var at runtime

local-exec (off: offset from tp in the code):
	code: ptr = tp + off

	ppc: off = addr - (base + 0x7000)
	aarch64: off = addr - (base - alignup(16, align))

initial-exec (got,add: REL_TPOFF got entry and value/addend):
	code: ptr = tp + *got

	ppc: add = addr - base
	aarch64: add = addr - base

	(*got is libc internal, but it should be setup like
	*got = module_tls_base_ptr - tp + add)

local-dynamic (non-tlsdesc, got,add: REL_DTPOFF got entry and value, off: offset in the code):
	code: ptr = __tls_get_addr(got) + off

	ppc: add = 0, off = addr - (base + 0x8000)
	aarch64: add = addr - base, off = 0

	(__tls_get_addr(got,modid) = dtv[modid] + *got + off +..., libc internal,
	e.g. dtv[modid] could point at a fixed offset from the allocated tls
	data for the module)

general-dynamic (non-tlsdesc, got,add: REL_DTPOFF got entry and value):
	code: ptr = __tls_get_addr(got) (= dtv[modid] + *got)

	ppc: add = addr - base
	aarch64: add = addr - base

ptr has correct alignment if the tls segment is mapped to an aligned address,
i.e. ptr - (addr - base) must be aligned. working backwards from this,
the requirements are:

for local-exec to work:
	ppc: tp - 0x7000 must be aligned
	aarch64: tp + alignup(16, align) must be aligned == tp must be aligned

for initial-exec to work:
	tp + *got - add must be aligned (i.e. *got has to be set up to meet
	the alignment requirement of the module; this does not seem to require
	realignment of tp so even runtime loading of initial-exec tls should
	be possible assuming there is enough space etc...)

for local-dynamic to work: dtv[modid] must be aligned

does this make sense?

i can come up with a new patch, but it's not clear if on aarch64 we
want to allow tp to point inside pthread_t or strictly at its end (in
which case the 16 reserved bytes are unused). note that in glibc the
16 bytes after tp contain the dtv and an unused pointer, and any
additional magic tls data the compiler needs to access (e.g. the stack
end for split-stack support, or the stack canary) is expected to come
before tls (so the layout is pthread_t, some magic tls data, tp, 16
bytes with the dtv, tls). for us only the dtv is important to access
at a fixed offset from tp (and even that is not necessary if we
generate the load offset in the tlsdesc asm based on sizeof(pthread_t)
etc).

i haven't looked into how gdb tries to handle tls (without libthread_db);
there might be further abi requirements for gdb to work well.
