Date: Sat, 26 May 2018 20:34:30 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: TLS issue on aarch64

On Sat, May 26, 2018 at 02:54:16AM +0200, Szabolcs Nagy wrote:
> * Phillip Berndt <phillip.berndt@...glemail.com> [2018-05-26 00:20:04 +0200]:
> > 2018-05-25 16:50 GMT+02:00 Szabolcs Nagy <nsz@...t70.net>:
> > > i think the constraints for tp are:
> > >
> > > - tp must be aligned to 'tls_align'
> > >
> > > - tp must be at a small fixed offset from the end
> > > of pthread struct (so asm code can access the dtv)
> > >
> > > - tp + off must be usable memory for tls for off >= 16
> > > (this is aarch64 specific)
> > >
> > 
> > Hmm.. but these constraints do not explain the extra offset of one
> > alignment I'm seeing in the GCC output, do they? If I compile a
> 
> tp must be aligned and tp + offset must be aligned too,
> but offset >= 16 has to hold.
> 
> > program with a single TLS variable with
> > __attribute__((aligned(n))) that does nothing but try to reference and
> > print said variable, I get the following assembler code from GCC:
> > 
> > For n = 0x1000:
> > 
> >   400194:       d53bd041        mrs     x1, tpidr_el0
> >   400198:       b0000040        adrp    x0, 409000 <__subtf3+0xd18>
> >   40019c:       91400421        add     x1, x1, #0x1, lsl #12
> >   4001a0:       91000021        add     x1, x1, #0x0
> > 
> > 
> > For n = 0x100:
> > 
> >   400194:       d53bd041        mrs     x1, tpidr_el0
> >   400198:       b0000040        adrp    x0, 409000 <__subtf3+0xd18>
> >   40019c:       91400021        add     x1, x1, #0x0, lsl #12
> >   4001a0:       91040021        add     x1, x1, #0x100
> > 
> > For n = 0x10:
> > 
> >   400194:       d53bd041        mrs     x1, tpidr_el0
> >   400198:       b0000040        adrp    x0, 409000 <__subtf3+0xd18>
> >   40019c:       91400021        add     x1, x1, #0x0, lsl #12
> >   4001a0:       91004021        add     x1, x1, #0x10
> > 
> > That's how I came up with the mem += libc.tls_align hack in the first place.
> > 
> 
> indeed you need another alignment there, i came up with the
> following fix:
> 
> (on mips/ppc i expect it not to change anything: tp is
> at a page aligned offset from the end of struct pthread,
> so one alignment is enough there, but on aarch64/arm/sh4
> this makes a difference, and seems to pass my simple tests)
> 
> diff --git a/src/env/__init_tls.c b/src/env/__init_tls.c
> index 1c5d98a0..8e70024d 100644
> --- a/src/env/__init_tls.c
> +++ b/src/env/__init_tls.c
> @@ -41,9 +41,12 @@ void *__copy_tls(unsigned char *mem)
>  #ifdef TLS_ABOVE_TP
>  	dtv = (void **)(mem + libc.tls_size) - (libc.tls_cnt + 1);
>  
> -	mem += -((uintptr_t)mem + sizeof(struct pthread)) & (libc.tls_align-1);
> +	/* Ensure TP is aligned.  */
> +	mem += -(uintptr_t)TP_ADJ(mem) & (libc.tls_align-1);
>  	td = (pthread_t)mem;
>  	mem += sizeof(struct pthread);
> +	/* Ensure TLS is aligned after struct pthread.  */
> +	mem += -(uintptr_t)mem & (libc.tls_align-1);
>  
>  	for (i=1, p=libc.tls_head; p; i++, p=p->next) {
>  		dtv[i] = mem + p->offset;

As written, this is not valid, and neither is anything else that uses
libc.tls_align to adjust the offset of the TLS from the TP. The value
of libc.tls_align is runtime-variable: it can increase upon dlopen,
and even without dlopen it is non-deterministic, since it depends on
the shared libraries pulled in via DT_NEEDED in dynamic-linked
programs. The offset between TP and TLS is a property of the linker's
handling of local-exec TLS in the main program only, and thus probably
should be using libc.tls_head->align.

However, care needs to be taken: libc.tls_head may initially be null
if the main program has no TLS, and could later become non-null due to
dlopen. If the offset between TP and TLS changed because of this, any
initial-exec-model TLS access would be wrong. Fortunately, such a
program cannot have initial-exec-model accesses (initial-exec is only
valid for TLS that existed at program start), so we can probably just
ignore the issue and always use
libc.tls_head ? libc.tls_head->align : 1. This will cause gratuitous
padding for threads created after dlopen of a library with larger
alignment, but should otherwise not hurt anything.

Rich
