Openwall GNU/*/Linux - a small security-enhanced Linux distro for servers
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Fri, 25 May 2018 14:40:14 +0200
From: Phillip Berndt <phillip.berndt@...glemail.com>
To: musl@...ts.openwall.com
Subject: TLS issue on aarch64

Hi,

I'm experiencing a TLS-related error with musl on aarch64. This is my
test program:

----8<--------------
#include <stdio.h>

__thread int foo = 1234;
__thread int bar __attribute__((aligned(0x100))) = 5678;

int main(int argc, char **argv)
{
    printf("0x%p: %d\n", &foo, foo);
    printf("0x%p: %d\n", &bar, bar);

    return 0;
}
---->8---------------

I'm compiling this into a static binary with -O2 -static. With glibc and
gcc 5.4.0 (tried with 7.2 as well), this gives me the expected output. With
musl libc 1.1.19, I instead see

----8<--------------
0x0x7fa7adf2f0: 0
0x0x7fa7adf3f0: 0
---->8---------------

Note that this is the wrong address (not aligned), and that the memory has
unexpected content as well.

I did some initial debugging, but now I'm stuck and need some help. What I've
found so far:

* GCC apparently emits code that expects the tpidr_el0 register to contain a
  pointer to the TLS memory, and it expects that the loader unconditionally
  offsets the first variable by the TLS alignment into said memory:

  Disassembly of the code that loads &foo:
  ----8<--------------
  4001a4:       d53bd053        mrs     x19, tpidr_el0
  4001a8:       91400273        add     x19, x19, #0x0, lsl #12
  4001ac:       91040273        add     x19, x19, #0x100
  ----8<--------------

  (If I align the variable by 0x1000 instead then the code changes
   acoordingly.)

* Musl, on the other hand, in __copy_tls, initializes tpidr_el0 with a
  pointer 16 bytes from the end of struct pthread, and copies the TLS
  initializer code directly behind that struct, without adding extra
  padding.

Hence the code tries to access the TLS variables at the wrong location.

The following patch fixes the issue, but only if musl is then compiled with
optimizations off. With optimizations, the compiler emits the *same* code for
both variants. Also, the patch probably has some unexpected side-effects, too -
I'm just adding it here as a starting point for further debugging.

Any help is greatly appreciated :-)

- Phillip

----

diff --git a/arch/aarch64/pthread_arch.h b/arch/aarch64/pthread_arch.h
index b2e2d8f..c69f6f1 100644
--- a/arch/aarch64/pthread_arch.h
+++ b/arch/aarch64/pthread_arch.h
@@ -2,10 +2,10 @@ static inline struct pthread *__pthread_self()
 {
        char *self;
        __asm__ __volatile__ ("mrs %0,tpidr_el0" : "=r"(self));
-       return (void*)(self + 16 - sizeof(struct pthread));
+       return (void*)(self - sizeof(struct pthread));
 }

 #define TLS_ABOVE_TP
-#define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread) - 16)
+#define TP_ADJ(p) ((char *)(p) + sizeof(struct pthread))

 #define MC_PC pc
diff --git a/src/env/__init_tls.c b/src/env/__init_tls.c
index b125eb1..3a3c307 100644
--- a/src/env/__init_tls.c
+++ b/src/env/__init_tls.c
@@ -42,7 +42,7 @@ void *__copy_tls(unsigned char *mem)

        mem += -((uintptr_t)mem + sizeof(struct pthread)) & (libc.tls_align-1);
        td = (pthread_t)mem;
-       mem += sizeof(struct pthread);
+       mem += sizeof(struct pthread) + libc.tls_align;

        for (i=1, p=libc.tls_head; p; i++, p=p->next) {
                dtv[i] = mem + p->offset;

Powered by blists - more mailing lists

Your e-mail address:

Powered by Openwall GNU/*/Linux - Powered by OpenVZ