Date: Thu, 9 Feb 2012 21:58:25 -0500
From: Rich Felker
Subject: tough choice on thread pointer initialization issue

hi all,

due to some recent changes in musl, i've detected a long-standing bug
that went unnoticed until now. the way musl's pthread implementation
works, the thread pointer register (and kernel-side support) for the
initial thread is subject to "lazy initialization" - no syscalls are
made to set it up until the first time it's used. this allows us to
keep an absolutely minimal set of syscalls for super-short/trivial
programs, which looks really impressive in the libc comparison charts
and strace runs. however...
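
to make the lazy scheme concrete, here is a minimal toy model of it. it is a sketch only: tp_register, tp_ready, setup_syscalls and lazy_self are hypothetical names made up for illustration, not musl's real internals, and the "syscall" is just a counter.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical stand-ins; none of these names are musl's real internals */
struct thread { int tid; };

static struct thread main_thread = { 1 };
static struct thread *tp_register;  /* models %gs/%fs: null until set up */
static int tp_ready;                /* libc-side "already initialized" flag */
static int setup_syscalls;          /* counts simulated setup syscalls */

/* lazy accessor: the setup cost is paid only on first use */
static struct thread *lazy_self(void)
{
	if (!tp_ready) {
		setup_syscalls++;           /* stands in for the real syscall(s) */
		tp_register = &main_thread;
		tp_ready = 1;
	}
	assert(tp_register != NULL);
	return tp_register;
}
```

a program that never calls lazy_self() leaves setup_syscalls at 0, which is the whole appeal; calling it any number of times raises the count to exactly 1.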

the bug i've found occurs when the thread pointer happens to get
initialized in code that runs from a signal handler. this is a rare
situation that will only happen if the program is avoiding
async-signal-unsafe functions in the main flow of execution so that
it's free to use them in a signal handler, but despite being rare,
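it's perfectly legal, and right now musl crashes on such programs, for
example:

#include <signal.h>
#include <stdio.h>
#include <pthread.h>
void evil(int sig)
{
	printf("hello, world %lx\n", (unsigned long)pthread_self());
}
int main(void)
{
	signal(SIGALRM, evil);
	/* one way to trigger it: run the handler, then use the thread pointer */
	raise(SIGALRM);
	printf("back in main %lx\n", (unsigned long)pthread_self());
}
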
the issue is that when a signal handler returns, all registers,
including the thread-pointer registers (%gs or %fs on x86 or x86_64)
are reset to the values they had in the code the signal interrupted.
thus, musl thinks the thread pointer is valid at this point, but it's
actually null.
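
the failure can be reproduced in a toy model that treats the thread pointer as a plain variable and signal delivery as an explicit save/restore. this is only a sketch of the mechanism described above; struct saved_regs stands in for struct sigcontext, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* toy model; every name here is hypothetical */
struct thread { int tid; };
struct saved_regs { struct thread *tp; };  /* stands in for struct sigcontext */

static struct thread main_thread = { 1 };
static struct thread *tp_register;  /* models %gs/%fs */
static int tp_ready;                /* libc's belief that setup already ran */

static struct thread *lazy_self(void)
{
	if (!tp_ready) {
		tp_register = &main_thread;
		tp_ready = 1;
	}
	return tp_register;
}

/* deliver a "signal": save registers, run the handler, restore registers */
static struct thread *self_after_signal(void)
{
	struct saved_regs ctx = { tp_register };  /* saved at signal entry */
	struct thread *in_handler = lazy_self();  /* first use, inside handler */
	assert(in_handler == &main_thread);       /* works while the handler runs */
	tp_register = ctx.tp;                     /* sigreturn restores old value */
	return lazy_self();                       /* flag set, register stale */
}
```

in this model self_after_signal() returns a null pointer while tp_ready stays 1, which is the mismatch between what the library believes and what the register holds.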

i see 3 possible fixes, none of which are ideal:

approach 1: hack the signal-return "restore" function to save the
current thread register value into the struct sigcontext before
calling SYS_sigreturn, so that it will be preserved when the
interrupted code resumes.

pros: minimal costs, never adds any syscalls versus current musl.

cons: ugly hack, and gdb does not like non-canonical sigreturn
functions (it refuses to work when the instruction pointer is at

approach 2: call pthread_self() from sigaction(). this will ensure
that a signal handler never runs prior to the thread pointer being
initialized.

pros: minimal code changes, and avoids adding syscalls except for
programs that use signals but not threads.

cons: adds a syscall, and links unnecessary thread code when static
linking, in any program that uses signal handlers.
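
the ordering approach 2 relies on can be sketched from the application side as a wrapper. sigaction_tp_first is a hypothetical name used only for illustration; the real change would live inside the library's own sigaction(), and this sketch just shows that the thread pointer is touched before a handler can be installed.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>

/* hypothetical wrapper name; shows the ordering only, not musl's sigaction */
static int sigaction_tp_first(int sig, const struct sigaction *sa,
                              struct sigaction *old)
{
	/* touch the thread pointer before any handler can be installed */
	volatile pthread_t self = pthread_self();
	(void)self;
	return sigaction(sig, sa, old);
}
```

since no handler exists until the wrapped call returns, the first use of the thread pointer can no longer happen inside a handler installed this way.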

approach 3: always initialize the thread pointer from
__libc_start_main (before main runs). (this is the glibc approach)

pros: simple, and allows all the lazy-initialization logic to be
removed, moderately debloating and speeding up lots of thread-related
functions that will be able to use the thread pointer without making a
function call that checks whether it's initialized. would also make it
easier for us to support stack-protector, vsyscall/sysenter syscalls,
and thread-local storage in the future.

cons: constant additional 2-syscall overhead at startup (but it could
be optimized out when static-linking programs that don't use any
thread-related functions). the two syscalls take ~1010ns and ~890ns on
my machine, compared to ~260000ns for the exec syscall. one other
possible issue is that we'd need to worry about making sure
non-threaded programs which otherwise would work on old kernels
without thread support don't crash due to assuming the thread pointer
is valid in places where they shouldn't need it.
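
a toy sketch of approach 3, together with the arithmetic on the timings quoted above. all names (start_main, eager_self, sample_main, overhead_basis_points) are hypothetical and the setup is a plain assignment rather than a syscall; the point is only that the accessor loses its initialization check.

```c
#include <assert.h>
#include <stddef.h>

/* toy model; all names are hypothetical */
struct thread { int tid; };

static struct thread main_thread = { 1 };
static struct thread *tp_register;  /* models %gs/%fs */

/* accessor with no "is it initialized?" branch */
static struct thread *eager_self(void)
{
	return tp_register;
}

/* stands in for startup code that runs before main */
static int start_main(int (*real_main)(void))
{
	tp_register = &main_thread;  /* unconditional setup */
	assert(eager_self() != NULL);
	return real_main();
}

/* a stand-in for the program's main, using the thread pointer freely */
static int sample_main(void)
{
	return eager_self()->tid;
}

/* startup cost from the figures above, in hundredths of a percent of exec */
static long overhead_basis_points(void)
{
	long added_ns = 1010 + 890;   /* the two setup syscalls */
	long exec_ns = 260000;        /* the exec syscall */
	return added_ns * 10000 / exec_ns;
}
```

with the quoted timings the two syscalls add 1900ns, which is about 0.73% of the exec cost on that machine.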

before i make a decision, i'd like to hear if anyone from the
community has strong opinions one way or the other. i've almost ruled
out approach #1 and i'm leaning towards #3, with the idea that
simplicity is worth more than a couple trivial syscalls.

