Date: Wed, 17 Oct 2012 01:39:49 +0200
From: boris brezillon <b.brezillon.musl@...il.com>
To: musl@...ts.openwall.com
Subject: Re: TLS (thread-local storage) support

2012/10/17 Rich Felker <dalias@...ifal.cx>:
> On Tue, Oct 16, 2012 at 11:47:52PM +0200, boris brezillon wrote:
>> 2012/10/16 boris brezillon <b.brezillon.musl@...il.com>:
>> > Hi,
>> >
>> > First I'd like to thank Rich for adding TLS support (I started to work
>> > on it a few weeks ago but never had time to finish it).
>> >
>> > 2012/10/6 Daniel Cegiełka <daniel.cegielka@...il.com>:
>> >> 2012/10/5 Rich Felker <dalias@...ifal.cx>:
>> >>> On Thu, Oct 04, 2012 at 11:29:11PM +0200, Daniel Cegiełka wrote:
>> >>>> great news! Finally able to compile Go (lang)...
>> >>>
>> >>> Did Go fail with gcc's emulated TLS in libgcc?
>> >>
>> >> I tested Go with sabotage (with fresh musl). I'll try to do it again...
>> >> gcc in sabotage was compiled without support for TLS, so I didn't
>> >> expect that it will be successful:
>> >>
>> >> https://github.com/rofl0r/sabotage/blob/master/pkg/gcc4
>> >>
>> > There's at least one thing (maybe more) missing for go support with
>> > musl : gcc 'split-stack' support (see http://blog.nella.org/?p=849 and
>> > http://gcc.gnu.org/wiki/SplitStacks).
>> >
>> > I'm also interested in split stack support in musl but for other
>> > reasons (thread and coroutine stack automatic expansion).
>> >
>> > For x86/x86_64 split stack is implemented using a field inside the
>> > pthread struct which is accessed via %gs (or %fs for x86_64) and an
>> > offset.
>> >
>> > Currently this offset is defined at 0x30 (0x70 for x86_64) by the
>> > TARGET_THREAD_SPLIT_STACK_OFFSET but only if TARGET_LIBC_PROVIDES_SSP
>> > is defined (see gcc/config/i386/gnu-user.h or
>> > gcc/config/i386/gnu-user64.h).
>> >
>> > As far as I know musl does not support stack protection, but we could
>> > at least patch gcc to define TARGET_THREAD_SPLIT_STACK_OFFSET when
>> > using musl.
>> >
>> > We also need to reserve a field in the musl pthread struct. There are
>> > currently two fields named 'unused1' and 'unused2' but I'm not sure
>> > they're really unused in every supported arch.
>> >
>> >
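To illustrate the reserved field, here is a minimal C sketch. It assumes a
hypothetical thread-control-block layout: struct tcb_sketch and its member
names are made up for illustration and are not musl's real struct pthread.
It only shows a pointer-sized slot landing at the offset gcc expects (0x30
on 32-bit x86, 0x70 on x86_64).

```c
#include <stddef.h>

/* Hypothetical layout, NOT musl's struct pthread: enough pointer-sized
 * slots ahead of the limit to place it at 0x30 (ILP32) or 0x70 (LP64). */
struct tcb_sketch {
    void *reserved[sizeof(void *) == 8 ? 14 : 12];
    void *split_stack_limit; /* slot a split-stack prologue would read */
};

/* Offset of the limit slot from the start of the sketch structure. */
static size_t split_stack_offset(void)
{
    return offsetof(struct tcb_sketch, split_stack_limit);
}
```

A real implementation would have to guarantee this offset on every
supported arch, for example with a compile-time assertion.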
>> > BTW, I'd like to work on more integrated support for split stack in musl:
>
> I'm not a fan of split-stack for various reasons, but I have no
> objection to adding support to make it work as long as it's an
> optional feature that does not impair non-split-stack usage.
>
>> > 1) support in dynamic linker (see the last point of
>> > http://gcc.gnu.org/wiki/SplitStacks) : check split stack notes in
>> > shared libs (and program ?)
>
> It could be done, but is it really useful? There are infinitely many
> ways you can crash a program with libraries that were not built
> correctly for use with it. Checking for one of them seems like
> gratuitous complexity with little benefit.
>
>> > 2) support in thread implementation : currently when a thread is
>> > created the stack limit is set afterward (see
>> > https://github.com/mirrors/gcc/blob/master/libgcc/generic-morestack-thread.c
>> > and https://github.com/mirrors/gcc/blob/master/libgcc/config/i386/morestack.S)
>> > and the stack size is supposed to be 16K (which is the minimum stack
>> > size). This means we may reallocate a new stack chunk even if the
>> > previous one (the first one) is not fully used.
>> > If stack limit is set by thread implementation, this can be set
>> > appropriately according to the stack size defined by the thread
>> > creator.
>
> That's perfectly reasonable to support.
>
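As a rough sketch of point 2: if the thread implementation knows the real
stack, it can derive the limit from the stack's lowest address instead of
assuming 16K below the current stack pointer. The function names and the
8192-byte reserve below are assumptions for illustration, not existing musl
or libgcc code.

```c
#include <stddef.h>
#include <stdint.h>

/* Stacks grow down: the limit is the lowest address a split-stack
 * prologue may reach, i.e. the bottom of the stack plus a reserve. */
static uintptr_t stack_limit_for(uintptr_t stack_low, size_t stack_size,
                                 size_t reserve)
{
    if (reserve > stack_size)
        reserve = stack_size; /* tiny stack: everything is reserve */
    return stack_low + reserve;
}

/* Mirrors the prologue test: would a frame of frame_size bytes,
 * pushed at sp, dip below the limit? */
static int needs_more_stack(uintptr_t sp, size_t frame_size,
                            uintptr_t limit)
{
    return sp < limit || sp - limit < frame_size;
}
```

With a 1 MiB stack, a limit taken 16K below the top asks for a new chunk
as soon as 16K is used, while a limit derived from the real size does not.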
>> > 3) more optimizations I haven't thought about yet...
>> >
>> 4) Compile musl with '-fsplit-stack' and add no_split_stack attribute
>> to appropriate functions (at least all functions called before
>> pthread_self_init because %gs or %fs register is unusable before this
>> call).
>
> This is definitely not desirable, at least not by default. It hurts
> performance, possibly a lot, and destroys async-signal-safety. Also I
> doubt it's needed. As long as split stack mode leaves at least ~8k
> when calling a new function, most if not all functions in musl should
> run fine without needing support for enlarging the stack.
I agree, this should be optional. But if we don't compile libc with
-fsplit-stack (i.e. it is built with -fno-split-stack), each call to a
libc function from an external function compiled with split stack may
lead to a 64K stack chunk allocation.
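For reference, the per-function opt-out from point 4 could look like the
sketch below. The no_split_stack attribute is gcc's; the NO_SPLIT_STACK
macro and the early_strlen helper are hypothetical names used only for
illustration.

```c
#include <stddef.h>

/* Expand to gcc's no_split_stack attribute where the compiler knows it,
 * and to nothing elsewhere. */
#if defined(__has_attribute)
#  if __has_attribute(no_split_stack)
#    define NO_SPLIT_STACK __attribute__((no_split_stack))
#  endif
#endif
#ifndef NO_SPLIT_STACK
#  define NO_SPLIT_STACK
#endif

/* Hypothetical early-startup helper: it gets no split-stack prologue,
 * so it never reads the stack limit through the thread pointer. */
NO_SPLIT_STACK static size_t early_strlen(const char *s)
{
    size_t n = 0;
    while (s[n])
        n++;
    return n;
}
```

Without -fsplit-stack the attribute has no effect, so the same source
builds either way.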
>
>> 5) set main thread stack limit to 0 (pthread_self_init) : the main
>> thread stack growth is handled by the kernel.
>>
>> 6) add no-split-stack note to every asm file.
>
> I'm against this, or any boilerplate clutter. If it's really needed,
> it should be possible with CFLAGS (or "ASFLAGS"), rather than
> modifying every file, and if there's no way to do it with command line
> options, that's a bug in gas.
gas does not support this; I already tried.
>
> With that said, why would it be needed? I don't think there are any
> asm files that use more than 32 bytes of stack...
Same reason as 4): the 64K stack chunk allocation.
>
>> 7) make split stack support optional (either by checking the
>> -fsplit-stack option in CFLAGS or with a specific option :
>> --enable-split-stack) : split stack adds overhead to every function
>> (except for those with the 'no_split_stack' attribute).
>>
>> > Do you have any concern about adding those features in musl ?
>
> Basically, the whole idea of split-stack is antithetical to the QoI
> guarantees of musl. A program using split-stack can crash at any time
> due to out-of-memory, and there is no reliable/portable way to recover
> from this condition. It's much like the following low-quality aspects
> of glibc and default Linux config:
The same program may crash because of a stack overflow (segfault) or,
worse, corrupt memory.
At best, split stack provides a way to grow a thread's stack without
crashing the whole process.
At worst, it crashes the program but never corrupts memory.
>
> - overcommit
> - lazy allocation of libc-internal storage
> - lazy/on-demand allocation of TLS
> - dynamic loading of libgcc_s.so at runtime in pthread_cancel
> - etc.
>
> On 64-bit machines, split-stack is 100% useless. You can get the same
> behavior (crashing on OOM, but not having to know your stack size
> ahead of time) by just turning on overcommit and using huge thread
> stack sizes; the enormous 64-bit virtual address space makes it so you
> don't have to worry about running out of virtual memory.
>
> On 32-bit machines where virtual addresses are a precious resource,
> split-stack is a clever hack that essentially allows you to
> over-commit not just physical memory but virtual memory too. But it's
> inherently non-robust, and even worse than physical memory overcommit.
> At least in the latter case, the kernel can be intelligent about
> choosing an "abusive" process to kill. But if you run out of virtual
> memory, nothing can be done but terminating the whole process (you
> can't just terminate a single thread because it will leave resources
> in an inconsistent state).
>
> As such, I'm willing to add whatever inexpensive support framework is
> needed so that people who want to use split-stack can use it, but I'm
> very wary of invasive or costly changes to support a feature which I
> believe is fundamentally misguided (and, for 64-bit targets, utterly
> useless).

I understand.

>
> Rich
