Date: Fri, 9 Feb 2018 15:03:22 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: TLS storage offsets for TLS_ABOVE_TP

On Fri, Feb 09, 2018 at 06:07:25PM +0000, Nicholas Wilson wrote:
> Hi,
> 
> I have a question about the support for TLS_ABOVE_TP ("TLS variant
> I").
> 
> In archs like ARM, we have a matched pair of functions TP_ADJ and
> __pthread_self, which adjust a pthread* to the thread-register and
> back again. For ARM, there's an additional offset of 8 (and 16 for
> AArch64), which is part of the ABI to ensure a) the TP points to the
> DTV, and b) the main module's TLS block is at a known offset from
> the TP.
> 
> However, the ARM adjustment code uses "TP = pthread* +
> sizeof(pthread) - 8". That's correct for the arch ABI, in that the
> linker requires that the thread storage be located 8 bytes above the
> TP, and musl does indeed store the TLS block there right after the
> struct pthread.
> 
> But what's odd is that you have the pthread->dtv_copy member right
> at the end of the pthread struct. So the thread pointer is pointing
> not to pthread->dtv_copy, but actually to pthread->canary_at_end.

dtv_copy at the end is just for internal use by ASM that needs to be
able to find the dtv pointer. I don't think there's any code on ARM
that uses it, but some archs might.
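
To make the arithmetic in the question concrete, here is a minimal
sketch. It is not musl's actual definition: struct model_pthread, the
16-word placeholder, and the helper names are hypothetical, and
uint32_t stands in for 4-byte ARM pointers so the numbers come out the
same on any host. It only shows that "pthread* + sizeof(pthread) - 8"
lands on canary_at_end when dtv_copy is the last member, and that the
TLS block then starts 8 bytes above the TP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the tail of struct pthread on 32-bit ARM.
 * Only the last two member names come from the thread. */
struct model_pthread {
    uint32_t other_fields[16];  /* placeholder for the rest of the struct */
    uint32_t canary_at_end;
    uint32_t dtv_copy;          /* last member */
};

#define MODEL_GAP 8  /* the ARM offset of 8 discussed above */

/* pthread pointer -> thread pointer: pthread* + sizeof(pthread) - 8 */
static char *model_tp_adj(struct model_pthread *p)
{
    return (char *)p + sizeof *p - MODEL_GAP;
}

/* thread pointer -> pthread pointer, the inverse of model_tp_adj */
static struct model_pthread *model_self_from_tp(char *tp)
{
    assert(tp != NULL);
    return (struct model_pthread *)(tp - sizeof(struct model_pthread) + MODEL_GAP);
}
```

With this layout the TP equals &p.canary_at_end, not &p.dtv_copy, and
TP + 8 is the first byte past the struct, which is where the main
module's TLS block would go.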

> Is my mental compiler going wrong? I don't have an ARM machine to
> actually execute the code on, but I've just been staring at it for
> an hour and can't work it out.
> 
> One thing I have noticed is that in all the four TLS models (general
> dynamic, local dynamic, initial exec, local exec), the DTV isn't
> actually dereferenced directly by compiler-generated code. The whole
> thing about having the DTV available to the compiler in a known
> location in fact is not used! So maybe that's how you get away with it.

This is exactly right. There is no valid way the compiler can generate
code to access the DTV because it doesn't know the generation counter
to compare against (on glibc). Drepper's TLS ABI document muddled a
lot of things like this which are actually implementation details
because he was documenting his implementation rather than the
compiler-linker, linker-ldso, etc. ABI boundaries. This was an
important observation at the time I implemented TLS in musl. Since
access to the DTV is not actually ABI, its form is not fixed but an
implementation detail, and we used a form that omits the glibc
generation counters since we don't unload modules.
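
As a rough illustration of why the DTV's form can stay private, here
is a minimal sketch of a DTV without generation counters. Everything
in it is an assumption for illustration (the model_* names, the
convention that dtv[0] holds the module count); it is not musl's
source. The point is that compiler-generated code only passes a
(module, offset) pair to a lookup function and never indexes the DTV
itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread state: dtv[0] is the number of modules,
 * dtv[i] (i >= 1) is the address of module i's TLS block. */
struct model_thread {
    uintptr_t *dtv;
};

/* Hypothetical argument for a __tls_get_addr-style call. */
struct model_tls_index {
    size_t module;
    size_t offset;
};

/* Look up a TLS address. No generation counter is consulted; the
 * sketch assumes modules are never unloaded, so entries never go stale. */
static void *model_tls_get_addr(const struct model_thread *self,
                                const struct model_tls_index *ti)
{
    assert(ti->module >= 1 && ti->module <= self->dtv[0]);
    return (char *)self->dtv[ti->module] + ti->offset;
}
```

Because the caller sees only the function's return value, the table
layout behind it can change without touching the compiler-linker
contract.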

> For the sake of "correctness" and conformance though, I wonder if
> there should be a final "void *dtv_pad" member at the end of struct
> pthread, so that the DTV block at the end of the struct pthread has
> the right size for the platform.

No, this would shift canary_at_end back, breaking the ABI for it --
and the canary-at-end *is* ABI on the archs that use it. (If in the
future there are multiple incompatible places the canary could be
across different archs, we'll have to adjust this section with
preprocessor conditionals.) Alternatively we might be able to adjust
the shift of struct __pthread relative to the TP per-arch.
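
The shift is easy to see with a small sketch. This is a model under
stated assumptions, not musl's layout: both structs and the 16-word
placeholder are hypothetical, uint32_t models 4-byte pointers, and the
gap of 8 is borrowed from the ARM discussion above purely as an
example value. It computes where canary_at_end sits relative to the
TP with and without a trailing dtv_pad.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_GAP 8  /* example TP gap, taken from the ARM case above */

/* Hypothetical tail as it stands: dtv_copy is the last member. */
struct tail_current {
    uint32_t other_fields[16];
    uint32_t canary_at_end;
    uint32_t dtv_copy;
};

/* Hypothetical tail with the proposed extra member appended. */
struct tail_padded {
    uint32_t other_fields[16];
    uint32_t canary_at_end;
    uint32_t dtv_copy;
    uint32_t dtv_pad;
};

/* Signed distance from the TP to the canary, where the TP sits
 * MODEL_GAP bytes below the end of the struct. */
#define CANARY_FROM_TP(type) \
    ((ptrdiff_t)offsetof(type, canary_at_end) - \
     (ptrdiff_t)(sizeof(type) - MODEL_GAP))

static void model_check_padding_effect(void)
{
    assert(CANARY_FROM_TP(struct tail_current) == 0);
    assert(CANARY_FROM_TP(struct tail_padded) == -4);
}
```

In this model the canary moves from TP+0 to TP-4 once the pad is
added, so any code that reads it at a fixed TP-relative offset would
read the wrong word.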

> I'd be happy to hear I'm wrong! (Maybe a diagram in the source code
> would help comprehension, to show how the memory is laid out,
> including the alignment bits and pieces. The FreeBSD libc source
> code has a helpful bit of ASCII art that draws the layout of the
> various bits of data and the alignment blocks between them.)

It's not so much that you're wrong as that the TLS document is
misleading.

Rich
