Date: Tue, 30 Sep 2014 20:05:16 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Re: A running list of questions from "porting" Slackware
 to musl

On Tue, Sep 30, 2014 at 04:50:28PM -0700, Andy Lutomirski wrote:
> > When gcc generates the canary-check code, on failure it normally
> > calls/jumps to __stack_chk_fail. But for shared libraries, that call
> > would go to a thunk in the library's PLT, which depends on the GOT
> > register being initialized (actually this varies by arch; x86_64
> > doesn't need it). In order to avoid (expensive) loading of the GOT
> > register in every function just as a contingency in case
> > __stack_chk_fail needs to be called, for position-independent code GCC
> > generates a call to __stack_chk_fail_local instead. This is a hidden
> > function (and necessarily exists within the same .so) so the call
> > doesn't have to go through the PLT; it's just a straight relative
> > call/jump instruction. __stack_chk_fail_local is then responsible for
> > loading the GOT register and calling __stack_chk_fail.
> 
> [slightly off topic]
> 
> Does GCC even know how to call through the GOT instead of the PLT?
> Windows (at least 32-bit Windows) has done for decades, at least if
> dllimport is set.
> 
> On x86_64, this would be call *whatever@...off(%rip) instead of call
> whatever@....
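
The __stack_chk_fail_local arrangement described in the quoted text can be sketched in C roughly as follows. This is only an illustration: demo_chk_fail, demo_chk_fail_local, demo_check and demo_fail_count are hypothetical names, and the stand-in handler just counts calls, whereas the real __stack_chk_fail never returns. The visibility attribute is the GCC/Clang one.

```c
/* Hypothetical stand-in for the exported __stack_chk_fail.
 * The real function aborts; this one only counts calls so the
 * control flow can be observed. */
int demo_fail_count = 0;

void demo_chk_fail(void)
{
    demo_fail_count++;
}

/* Hypothetical stand-in for __stack_chk_fail_local. Hidden visibility
 * means callers in the same shared object can reach it with a plain
 * relative call, without going through the PLT. Any GOT register
 * setup needed for the outgoing call happens here, not in every
 * protected function. */
__attribute__((visibility("hidden")))
void demo_chk_fail_local(void)
{
    demo_chk_fail();
}

/* Roughly what a function epilogue does: compare the saved canary
 * with the expected value and take the failure path on mismatch. */
int demo_check(unsigned long saved, unsigned long expected)
{
    if (saved != expected) {
        demo_chk_fail_local();
        return -1;
    }
    return 0;
}
```

Only the rarely-taken failure path pays for the extra hop; the common path is a compare and a branch.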

This precludes optimizing out the indirection at link time (or at
least it requires more complex transformation in the linker). I'm not
sure if there are cases where GCC generates this kind of code or not.
It's also not practical on many ISAs.
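
As a rough analogy for the shape of call under discussion, here is a hand-written C sketch that calls through a pointer slot holding the function's address instead of calling the function by name. It is not what GCC emits; abs_slot and call_via_slot are hypothetical names, and the volatile qualifier is there only to keep the compiler from folding the indirection away.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a GOT entry: a data slot holding the
 * address of the target function. In a PIC/PIE build the dynamic
 * linker fills in the address at load time. */
static int (*volatile abs_slot)(int) = abs;

/* Loads the address from the slot and calls through it, similar in
 * spirit to an indirect "call *slot" rather than a call to a PLT
 * stub. */
int call_via_slot(int x)
{
    return abs_slot(x);
}
```

Because the call site itself contains the load from the slot, a linker that later learns the target is local cannot simply retarget a direct call; it would have to rewrite the instruction sequence.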

> (Even better: the loader could patch the PLT with a direct jump.  Could
> musl do this?  At least in the case where the symbol is within 2G of the
> PLT entry,

This is really not a good idea. The old PowerPC ABI did this, and musl
does not support it (musl requires the newer "secure-plt" mode). Hardened
kernels have various restrictions on modifying executable pages, up to
and including completely forbidding this kind of usage. And even if
it's not forbidden, it's going to use more memory due to an additional
page (or more) per shared library that's not going to be sharable.
Also it requires complex per-arch code (minimal machine code
generation, instruction cache flushing/barriers, etc.).
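
To give a sense of the "minimal machine code generation" part, here is a sketch for x86_64 only, assuming the PLT entry would be overwritten with a 5-byte "jmp rel32" instruction. encode_direct_jmp is a hypothetical helper, not anything in musl; it only computes the bytes and checks the 2G range limit. Making the page writable, handling other ISAs, and any cache flushing or barriers are left out entirely.

```c
#include <stdint.h>

/* Hypothetical helper: produce the bytes of an x86 "jmp rel32"
 * located at address `site` that jumps to `target`.
 * Returns 0 on success, or -1 if the target is out of rel32 range. */
int encode_direct_jmp(uint64_t site, uint64_t target, unsigned char out[5])
{
    /* The displacement is measured from the end of the 5-byte
     * instruction, not from its start. */
    int64_t disp = (int64_t)(target - (site + 5));

    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1; /* not within 2G; a direct jump cannot reach it */

    uint32_t rel = (uint32_t)(int32_t)disp;

    out[0] = 0xE9;                 /* opcode: jmp rel32 */
    out[1] = rel & 0xff;           /* displacement, little-endian */
    out[2] = (rel >> 8) & 0xff;
    out[3] = (rel >> 16) & 0xff;
    out[4] = (rel >> 24) & 0xff;
    return 0;
}
```

Even this small piece is specific to one instruction set, and the range check means a fallback path through the normal PLT is still needed.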

> this should be straightforward if no threads have been
> started yet.

Whether threads have been started is not relevant. The newly loaded
code is not visible until dlopen returns, so nothing can race with
modifications to it.

> If musl did this, it could advertise a nice speedup over
> glibc...)

I think the performance gain would be mostly theoretical. Do you have
any timings that show otherwise?

Rich
