Date: Tue, 30 Sep 2014 22:49:15 -0700
From: Andy Lutomirski <luto@...capital.net>
To: musl@...ts.openwall.com
Subject: Re: A running list of questions from "porting" Slackware to musl

On 09/30/2014 05:05 PM, Rich Felker wrote:
> On Tue, Sep 30, 2014 at 04:50:28PM -0700, Andy Lutomirski wrote:
>>> When gcc generates the canary-check code, on failure it normally
>>> calls/jumps to __stack_chk_fail. But for shared libraries, that call
>>> would go to a thunk in the library's PLT, which depends on the GOT
>>> register being initialized (actually this varies by arch; x86_64
>>> doesn't need it). In order to avoid (expensive) loading of the GOT
>>> register in every function just as a contingency in case
>>> __stack_chk_fail needs to be called, for position-independent code GCC
>>> generates a call to __stack_chk_fail_local instead. This is a hidden
>>> function (and necessarily exists within the same .so) so the call
>>> doesn't have to go through the PLT; it's just a straight relative
>>> call/jump instruction. __stack_chk_fail_local is then responsible for
>>> loading the GOT register and calling __stack_chk_fail.
>>
>> [slightly off topic]
>>
>> Does GCC even know how to call through the GOT instead of the PLT?
>> Windows (at least 32-bit Windows) has done this for decades, at least if
>> dllimport is set.
>>
>> On x86_64, this would be call *whatever@...off(%rip) instead of call
>> whatever@....
>
> This precludes optimizing out the indirection at link time (or at
> least it requires more complex transformation in the linker). I'm not
> sure if there are cases where GCC generates this kind of code or not.
> It's also not practical on many ISAs.

I think I filed a bug asking for this (among other things) in GCC once.
Basically, I want __attribute__((visibility("imported"))) or something
like that.

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56527
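
Absent such an attribute, the nearest source-level approximation is
probably to take the function's address and call through the pointer. In
position-independent code the address of an external function is normally
loaded from the GOT, so the call becomes an indirect call rather than a
jump through a PLT stub. A minimal sketch, assuming nothing beyond
standard C (the helper name is made up, and the compiler remains free to
generate something different):

```c
#include <string.h>

/* Pointer type matching strlen's signature. */
typedef size_t (*strlen_fn)(const char *);

/* Hypothetical helper: fetch the address of strlen, then call through it. */
size_t length_via_pointer(const char *s)
{
    /* volatile keeps the compiler from folding this back into a direct call */
    strlen_fn volatile fn = strlen;
    return fn(s);
}
```

Calling length_via_pointer("musl") returns the same value as strlen("musl");
only the generated call sequence differs.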

>
>> (Even better: the loader could patch the PLT with a direct jump.  Could
>> musl do this?  At least in the case where the symbol is within 2G of the
>> PLT entry,
>
> This is really not a good idea. The old PowerPC ABI did this, and musl
> does not support it (it requires the new "secure-plt" mode). Hardened
> kernels have various restrictions on modifying executable pages, up to
> and including completely forbidding this kind of usage. And even if
> it's not forbidden, it's going to use more memory due to an additional
> page (or more) per shared library that's not going to be sharable.
> Also it requires complex per-arch code (minimal machine code
> generation, instruction cache flushing/barriers, etc.).

That extra page might not be needed if the linker could end up removing 
a bunch of GOT entries for functions that don't have their addresses 
taken.  (Or, on x86_64, where unaligned access is cheap, the GOT could 
actually overlap the PLT in memory, but only if DT_BIND_NOW or whatever 
it's called is on.  Hmm.  I bet that the linker could do this in a way 
that doesn't require loader support at all as long as textrel is allowed.)

>
>> this should be straightforward if no threads have been
>> started yet.
>
> Threads having been started or not are not relevant. The newly loaded
> code is not visible until dlopen returns, so nothing can race with
> modifications to it.

True, at least when lazy binding is off.

>
>> If musl did this, it could advertise a nice speedup over
>> glibc...)
>
> I think the performance gain would be mostly theoretical. Do you have
> any timings that show otherwise?

No.  It would reduce pressure on whatever presumably limited resources 
the CPU has for predicting indirect jumps, and it would reduce the 
number of cache lines needed for a call through the PLT.

Doing it cleanly would also probably require a new dynamic entry and a 
new relocation type.

Also, it might be a lost cause when SELinux is being used.  I *hate*
execmem, execmod, etc -- it really should be possible to do this and to
write a sensible JIT without requiring special SELinux permissions.  I
think that what's needed is a syscall to make a writable alias of an
executable mapping.
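
The usual workaround, as opposed to the syscall wished for above, is to
map the same file-backed pages twice: once writable and once executable.
A rough sketch under stated assumptions: alias_demo and the /tmp template
are made up for illustration, the single byte is just sample data that is
never executed, and whether the executable mapping is permitted depends on
mount options and security policy, so the sketch falls back to a read-only
alias when it is refused.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo: returns 0 if bytes written through the writable view
 * are visible through the second view of the same pages, -1 on failure. */
int alias_demo(void)
{
    char path[] = "/tmp/aliasXXXXXX";
    const size_t len = 4096;
    static const unsigned char code[] = { 0xc3 }; /* sample byte, never run */

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                     /* keep the pages, drop the name */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    /* Writable view of the file's pages. */
    unsigned char *w = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED, fd, 0);
    if (w == MAP_FAILED) {
        close(fd);
        return -1;
    }

    /* Second view of the same pages, executable if the system allows it. */
    unsigned char *x = mmap(NULL, len, PROT_READ | PROT_EXEC,
                            MAP_SHARED, fd, 0);
    if (x == MAP_FAILED)              /* e.g. noexec mount or policy denial */
        x = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (x == MAP_FAILED) {
        munmap(w, len);
        return -1;
    }

    memcpy(w, code, sizeof code);     /* write through one alias ... */
    int ok = memcmp(x, code, sizeof code) == 0; /* ... read via the other */

    munmap(w, len);
    munmap(x, len);
    return ok ? 0 : -1;
}
```

This needs a writable filesystem and still runs into policy checks on the
executable mapping, which is why a dedicated syscall would be nicer.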

Anyway, probably not worth it.

--Andy
