Date: Tue, 30 Sep 2014 20:05:16 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Re: A running list of questions from "porting" Slackware to musl

On Tue, Sep 30, 2014 at 04:50:28PM -0700, Andy Lutomirski wrote:
> > When gcc generates the canary-check code, on failure it normally
> > calls/jumps to __stack_chk_fail. But for shared libraries, that call
> > would go to a thunk in the library's PLT, which depends on the GOT
> > register being initialized (actually this varies by arch; x86_64
> > doesn't need it). In order to avoid (expensive) loading of the GOT
> > register in every function just as a contingency in case
> > __stack_chk_fail needs to be called, for position-independent code GCC
> > generates a call to __stack_chk_fail_local instead. This is a hidden
> > function (and necessarily exists within the same .so) so the call
> > doesn't have to go through the PLT; it's just a straight relative
> > call/jump instruction. __stack_chk_fail_local is then responsible for
> > loading the GOT register and calling __stack_chk_fail.
>
> [slightly off topic]
>
> Does GCC even know how to call through the GOT instead of the PLT?
> Windows (at least 32-bit Windows) has done for decades, at least if
> dllimport is set.
>
> On x86_64, this would be call *whatever@...off(%rip) instead of call
> whatever@....

This precludes optimizing out the indirection at link time (or at
least it requires more complex transformation in the linker). I'm not
sure if there are cases where GCC generates this kind of code or not.
It's also not practical on many ISAs.

> (Even better: the loader could patch the PLT with a direct jump. Could
> musl do this? At least in the case where the symbol is within 2G of the
> PLT entry,

This is really not a good idea. The old PowerPC ABI did this, and musl
does not support it (it requires the new "secure-plt" mode).
Hardened kernels have various restrictions on modifying executable
pages, up to and including completely forbidding this kind of usage.
And even if it's not forbidden, it's going to use more memory due to
an additional page (or more) per shared library that's not going to be
sharable. Also it requires complex per-arch code (minimal machine code
generation, instruction cache flushing/barriers, etc.).

> this should be straightforward if no threads have been
> started yet.

Whether or not threads have been started is not relevant. The newly
loaded code is not visible until dlopen returns, so nothing can race
with modifications to it.

> If musl did this, it could advertise a nice speedup over
> glibc...)

I think the performance gain would be mostly theoretical. Do you have
any timings that show otherwise?

Rich
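
The thunk arrangement described in the quoted text at the top of this message can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the demo_* names are hypothetical stand-ins for __stack_chk_fail and __stack_chk_fail_local, the handler counts failures instead of aborting so the control flow can be observed, and the visibility("hidden") attribute is the GCC/Clang extension for ELF targets. It is not the code GCC or musl actually ships.

```c
/* Illustrative model only; every demo_* name is a hypothetical stand-in. */

static int demo_fail_count; /* a real handler would abort, not count */

/* Stand-in for __stack_chk_fail: default visibility, so a call to it
 * from inside a shared library may be routed through that library's PLT. */
void demo_stack_chk_fail(void)
{
    demo_fail_count++;
}

/* Stand-in for __stack_chk_fail_local: hidden visibility binds the symbol
 * within the object that defines it, so callers in the same .so can reach
 * it with a plain relative call and need no GOT register set up first.
 * Only this rarely executed thunk pays for the cross-library call. */
__attribute__((visibility("hidden")))
void demo_stack_chk_fail_local(void)
{
    demo_stack_chk_fail();
}

/* Stand-in for the check emitted in a function epilogue: compare the
 * saved canary with the expected value and call the local thunk on a
 * mismatch. Returns the number of failures recorded so far. */
int demo_canary_check(unsigned long saved, unsigned long expected)
{
    if (saved != expected)
        demo_stack_chk_fail_local();
    return demo_fail_count;
}
```

Inspecting the assembly produced by something like `gcc -m32 -fPIC -fstack-protector-all -S` should show which of the two real symbols a given GCC version calls; the result depends on the target and compiler version.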