Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 25 May 2015 02:57:56 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: ppc soft-float regression

On Mon, May 25, 2015 at 08:31:29AM +0200, Jens Gustedt wrote:
> Am Sonntag, den 24.05.2015, 20:36 -0400 schrieb Rich Felker:
> > There's a simple alternative I just came up with though: have
> > dlstart.c compute the number of REL entries that need their addends
> > saved and allocate a VLA on its stack for stages 2 and 3 to use. While
> > the number of addends could be significant, it's many orders of
> > magnitude smaller than the smallest practical stack sizes we could
> > actually run with, so it's perfectly safe to put it on the stack.
> > 
> > Here're the basic changes I'd like to make to dlstart.c to implement
> > this:
> > 
> > 1. Remove processing of DT_JMPREL REL/RELA table. All entries in this
> >    table are necessarily JMP_SLOT type, and thus symbolic, so there's
> >    nothing stage 1 can do with them anyway. Also, being JMP_SLOT type,
> >    they have implicit addends of zero if they're REL-type, so there's
> >    no need to save addends.
> > 
> > 2. Remove the loop in dlstart.c that works like a fake function call 3
> >    times to process DT_JMPREL, DT_REL, and DT_RELA. Instead we just
> >    need 2 iterations, and now the stride is constant in each, so they
> >    should simplify down a lot more inline.
> > 
> > 3. During the loop that processes DT_REL, count the number of
> >    non-relative relocations (ones we skip at this stage), then make a
> >    VLA this size and pass its address to __dls2 as a second argument.
> > 
> > 4. Have the do_relocs in stage 2 save addends in this provided array
> >    before overwriting them, and save its address for use by stage 3.
> > 
> > 5. Have the do_relocs in stage 3 (for ldso/libc only) pull addends
> >    from this array instead of of from inline.
> > 
> > Steps 1 and 2 are purely code removal/simplification and should be
> > done regardless of whether we move forward on the above program, I
> > think. Steps 3-5 add some complexity but hardly any code, just a few
> > lines here and there.
> > 
> > Comments?
> 
> I like it.
> 
> The thing that is a bit critical here, is the VLA. Not because it is a
> VLA as such, but because it is a dynamic allocation on the stack. We
> already have a similar strategy in pthread_create for TLS. The
> difference is that there we have
> 
>  - a sanity check
>  - an alternative strategy if the sanity check fails
> 
> Would there be a possibility to have both here, too?

I'm not sure what you're referring to in pthread_create. It has no
VLA. Maybe you mean the choice of whether to put TLS in a
caller-provided thread stack, but I don't think that's an analogous
situaton. In that one, the provided stack size is known and TLS is
arbitrarily large (under the control of the app and loaded libs). Here
(dlstart), the stack size is not known and the size of the addend
table is not specifically known, but it's fixed at ld-time for libc.so
and the same for every run, and it's orders of magnitude smaller than
any usable stack (e.g. 14 entries on i386).

There are two potential failure modes here:

1. We run out of stack because RLIMIT_STACK is insanely small. In that
   case we'll quickly crash somewhere else. If there's a risk of
   getting off the stack into other memory, that risk would already
   exist in other places.

2. We run out of stack because the REL table is HUGE. This is a static
   condition for the libc.so ELF file and would not change from run to
   run. If something went wrong in the build process to cause this, it
   needs to be fixed.

So I'm skeptical of the need for a fallback.

If we do want/need a fallback, it would need to be of the following
form:

1. Count addends that need to be saved.

2. If cnt<LIMIT (e.g. 16k), produce VLA of the right size; otherwise
   produce VLA of size 1 and pass a null pointer instead of a pointer
   to the VLA.

3. In stage 2, if the addend-buffer pointer is null, allocate storage
   for it with __syscall(SYS_mmap, ...) (we can't call mmap yet).
   Abort on failure.

4. In stage 3, if the addend-buffer was allocated donate it to malloc
   after we're done using it.

I would probably prefer just having it crash/abort in step 3, since
this condition obviously indicates a broken build. It's better to
catch such an issue and fix it than to have a non-robust libc.so that
breaks fail-safety for no-third-party-libs binaries by depending on
allocation.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.