Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Mon, 25 May 2015 09:44:44 +0200
From: Jens Gustedt <jens.gustedt@...ia.fr>
To: musl@...ts.openwall.com
Subject: Re: ppc soft-float regression

Am Montag, den 25.05.2015, 02:57 -0400 schrieb Rich Felker:
> On Mon, May 25, 2015 at 08:31:29AM +0200, Jens Gustedt wrote:
> > Am Sonntag, den 24.05.2015, 20:36 -0400 schrieb Rich Felker:
> > > There's a simple alternative I just came up with though: have
> > > dlstart.c compute the number of REL entries that need their addends
> > > saved and allocate a VLA on its stack for stages 2 and 3 to use. While
> > > the number of addends could be significant, it's many orders of
> > > magnitude smaller than the smallest practical stack sizes we could
> > > actually run with, so it's perfectly safe to put it on the stack.
> > > 
> > > Here're the basic changes I'd like to make to dlstart.c to implement
> > > this:
> > > 
> > > 1. Remove processing of DT_JMPREL REL/RELA table. All entries in this
> > >    table are necessarily JMP_SLOT type, and thus symbolic, so there's
> > >    nothing stage 1 can do with them anyway. Also, being JMP_SLOT type,
> > >    they have implicit addends of zero if they're REL-type, so there's
> > >    no need to save addends.
> > > 
> > > 2. Remove the loop in dlstart.c that works like a fake function call 3
> > >    times to process DT_JMPREL, DT_REL, and DT_RELA. Instead we just
> > >    need 2 iterations, and now the stride is constant in each, so they
> > >    should simplify down a lot more inline.
> > > 
> > > 3. During the loop that processes DT_REL, count the number of
> > >    non-relative relocations (ones we skip at this stage), then make a
> > >    VLA this size and pass its address to __dls2 as a second argument.
> > > 
> > > 4. Have the do_relocs in stage 2 save addends in this provided array
> > >    before overwriting them, and save its address for use by stage 3.
> > > 
> > > 5. Have the do_relocs in stage 3 (for ldso/libc only) pull addends
> > >    from this array instead of of from inline.
> > > 
> > > Steps 1 and 2 are purely code removal/simplification and should be
> > > done regardless of whether we move forward on the above program, I
> > > think. Steps 3-5 add some complexity but hardly any code, just a few
> > > lines here and there.
> > > 
> > > Comments?
> > 
> > I like it.
> > 
> > The thing that is a bit critical here, is the VLA. Not because it is a
> > VLA as such, but because it is a dynamic allocation on the stack. We
> > already have a similar strategy in pthread_create for TLS. The
> > difference is that there we have
> > 
> >  - a sanity check
> >  - an alternative strategy if the sanity check fails
> > 
> > Would there be a possibility to have both here, too?
> 
> I'm not sure what you're referring to in pthread_create. It has no
> VLA. Maybe you mean the choice of whether to put TLS in a
> caller-provided thread stack,

that's what I meant

> but I don't think that's an analogous
> situaton. In that one, the provided stack size is known and TLS is
> arbitrarily large (under the control of the app and loaded libs). Here
> (dlstart), the stack size is not known and the size of the addend
> table is not specifically known, but it's fixed at ld-time for libc.so
> and the same for every run, and it's orders of magnitude smaller than
> any usable stack (e.g. 14 entries on i386).

> There are two potential failure modes here:
> 
> 1. We run out of stack because RLIMIT_STACK is insanely small. In that
>    case we'll quickly crash somewhere else. If there's a risk of
>    getting off the stack into other memory, that risk would already
>    exist in other places.
> 
> 2. We run out of stack because the REL table is HUGE. This is a static
>    condition for the libc.so ELF file and would not change from run to
>    run. If something went wrong in the build process to cause this, it
>    needs to be fixed.
> 
> So I'm skeptical of the need for a fallback.
> 
> If we do want/need a fallback, it would need to be of the following
> form:
> 
> 1. Count addends that need to be saved.
> 
> 2. If cnt<LIMIT (e.g. 16k), produce VLA of the right size; otherwise
>    produce VLA of size 1 and pass a null pointer instead of a pointer
>    to the VLA.

Ok, so we have a simple sanity check, great.

> 3. In stage 2, if the addend-buffer pointer is null, allocate storage
>    for it with __syscall(SYS_mmap, ...) (we can't call mmap yet).
>    Abort on failure.
> 
> 4. In stage 3, if the addend-buffer was allocated donate it to malloc
>    after we're done using it.
> 
> I would probably prefer just having it crash/abort in step 3,

me too

> since
> this condition obviously indicates a broken build. It's better to
> catch such an issue and fix it than to have a non-robust libc.so that
> breaks fail-safety for no-third-party-libs binaries by depending on
> allocation.

Observe that I said "alternative strategy" not "fallback".

"Crash early" to be that strategy sounds completely ok to me.

Thanks

Jens

-- 
:: INRIA Nancy Grand Est ::: Camus ::::::: ICube/ICPS :::
:: ::::::::::::::: office Strasbourg : +33 368854536   ::
:: :::::::::::::::::::::: gsm France : +33 651400183   ::
:: ::::::::::::::: gsm international : +49 15737185122 ::
:: http://icube-icps.unistra.fr/index.php/Jens_Gustedt ::




Download attachment "signature.asc" of type "application/pgp-signature" (182 bytes)

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.