Date: Sun, 13 Sep 2015 21:36:08 -0400
From: Rich Felker <dalias@...c.org>
To: Rob Landley <rob@...dley.net>
Cc: musl@...ts.openwall.com, 0pf@...mu.org
Subject: Re: [0pf] musl/SH-FDPIC progress

On Sun, Sep 13, 2015 at 06:45:49PM -0500, Rob Landley wrote:
> On 09/11/2015 02:05 PM, Rich Felker wrote:
> > As noted in the commit message, a patch on the musl side is needed to
> > get actual working binaries. I'll post a version of this (not
> > appropriate for upstream) soon.
> 
> Did you ever write more documentation beyond:
> 
> http://www.aerifal.cx/~dalias/binfmts.html
> 
> I should add a link to that from the web page, but I dunno if there's
> more. I'd like to get a section on nommu.org comparing binflt, static
> pie, and fdpic.

No, that was more brainstorming. I think there's a certain audience
that would find the form I worked out there useful, but I agree
something simpler would be nice for a general audience. I'll see what
I can put together. Some more discussions with people (yourself or
others) interested in the issue but who don't really understand the
motivations for fdpic would be really helpful for me to figure out how
to best get this across.

> Jeff Dionne commented on your toolchain:
> 
> > The only issue is he seems to be exclusively targeting fdpic.  The
> > issue is you lose a few registers.  I don't know what gcc will do
> > performance wise in that case, we need to test.  Hopefully we don't
> > need a 'fall back' to bFLT, or something.
> >
> > A few % hit in an embedded system is a lot...
> 
> To which I don't know how to respond. Register starvation mostly seems
> to crop up on x86 (where they have horrible behind the scenes hardware
> hacks with register renaming and multiple register profiles and so on to
> get decent performance out of a lousy assembly design). But sh2 has 16
> general purpose registers, which is much less constrained...

There's only one register "lost", r12, and the loss is much less
severe than in "normal" pic code because on fdpic r12 is
call-clobbered rather than call-saved. This means that leaf functions
which do not access global data can freely clobber r12 -- a situation
even better than the normal non-fdpic ABI, since you have an extra
free register you don't have to save/restore -- and in functions that
do need to make calls, but which also have high register pressure, the
GOT pointer can be spilled to the stack and only reloaded at call
time.

> I'd say "get 'em both out and benchmark them" but the binflt toolchain
> I've got (cutting an aboriginal linux release as we speak by the way) is
> gcc 4.2.1+binutils 2.17, and yours is something like 8 years later so
> the whole code generation backend is basically redone. Not remotely
> apples to apples there.

We could do some measurements, but in theory the only thing that's
more expensive than normal PIC is indirect calls via function pointers
or PLT. Calls within the same DSO/main-program can still be direct.
Versus non-PIC code, you of course end up doing a little more work to
load globals, as in:

	mov.l 1f, rn   // load got slot offset
	add r12, rn    // add got pointer
	mov.l @rn, rn  // load address from got
	mov.l @rn, rn  // load data

vs:

	mov.l 1f, rn   // load absolute address of data
	mov.l @rn, rn  // load data
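
The same difference can be sketched in C. This is only a model under
assumptions: got_t, func_desc_t, load_via_got and the other names are
hypothetical, the GOT is an ordinary array of addresses, and the GOT
pointer that would live in r12 is passed as an explicit argument. It
also hints at why indirect calls cost more: an fdpic function pointer
refers to a descriptor holding both the code address and the callee's
GOT pointer, so both have to be loaded before the call.

```c
#include <assert.h>

/* Hypothetical model: a GOT is an array of addresses, one slot per global. */
#define GOT_SLOTS 4
typedef struct {
    void *slots[GOT_SLOTS];
} got_t;

/* Non-PIC access: the address of the data is a link-time constant. */
static int counter_abs = 7;

static int load_absolute(void)
{
    int *addr = &counter_abs;   /* mov.l 1f, rn   (absolute address) */
    return *addr;               /* mov.l @rn, rn  (data)             */
}

/* GOT-relative access: one extra add and one extra load. */
static int load_via_got(const got_t *got, unsigned slot)
{
    assert(slot < GOT_SLOTS);
    void *const *entry = &got->slots[slot]; /* mov.l 1f, rn ; add r12, rn */
    int *addr = *entry;                     /* mov.l @rn, rn  (address)   */
    return *addr;                           /* mov.l @rn, rn  (data)      */
}

/* Assumed shape of a function descriptor: code address plus the GOT
 * pointer the callee expects. */
typedef struct {
    int (*entry)(const got_t *got);
    const got_t *got;
} func_desc_t;

/* Indirect call: load both words of the descriptor, then call. */
static int call_indirect(const func_desc_t *fd)
{
    return fd->entry(fd->got);
}

/* Example callee that reads the global named by GOT slot 0. */
static int read_slot0(const got_t *got)
{
    return load_via_got(got, 0);
}
```

Two descriptors can share read_slot0 as their entry while carrying
different got_t pointers, which models one copy of the text serving
several instances that each have their own data.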

Of course if we're comparing against bFLT, the only way bFLT can have
faster code is by having TEXTRELs all over the place, i.e. not being
PIC at all. If you want shared-flat, you need to be using a GOT
register and you have essentially the same costs as fdpic but almost
none of the advantages. I'm not sure which was used in practice:
non-shareable bFLT with TEXTRELs all over the place or PIC bFLT.

> And then there's llvm and you said you got libfirm working? What's
> involved in making a libfirm toolchain, trying to build a system with
> it, and getting j2 code generation out of it?

I think you misinterpreted one of my tweets. I was announcing that
libfirm now has working PIC support to make a working libc.so, but
only on the archs where it has mature codegen to begin with. That's
mainly i386, but other archs including x86_64, arm, and mips are
progressing. If there's interest I could push for or work on sh
support; afaik there's none at all right now. IMO a good starting
point would be a target-generic framework for fdpic-style pic
variants. Being able to offer fdpic as a security measure for MMU-ful
archs like x86 would be quite interesting (there are published papers
on why ASLR is not very useful because of the known constant
displacement between text and data) and something I could pitch to
them as a chance to be the first to offer it. But this is all
longer-term.

Rich
