Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Sun, 27 Mar 2016 19:32:20 -0500
From: Bobby Bingham <koorogi@...rogi.info>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 2/2] add powerpc64 port

On Sun, Mar 27, 2016 at 07:37:09PM -0400, Rich Felker wrote:
> On Sun, Mar 27, 2016 at 04:20:19PM -0500, Bobby Bingham wrote:
> > diff --git a/arch/powerpc64/bits/endian.h b/arch/powerpc64/bits/endian.h
> > new file mode 100644
> > index 0000000..2016cb2
> > --- /dev/null
> > +++ b/arch/powerpc64/bits/endian.h
> > @@ -0,0 +1,5 @@
> > +#if __BIG_ENDIAN__
> > +#define __BYTE_ORDER __BIG_ENDIAN
> > +#else
> > +#define __BYTE_ORDER __LITTLE_ENDIAN
> > +#endif
> 
> This should probably use whatever the official psABI macro for PPC BE
> is rather than __BIG_ENDIAN__ (which I think is a confusingly-named
> GCC thing; it's not clear whether it means "is big endian" or "value
> that indicates big endian").

The ABI spec is at https://members.openpowerfoundation.org/document/dl/576.
Section 5.1.4 documents __BIG_ENDIAN__ and __LITTLE_ENDIAN__ for this
purpose.

> 
> > diff --git a/arch/powerpc64/bits/sem.h b/arch/powerpc64/bits/sem.h
> > new file mode 100644
> > index 0000000..5f04979
> > --- /dev/null
> > +++ b/arch/powerpc64/bits/sem.h
> > @@ -0,0 +1,7 @@
> > +struct semid_ds {
> > +	struct ipc_perm sem_perm;
> > +	time_t sem_otime;
> > +	time_t sem_ctime;
> > +	unsigned long sem_nsems;
> > +	unsigned long __unused[2];
> > +};
> 
> sem_nsems is required to have type unsigned short; this is a POSIX
> requirement that Linux botched. The canonical solution is to put
> endian-dependent padding around it; see other archs for examples.

Ok.

> 
> > diff --git a/arch/powerpc64/bits/signal.h b/arch/powerpc64/bits/signal.h
> > [...]
> > +typedef struct sigcontext
> > +{
> > +	unsigned long _unused[4];

= 4 longs

> > +	int signal;
> > +	int _pad0;
> > +	unsigned long handler;
> > +	unsigned long oldmask;
> > +	void *regs;

= 4 longs

> > +	gregset_t gp_regs;

= 48 longs

> > +	fpregset_t fp_regs;

= 33 longs

> > +	vrregset_t *v_regs;

= 1 long

> > +	long vmx_reserve[34+34+32+1];

= 34+34+32+1 longs

> > +} mcontext_t;
> > +
> > +#else
> > +
> > +typedef struct {
> > +	long __regs[4+4+48+33+1+34+34+32+1];
> > +} mcontext_t;
> > +
> > +#endif
> 
> Did you check that the layout of the namespace-safe and full mcontex_t
> match?

Yes, see the breakdown above.

> 
> > +#define SA_NOCLDSTOP  1U
> > +#define SA_NOCLDWAIT  2U
> > +#define SA_SIGINFO    4U
> > +#define SA_ONSTACK    0x08000000U
> > +#define SA_RESTART    0x10000000U
> > +#define SA_NODEFER    0x40000000U
> > +#define SA_RESETHAND  0x80000000U
> > +#define SA_RESTORER   0x04000000U
> 
> Is there a reason for making these unsigned? It's different from other
> archs at least, I think.

It's the same as the ppc32 port.

> 
> > diff --git a/arch/powerpc64/crt_arch.h b/arch/powerpc64/crt_arch.h
> > new file mode 100644
> > index 0000000..0605511
> > --- /dev/null
> > +++ b/arch/powerpc64/crt_arch.h
> > @@ -0,0 +1,21 @@
> > +__asm__(
> > +".text \n"
> > +".global " START " \n"
> > +".type   " START ", %function \n"
> > +START ": \n"
> > +"	addis 2, 12, .TOC.-" START "@ha \n"
> > +"	addi  2,  2, .TOC.-" START "@l \n"
> 
> What does this do? It looks like canonical function prologue of some
> sort, but _start is not a C function and unless r12 is part of the ELFv2
> entry point ABI that I'm not aware of, I don't think it's meaningful.
> AFAICT r2 is not subsequently used.

In the ABI, C functions have two entry points, the "global" entry point,
and the "local" entry point.  The local entry point expects the "table of
contents" pointer in r2.  The global entry point expects the address of
global entry point itself in r12, uses the prologue I used here to
calculate the TOC pointer from it, and falls through to the local entry
point.  See section 2.3.2.1 of the ABI spec linked above.

Calls within the same module are resolved to the local entry point, so
we need to ensure r2 is set up with the TOC pointer here to call any C
code.

The ABI spec does specify that r12 contains the address of _start,
specifically so this prologue can work (section 4.1.2.1).

> 
> > diff --git a/arch/powerpc64/pthread_arch.h b/arch/powerpc64/pthread_arch.h
> > new file mode 100644
> > index 0000000..2f976fe
> > --- /dev/null
> > +++ b/arch/powerpc64/pthread_arch.h
> > @@ -0,0 +1,17 @@
> > +static inline struct pthread *__pthread_self()
> > +{
> > +	register char *tp __asm__("r13");
> > +	__asm__ __volatile__ ("" : "=r" (tp) );
> > +	return (pthread_t)(tp - 0x7000 - sizeof(struct pthread));
> > +}
> 
> It's possible this asm should have a "memory" clobber to avoid
> reordering across the initial thread pointer setup, but it's
> consistent with ppc32 right now, so we should probably review this as
> a separate issue to consider for both ports rather than something
> blocking the new port.
> 
> > diff --git a/arch/powerpc64/reloc.h b/arch/powerpc64/reloc.h
> > new file mode 100644
> > index 0000000..8e60b31
> > --- /dev/null
> > +++ b/arch/powerpc64/reloc.h
> > @@ -0,0 +1,32 @@
> > +#include <endian.h>
> > +
> > +#if __BYTE_ORDER == __LITTLE_ENDIAN
> > +#define ENDIAN_SUFFIX "le"
> > +#else
> > +#define ENDIAN_SUFFIX ""
> > +#endif
> > +
> > +#define LDSO_ARCH "powerpc64" ENDIAN_SUFFIX
> 
> Is it intentional that the "default" subarch variant be suffixed with
> "le" and a non-default/unused one be the bare "powerpc64"? I don't
> object to that but it's contrary to usual conventions that the bare
> arch be the canonical ABI, and it might be contrary to what GCC is
> doing now (haven't checked) for the dynamic linker name.

I'll double check, but I skimmed the gcc code here, and it looked like
it uses an "le" suffix.  Admittedly, I didn't read it closely enough to
be absolutely sure yet.

> 
> > diff --git a/arch/powerpc64/syscall_arch.h b/arch/powerpc64/syscall_arch.h
> > new file mode 100644
> > index 0000000..aee11eb
> > --- /dev/null
> > +++ b/arch/powerpc64/syscall_arch.h
> > @@ -0,0 +1,5 @@
> > +#define __SYSCALL_LL_E(x) (x)
> > +#define __SYSCALL_LL_O(x) (x)
> > +
> > +#undef SYSCALL_NO_INLINE
> > +#define SYSCALL_NO_INLINE
> 
> Is this just unfinished or is there a reason inline syscalls don't
> work well?

Just unfinished.  ppc32 is the same here.  I can add these.

> 
> > diff --git a/configure b/configure
> > [...]
> > +if test "$ARCH" = "powerpc64" ; then
> > +if ! trycppif "_CALL_ELF == 2" "$t" ; then
> > +fail "$0: error: unsupported powerpc64 ABI"
> > +fi
> 
> I usually use &&/|| for this kind of thing:
> 
> trycppif "_CALL_ELF == 2" "$t" || fail ...

Ok.

> 
> > diff --git a/src/setjmp/powerpc64/longjmp.S b/src/setjmp/powerpc64/longjmp.S
> > new file mode 100644
> > index 0000000..e82e2a7
> > --- /dev/null
> > +++ b/src/setjmp/powerpc64/longjmp.S
> > @@ -0,0 +1,93 @@
> > +	.global _longjmp
> > +	.global longjmp
> > +	.type   _longjmp,@function
> > +	.type   longjmp,@function
> > +_longjmp:
> > +longjmp:
> > +	# 0) move old return address into the link register
> > +	ld   0,  0*8(3)
> > +	mtlr 0
> > +	# 1) restore cr
> > +	ld   0,  1*8(3)
> > +	mtcr 0
> > +	# 2) restore r1-r2 (SP and TOC)
> > +	ld   1,  2*8(3)
> > +	ld   2,  3*8(3)
> > +	# 3) restore r14-r31
> > +	ld  14,  4*8(3)
> > +	ld  15,  5*8(3)
> > +	ld  16,  6*8(3)
> > +	ld  17,  7*8(3)
> > +	ld  18,  8*8(3)
> > +	ld  19,  9*8(3)
> > +	ld  20, 10*8(3)
> > +	ld  21, 11*8(3)
> > +	ld  22, 12*8(3)
> > +	ld  23, 13*8(3)
> > +	ld  24, 14*8(3)
> > +	ld  25, 15*8(3)
> > +	ld  26, 16*8(3)
> > +	ld  27, 17*8(3)
> > +	ld  28, 18*8(3)
> > +	ld  29, 19*8(3)
> > +	ld  30, 20*8(3)
> > +	ld  31, 21*8(3)
> > +	# 4) restore floating point registers f14-f31
> > +	lfd 14, 22*8(3)
> > +	lfd 15, 23*8(3)
> > +	lfd 16, 24*8(3)
> > +	lfd 17, 25*8(3)
> > +	lfd 18, 26*8(3)
> > +	lfd 19, 27*8(3)
> > +	lfd 20, 28*8(3)
> > +	lfd 21, 29*8(3)
> > +	lfd 22, 30*8(3)
> > +	lfd 23, 31*8(3)
> > +	lfd 24, 32*8(3)
> > +	lfd 25, 33*8(3)
> > +	lfd 26, 34*8(3)
> > +	lfd 27, 35*8(3)
> > +	lfd 28, 36*8(3)
> > +	lfd 29, 37*8(3)
> > +	lfd 30, 38*8(3)
> > +	lfd 31, 39*8(3)
> > +
> > +	# 5) restore vector registers v20-v31
> > +	addi  3, 3, 40*8
> > +	lvx   2, 0, 3
> > +
> > +#if __BIG_ENDIAN__
> > +	lvsl  0, 0, 3
> > +#define load_vr(cur,tmp1,tmp2) \
> > +	addi  3, 3, 16;   \
> > +	lvx   tmp2, 0, 3; \
> > +	vperm cur, tmp1, tmp2, 0
> > +#else
> > +	lvsr  0, 0, 3
> > +#define load_vr(cur,tmp1,tmp2) \
> > +	addi  3, 3, 16;   \
> > +	lvx   tmp2, 0, 3; \
> > +	vperm cur, tmp2, tmp1, 0
> > +#endif
> > +
> > +	load_vr(20, 2, 3)
> > [...]
> 
> This is kind of the reason why I was hesitant to add .S support for so
> long. :-)
> 
> I don't want to reject it outright, but the idea of adding .S support
> was just to allow conditional compilation, not to do condensed
> assembly sources that require macro expansion. I can see where the
> code might be unwieldy without this though. Anyone else have opinions?
> 
> > diff --git a/src/signal/powerpc64/sigsetjmp.s b/src/signal/powerpc64/sigsetjmp.s
> > new file mode 100644
> > index 0000000..ce59b60
> > --- /dev/null
> > +++ b/src/signal/powerpc64/sigsetjmp.s
> > @@ -0,0 +1,30 @@
> > +	.global sigsetjmp
> > +	.global __sigsetjmp
> > +	.type sigsetjmp,%function
> > +	.type __sigsetjmp,%function
> > +	.hidden ___setjmp
> > +sigsetjmp:
> > +__sigsetjmp:
> > +	addis 2, 12, .TOC.-__sigsetjmp@ha
> > +	addi  2,  2, .TOC.-__sigsetjmp@l
> > +	.localentry sigsetjmp,.-sigsetjmp
> > +	.localentry __sigsetjmp,.-__sigsetjmp
> 
> Again I don't see what the purpose of these insns is; if the resulting
> value is needed, are you aware of how that interacts with ___setjmp
> returning twice?

This sets up r2 with the TOC pointer, as is required by the ABI in order
to call setjmp's local entry point.  Since setjmp is also written in asm,
we could do away with this here.

I don't think the fact that setjmp returns twice matters for this.

> 
> It looks to me like the assumption is that r12 contains the address of
> the callee at the time of a function call, but I don't see how that's
> satisfied in the calls I've seen.

The instructions before the .localentry directive are the global entry
point.  The ABI requires that r12 be the address of the callee whenever
the global entry point of a function is called.

> 
> Other than that, things looked good! Obviously I haven't tested it
> yet, and I also haven't read some of the actual code in detail yet, so
> I don't know if there are other bugs; my review so far mostly covers
> design/style/conformance/policy matters.
> 
> Rich

Bobby

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.