musl - Re: [PATCH 3/3] add s390x port

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20161022025943.GA19318@brightrain.aerifal.cx>
Date: Fri, 21 Oct 2016 22:59:43 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 3/3] add s390x port

On Fri, Oct 21, 2016 at 07:15:19PM -0500, Bobby Bingham wrote:
> > On Mon, Jul 25, 2016 at 10:53:00PM -0500, Bobby Bingham wrote:
> > > diff --git a/arch/s390x/bits/alltypes.h.in b/arch/s390x/bits/alltypes.h.in
> > > new file mode 100644
> > > index 0000000..1a83846
> > > --- /dev/null
> > > +++ b/arch/s390x/bits/alltypes.h.in
> > > @@ -0,0 +1,26 @@
> > > +#define _Addr long
> > > +#define _Int64 long
> > > +#define _Reg long
> > > +
> > > +TYPEDEF __builtin_va_list va_list;
> > > +TYPEDEF __builtin_va_list __isoc_va_list;
> > > +
> > > +#ifndef __cplusplus
> > > +TYPEDEF int wchar_t;
> > > +#endif
> > > +
> > > +TYPEDEF double float_t;
> > 
> > Reportedly this is wrong (a bug copied from glibc) and floats are
> > actually evaluated as float on s390. Can you check and confirm?
> 
> It's evaluating as double here on gcc 4.9.
> 
> printf("%f", FLT_MAX * 2.0f) outputs
> 680564693277057719623408366969033850880.000000 rather than inf.

OK, the story is more complicated it seems -- gcc is emitting code to
do extra work to promote it to double to match a mistake in the glibc
ABI. Details in this thread:

https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg148879.html

Fixing it was discussed but seems to have been dropped, and the
current behavior at least seems to be internally consistent as long as
you're in standards-conforming modes.

I don't care about the arch enough personally for me to push for
fixing it for musl target, but if someone else wants to, please speak
up. Of course we could make it dynamic like i386 (which changes based
on -mfpmath=sse) later if it turns out there is a fix on the gcc side.

> > > diff --git a/arch/s390x/bits/ipc.h b/arch/s390x/bits/ipc.h
> > > new file mode 100644
> > > index 0000000..4710c12
> > > --- /dev/null
> > > +++ b/arch/s390x/bits/ipc.h
> > > @@ -0,0 +1,14 @@
> > > +struct ipc_perm {
> > > +	key_t __ipc_perm_key;
> > > +	uid_t uid;
> > > +	gid_t gid;
> > > +	uid_t cuid;
> > > +	gid_t cgid;
> > > +	mode_t mode;
> > > +	unsigned short __pad1;
> > > +	unsigned short __ipc_perm_seq;
> > > +	unsigned long __pad2;
> > > +	unsigned long __pad3;
> > > +};
> > 
> > __ipc_perm_seq is int on other archs, without the padding. It's not a
> > standard type so I don't think we're forced to have matching type, but
> > I wonder why this is. OTOH maybe some current archs are buggy because
> > of this..
> 
> Strange.  FWIW, it is unsigned short on aarch64 already as well.  And the
> kernel type is int for ppc(64), so that architecture is probably fine.

If the kernel type is short, I would think using int without the
padding field only works when the paddign is placed to match the
endianness. Of course here it does match, but I wonder whether it does
on mips, powerpc, sh, etc. This isn't something you need to follow-up
on; I'm just noting it here so there's a record of it.

> > > diff --git a/arch/s390x/syscall_arch.h b/arch/s390x/syscall_arch.h
> > > [...]
> > > +#define SYSCALL_USE_SOCKETCALL
> > 
> > Is this intentional? s390x does not (or did not always) have dedicated
> > syscalls for socket ops?
> 
> The dedicated syscalls were only added a year ago in kernel 4.3
> (kernel commit 977108f89c989b1eeb5c8d938e1e71913391eb5f)

OK, then this is fine as-is.

> > > diff --git a/src/thread/s390x/__tls_get_offset.s b/src/thread/s390x/__tls_get_offset.s
> > > new file mode 100644
> > > index 0000000..8ee92de
> > > --- /dev/null
> > > +++ b/src/thread/s390x/__tls_get_offset.s
> > > @@ -0,0 +1,17 @@
> > > +	.global __tls_get_offset
> > > +	.type __tls_get_offset,%function
> > > +__tls_get_offset:
> > > +	stmg  %r14, %r15, 112(%r15)
> > > +	aghi  %r15, -160
> > > +
> > > +	la    %r2, 0(%r2, %r12)
> > > +	brasl %r14, __tls_get_addr
> > > +
> > > +	ear   %r1, %a0
> > > +	sllg  %r1, %r1, 32
> > > +	ear   %r1, %a1
> > > +
> > > +	sgr   %r2, %r1
> > > +
> > > +	lmg   %r14, %r15, 272(%r15)
> > > +	br    %r14
> > 
> > Can you explain this function and why a tail-call to __tls_get_addr
> > doesn't work? Based on the (albeit 32-bit) info in
> > https://www.akkadia.org/drepper/tls.pdf it seems like it could but
> > maybe I'm missing something. Is there a special contract about
> > clobbers?
> 
> A tail call doesn't work because the return values aren't the same.  From
> the linked document:
> 
> > The return value of __tls_get_offset is an offset to the thread pointer.
> > To get the address of the requested variable the thread pointer needs to
> > be added to the return value.
> 
> This function is essentially:
> 
> 	size_t __tls_get_offset(size_t offset, char *got)
> 	{
> 		return (char*)__tls_get_addr(got + offset) - (char*)__pthread_self();
> 	}
> 
> Except that the second parameter is passed via r12 instead of the usual r3,
> which is why it's written in assembly.

OK. It might make sense to write a fast-path in asm when the dtv slot
is already present (see src/thread/i386/tls.s for a similar example)
but this could be improved in a later patch. It's not blocking merge
by any means.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.