Date: Tue, 21 Apr 2020 10:39:41 -0400 From: Rich Felker <dalias@...c.org> To: David Laight <David.Laight@...LAB.COM> Cc: 'Nicholas Piggin' <npiggin@...il.com>, Adhemerval Zanella <adhemerval.zanella@...aro.org>, "libc-dev@...ts.llvm.org" <libc-dev@...ts.llvm.org>, "libc-alpha@...rceware.org" <libc-alpha@...rceware.org>, "linuxppc-dev@...ts.ozlabs.org" <linuxppc-dev@...ts.ozlabs.org>, "musl@...ts.openwall.com" <musl@...ts.openwall.com> Subject: Re: Powerpc Linux 'scv' system call ABI proposal take 2 On Tue, Apr 21, 2020 at 12:28:25PM +0000, David Laight wrote: > From: Nicholas Piggin > > Sent: 20 April 2020 02:10 > ... > > >> Yes, but does it really matter to optimize this specific usage case > > >> for size? glibc, for instance, tries to leverage the syscall mechanism > > >> by adding some complex pre-processor asm directives. It optimizes > > >> the syscall code size in most cases. For instance, kill in static case > > >> generates on x86_64: > > >> > > >> 0000000000000000 <__kill>: > > >> 0: b8 3e 00 00 00 mov $0x3e,%eax > > >> 5: 0f 05 syscall > > >> 7: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax > > >> d: 0f 83 00 00 00 00 jae 13 <__kill+0x13> > > Hmmm... that cmp + jae is unnecessary here. It's not.. Rather the objdump was just mistakenly done without -r so it looks like a nop jump rather than a conditional tail call to the function that sets errno. > It is also a 32bit offset jump. > I also suspect it gets predicted very badly. I doubt that. This is a very standard idiom and the size of the offset (which is necessarily 32-bit because it has a relocation on it) is orthogonal to the condition on the jump. FWIW a syscall like kill takes global kernel-side locks to be able to address a target process by pid, and the rate of meaningful calls you can make to it is very low (since it's bounded by time for target process to act on the signal). Trying to optimize it for speed is pointless, and even size isn't important locally (although in aggregate, lots of wasted small size can add up to more pages = more TLB entries = ...). > > >> 13: c3 retq > > >> > > >> While on musl: > > >> > > >> 0000000000000000 <kill>: > > >> 0: 48 83 ec 08 sub $0x8,%rsp > > >> 4: 48 63 ff movslq %edi,%rdi > > >> 7: 48 63 f6 movslq %esi,%rsi > > >> a: b8 3e 00 00 00 mov $0x3e,%eax > > >> f: 0f 05 syscall > > >> 11: 48 89 c7 mov %rax,%rdi > > >> 14: e8 00 00 00 00 callq 19 <kill+0x19> > > >> 19: 5a pop %rdx > > >> 1a: c3 retq > > > > > > Wow that's some extraordinarily bad codegen going on by gcc... The > > > sign-extension is semantically needed and I don't see a good way > > > around it (glibc's asm is kinda a hack taking advantage of kernel not > > > looking at high bits, I think), but the gratuitous stack adjustment > > > and refusal to generate a tail call isn't. I'll see if we can track > > > down what's going on and get it fixed. > > A suitable cast might get rid of the sign extension. > Possibly just (unsigned int). No, it won't. The problem is that there is no representation of the fact that the kernel is only going to inspect the low 32 bits (by declaring the kernel-side function as taking an int argument). The external kill function receives arguments by the ABI, where the upper bits of int args can contain junk, and the asm register constraints for syscalls use longs (or rather an abstract syscall-arg type). It wouldn't even work to have macro magic detect that the expressions passed are ints and use hacks to avoid that, since it's perfectly valid to pass an int to a syscall that expects a long argument (e.g. offset to mmap), in which case it needs to be sign-extended. The only way to avoid this is encoding somewhere the syscall-specific knowledge of what arg size the kernel function expects. That's way too much redundant effort and too error-prone for the incredibly miniscule size benefit you'd get out of it. Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.