Date: Fri, 5 Sep 2014 04:32:18 -0400 From: Rich Felker <dalias@...c.org> To: musl@...ts.openwall.com Subject: Re: In-progress stuff, week of Sept 1 On Fri, Sep 05, 2014 at 09:41:42AM +0200, Jens Gustedt wrote: > Hello, > > Am Donnerstag, den 04.09.2014, 21:14 -0400 schrieb Rich Felker: > > First, things that definitely are going in this release cycle: > > > > - Jens Gustedt's C11 threads: From discussion it's ready, but the last > > iteration of the patches added more gratuitous changes at the same > > time it removed others. Jens is going to be less available for a > > while, so I'm going to finish getting the patches ready to commit, > > but it's going to take a little work going back and reviewing the > > email threads to determine what was the outcome of discussion versus > > what's new, and among what's new, whether it's essential or side > > changes orthogonal to adding C11 threads. > > Sorry if that left that impression to you. There is only one thing I > remember, that is not really sorted out in the last patch series I > sent, the timespec_get things. The other patches should be relatively > clean, no? In particular, the higher numbered patches (I think it was > 8 and 9) can just be omitted, if you don't like them. > > In any case, just send me requests for diff's or entire files in that > or that version, my git history contains them all. I think it's mainly the __clock_gettime changes that crept well outside the scope of doing C11 threads, but I need to look again. Trying to get some other things committed at the moment. > I have other points that I delayed for later. Off my head: > > - review atomics > > * This starts by adding some "i" constraints to the assembler. Gcc is > perfectly capable to optimize static inlined asm that has "ir" > constraints. In the code a lot of these are used with constants 0 > or 1, this would shave of the use of a register at all these > points. Examples? We already use "ir" for syscalls on some archs, and I had good results. But I wasn't aware that there were places in the atomic code where "i" could help. > * Review the necessity of having all of these with "memory". As long > as the pointer argument is constrained with "m" and not "p", this > should suffice to encode the data dependency. The only thing that > we'd have to be careful with is reordering, but I think this could > be achieved by using `volatile`, instead. In order for their memory-barrier semantics to be useful, all atomics also need to be full compiler barriers. Otherwise the compiler could move loads/stores of data it thinks is unrelated across the asm block, and thereby across the memory barrier. > * Move to gcc's __sync and __atomic builtins. The first are available > since always, and both are easily testable with macros. In case of > __atomic this would make it possible to move the atomics used in > musl to acquire-release semantics, which may pay for some archs, in > particular arm, I think. > > This also should make it easier to port to new architectures in the > future, as soon as there is a working gcc for it. This has already been discussed in the past, and it's not a good option: - It would severely reduce the set of compilers that can be used to build musl. - It makes it impossible to correctly support archs where different atomic strategies might need to be chosen at runtime. This includes both arm (see the long threads on what to do about broken grsec kernels without the kuser helper page) and sh, and maybe more. Even if the compiler could be taught to solve this problem, we'd then be bumping the compiler requirement to gcc 5.0 or something, which would be completely unacceptable. - POSIX requires seq_cst semantics for everything anyway, so the benefit would only be for C11 sync objects, and only if we write new implementations of them using relaxed atomics. I'd like to pursue an approach with relaxed atomics at some point, probably with asm still rather than intrinsics, but IMO it's rather low priority. I'd like to see the whole C11/POSIX alignment matter get settled first so we're not shooting for a moving target. > - implement a simple lock feature based on futex that encodes the > wait counter inside the `int`. > > After the discussion with Rich some weeks ago I gave it a thought, > and I think this can be simply done by using the parity bit for the > lock and the other bits for the waiters count. On i386 the parity > bit (as any other bit) can effectively accessed atomically with > "lock ; btsl ". > > IIRC, having such a simple lock would also help as a third step of > the changes to cv that I proposed and that you mention above. I'm not sure what you mean, but it's a bad idea to have the waiters count inside the futex int. This is because every time the futex int changes, you risk FUTEX_WAIT returning with EAGAIN. Under high load of waiters arriving/leaving, it might be impossible to _ever_ start a wait; all threads may just spin. The futex int should be something that doesn't change except for a very small finite number of changes (like adding a waiter flag, not a waiter count) before you actually have a chance of successfully getting the lock or whatever. > - __timedwait translates absolute time to relative time, by taking > the current time with clock_gettime. This could be avoided for > CLOCK_REALTIME and CLOCK_MONOTONIC when using FUTEX_WAIT_BITSET > instead of FUTEX_WAIT. That one uses absolute time from the > start. Since C threads don't leave a chose for the clock they are > using this would avoid using this code path completely and shave of > one system call. IIRC I tried to do that at one point but then decided against it. That was several years back, so it might be worth revisiting now, but keep in mind clock_gettime is expected to be a vdso call (no syscall) on first-class archs so it's not as big an issue as you might think.x Rich
Powered by blists - more mailing lists
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.