musl - Re: C threads, v. 6.2

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140828161552.GY12888@brightrain.aerifal.cx>
Date: Thu, 28 Aug 2014 12:15:52 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: C threads, v. 6.2

On Thu, Aug 28, 2014 at 11:40:50AM +0200, Jens Gustedt wrote:
> Hi,
> thanks for the fast review and reply, Rich.
> 
> I will not reply to all raised issues, otherwise this gets out of
> bounds :) Consider the ones where I don't as agreed.
> 
> For a start, I didn't intend this to be integrated into musl as just
> one patch, this was just meant for easier review my list of perhaps 30
> patches that I have divided this currently. Once we agree on the
> content, I would regroup and rebase these to reasonable semantical
> units, and propose them as 4 or 5 patches, say.

OK, that makes sense.

> Am Mittwoch, den 27.08.2014, 19:46 -0400 schrieb Rich Felker:
> > On Thu, Aug 28, 2014 at 12:11:45AM +0200, Jens Gustedt wrote:
> > > diff --git a/arch/arm/bits/alltypes.h.in b/arch/arm/bits/alltypes.h.in
> > > index 183c4c4..333d43f 100644
> > > --- a/arch/arm/bits/alltypes.h.in
> > > +++ b/arch/arm/bits/alltypes.h.in
> > > @@ -19,7 +19,7 @@ TYPEDEF long time_t;
> > >  TYPEDEF long suseconds_t;
> > >  
> > >  TYPEDEF struct { union { int __i[9]; unsigned __s[9]; } __u; } pthread_attr_t;
> > > -TYPEDEF struct { union { int __i[6]; volatile void *volatile __p[6]; } __u; } pthread_mutex_t;
> > > -TYPEDEF struct { union { int __i[12]; void *__p[12]; } __u; } pthread_cond_t;
> > > +TYPEDEF struct { union { int __i[6]; volatile void *volatile __p[6]; } __u; } __pthread_mutex_t;
> > > +TYPEDEF struct { union { int __i[12]; void *__p[12]; } __u; } __pthread_cond_t;
> > > [...]
> > > +TYPEDEF __pthread_cond_t pthread_cond_t;
> > > +TYPEDEF __pthread_cond_t cnd_t;
> > > +TYPEDEF __pthread_mutex_t pthread_mutex_t;
> > > +TYPEDEF __pthread_mutex_t mtx_t;
> > 
> > This changes the C++ ABI by changing the tag for the POSIX types, so
> > it can't be done this way. We could return to some serious language
> > lawyering about whether it's valid to access members via the 'wrong'
> > struct type when we have two identical struct types, or just assume
> > that's safe for now and do it.
> 
> I don't see your point, this doesn't change the tag name, no? (There
> is no tag name, here.) This is just typedeffing to the same type as
> before so this should be "as allowed" as it was before.

That's not how it works. In C++, there is no separate namespace for
struct tags versus types. If you define a struct without a tag but
using typedef, the typedef name becomes its effective struct tag,
which is what gets used for name mangling. If you later use typedef
again to make an alias for this type, it does not make a new struct
type with a new tag, just an alias for the existing one. Thus, your
patch as written changes the struct tags for C++ ABI from
"pthread_mutex_t" and "pthread_cond_t" to "__pthread_mutex_t" and
"__pthread_cond_t".

Yes this is ugly. Blame the C++ folks. (Not sure whether it's the
standard's fault or the ABI's fault.)

> > > +#ifdef __GNUC__
> > > +#define _Nonnull(...) __attribute__((__nonnull__(__VA_ARGS__)))
> > > +#else
> > > +#define _Nonnull(...)
> > > +#endif
> > 
> > If we want to use attributes like this, it's a separate change from
> > adding threads. But I'm fairly much against doing it anyway since the
> > compiler already knows how to make these warnings for standard
> > functions, without help from the headers.
> 
> There will be a while, until compilers know these things for C11
> thread functions, I think.
> 
> BTw, I was tempted to do it the "C" way by using things like
> `[static 1]` instead of `*`, but then I thought you wouldn't like
> that.

Actually I like it much better, but I'm not sufficiently familiar with
it to know whether it's a conforming declaration. Is it formally
compatible with the standard prototype? If so, how does it allow the
compiler to enforce anything?

> > > diff --git a/src/mman/mprotect.c b/src/mman/mprotect.c
> > > index f488486..535787b 100644
> > > --- a/src/mman/mprotect.c
> > > +++ b/src/mman/mprotect.c
> > > @@ -2,10 +2,12 @@
> > >  #include "libc.h"
> > >  #include "syscall.h"
> > >  
> > > -int mprotect(void *addr, size_t len, int prot)
> > > +int __mprotect(void *addr, size_t len, int prot)
> > >  {
> > >  	size_t start, end;
> > >  	start = (size_t)addr & -PAGE_SIZE;
> > >  	end = (size_t)((char *)addr + len + PAGE_SIZE-1) & -PAGE_SIZE;
> > >  	return syscall(SYS_mprotect, start, end-start, prot);
> > >  }
> > > +
> > > +weak_alias(__mprotect, mprotect);
> > 
> > Here it might be nicer to just use __syscall(SYS_mprotect...) directly
> > in pthread_create. We don't need the extra page alignment overhead
> > there, only for the public function, and __syscall is lighter than a
> > function call anyway.
> 
> I'd rather leave such intrusive changes to the pthread code to you :)

OK, I can go ahead and make it as a "shrink code and reduce namespace
dependence in preparation for C11 threads" change.

> > > diff --git a/src/thread/mtx_timedlock.c b/src/thread/mtx_timedlock.c
> > > new file mode 100644
> > > index 0000000..c42a096
> > > --- /dev/null
> > > +++ b/src/thread/mtx_timedlock.c
> > > @@ -0,0 +1,19 @@
> > > +#include "pthread_impl.h"
> > > +#include <threads.h>
> > > +
> > > +int __pthread_mutex_timedlock(pthread_mutex_t *restrict m, const struct timespec *restrict ts);
> > > +
> > > +int mtx_timedlock(mtx_t *restrict mutex, const struct timespec *restrict ts) {
> > 
> > It might be worth duplicating the a_cas attempt here for normal-type;
> > I'm not sure.
> 
> that would be premature optimization :)

Perhaps. The same optimization helps in some other places, but it's of
course non-essential, so let's revisit it later sometime after commit.

> > > diff --git a/src/thread/mtx_unlock.c b/src/thread/mtx_unlock.c
> > > new file mode 100644
> > > index 0000000..d1721a6
> > > --- /dev/null
> > > +++ b/src/thread/mtx_unlock.c
> > > @@ -0,0 +1,11 @@
> > > +#include "pthread_impl.h"
> > > +#include <threads.h>
> > > +
> > > +int __pthread_mutex_unlock(pthread_mutex_t *);
> > > +
> > > +int (mtx_unlock)(mtx_t *mtx) {
> > > +	/* In the best of all worlds this is a tail call. */
> > > +	int ret = __pthread_mutex_unlock(mtx);
> > > +	/* In case of UB may also return EPERM. */
> > > +	return ret ? thrd_error : thrd_success;
> > > +}
> > 
> > For unlock, C11 makes it UB to unlock a mutex you don't own (even for
> > recursive type), so the tail call would be legal. I don't have a
> > strong opinion either way here.
> 
> I'll remove the comment, this is a left over. Tail call is still not
> possible, since moraly thrd_success is not 0 :)
> 
> But we could just return thrd_success unconditionnally.

Or: thrd_success ? thrd_success : ret;

But really, I'm fine with internal assumptions that success is 0.
That's part of why we would want to define it that way.

> > > diff --git a/src/thread/pthread_setspecific.c b/src/thread/pthread_setspecific.c
> > > index 55e46a8..dd18b96 100644
> > > --- a/src/thread/pthread_setspecific.c
> > > +++ b/src/thread/pthread_setspecific.c
> > > @@ -1,6 +1,6 @@
> > >  #include "pthread_impl.h"
> > >  
> > > -int pthread_setspecific(pthread_key_t k, const void *x)
> > > +int __pthread_setspecific(pthread_key_t k, const void *x)
> > >  {
> > >  	struct pthread *self = __pthread_self();
> > >  	/* Avoid unnecessary COW */
> > > @@ -10,3 +10,5 @@ int pthread_setspecific(pthread_key_t k, const void *x)
> > >  	}
> > >  	return 0;
> > >  }
> > > +
> > > +weak_alias(__pthread_setspecific, pthread_setspecific);
> > 
> > It looks like you have a duplicate tss_set for this rather than using
> > the above, so either the above is a grauitous change, or the new
> > tss_set is duplicate code that should be removed. Or did I miss
> > something?
> 
> you are right, this should be omitted
> 
> (the tss_get code differs in the const qualification of the second argument)

In this case it's still possible to implement tss_set as a wrapper,
just not an alias, but the code is sufficiently small that maybe it's
better just to duplicate it.

> > Didn't we discuss before just passing a special attr pointer to
> > __pthread_create to effect C11-style creation?
> 
> I did that, and that version could still be revived. But I was not too
> happy with it, because it meant intruding into the existing pthread
> code. For this version, the pthread code is just reorganized in
> different TU, but otherwise there should be no change.

>From a maintainability standpoint, having two subtly different
versions of the same complex code is a big cost. I wouldn't be
fundamentally opposed to a factoring that avoids this duplication
while allowing the POSIX-specific parts to be optimized out for pure
C11 use, but I think that would be a lot more invasive (mainly in
terms of potential for errors, possibly also for performance, but
unlikely since basically all the time is spent in mmap, mprotect, and
clone anyway) and would be something to look into at a later time, not
while trying to get the initial work on C11 threads in.

> Also I'd really like to advertise C threads as a lightweighted,
> stripped down thread implementation, so the short cuts (that are
> merely a fall off of the code separation, here) were politically
> welcome.

I don't think saving 200 bytes in a pure-C11 static-linked program
while adding 300 bytes to libc.so in code duplication (these numbers
are just guesses but sound about right) really makes it come across as
more lightweight.

> > > diff --git a/src/thread/thrd_current.c b/src/thread/thrd_current.c
> > > new file mode 100644
> > > index 0000000..1728535
> > > --- /dev/null
> > > +++ b/src/thread/thrd_current.c
> > > @@ -0,0 +1,11 @@
> > > +#include "pthread_impl.h"
> > > +#include <threads.h>
> > > +
> > > +/* Not all arch have __pthread_self as a symbol. On some this results
> > > +   in some magic. So this "call" to __pthread_self is not necessary a
> > > +   tail call. In particular, other than the appearance, thrd_current
> > > +   can not be implemented as a weak symbol. */
> > > +pthread_t thrd_current()
> > > +{
> > > +	return __pthread_self();
> > > +}
> > 
> > OK, but I'm not sure the detailed comment is needed. The exact same
> > code appears in pthread_self.c.
> 
> I certainly have the tendency to comment a bit more than you :)

Yes, I have a bias towards omitting comments unless there's some
serious nontrivial logic (and even then sometimes I omit them). When a
comment would be more a matter of "justification of changes" or "why
it's being done this way rather than some other way", I usually use
the git log rather than source-level comments since I feel like the
text is more useful to someone studying the history of the source than
someone trying to understand what it's doing now.

> > > diff --git a/src/thread/thrd_detach.c b/src/thread/thrd_detach.c
> > > new file mode 100644
> > > index 0000000..211b8fb
> > > --- /dev/null
> > > +++ b/src/thread/thrd_detach.c
> > > @@ -0,0 +1,18 @@
> > > +#include "pthread_impl.h"
> > > +#include <threads.h>
> > > +
> > > +int __munmap(void *, size_t);
> > > +
> > > +int thrd_detach(thrd_t t)
> > > +{
> > > +	/* Cannot detach a thread that's already exiting */
> > > +	if (a_swap(t->exitlock, 1)){
> > > +		int tmp;
> > > +		while ((tmp = t->tid)) __timedwait(&t->tid, tmp, 0, 0, 0, 0, 0);
> > > +		if (t->map_base) __munmap(t->map_base, t->map_size);
> > > +	} else {
> > > +		t->detached = 2;
> > > +		__unlock(t->exitlock);
> > > +	}
> > > +	return thrd_success;
> > > +}
> > 
> > You already moved pthread_detach to the protected namespace, so this
> > seems to be duplicate code.
> 
> It is.
> 
> > And it's a place that has implementation
> > details (magic numbers like 2, which you could argue shouldn't exist
> > ;-)
> 
> I would !)
> 
> > so I think it's best to avoid spreading these details out over
> > more places where they could go out of sync, no?
> 
> yes, probably, the fact that pthread_detach called pthread_join
> somehow disturbed me. I'll reconsider

It's not so odd if you think about it -- when the thread to be
detached has already exited, you can't (implementation-wise) detach it
and leave it responsible for cleaning up itself, because it doesn't
exist anymore. Since you know it's exited though, joining and throwing
away the result is possible.

> > > diff --git a/src/thread/tss_delete.c b/src/thread/tss_delete.c
> > > new file mode 100644
> > > index 0000000..d50731f
> > > --- /dev/null
> > > +++ b/src/thread/tss_delete.c
> > > @@ -0,0 +1,8 @@
> > > +#include "pthread_impl.h"
> > > +#include <threads.h>
> > > +
> > > +int __pthread_key_delete(pthread_key_t k);
> > > +
> > > +void (tss_delete)(tss_t key) {
> > > +	(void)__pthread_key_delete(key);
> > > +}
> > 
> > This could probably be done with just an alias, but I'm fine with the
> > tail-call.
> 
> I thought of this, but I was reluctant to make an alias to a function
> with a different calling convention. Couldn't that pollute the return
> register in an unpredictable way?

I missed that -- sorry. Indeed, the way you've done it is fine.

> > > diff --git a/src/thread/tss_set.c b/src/thread/tss_set.c
> > > new file mode 100644
> > > index 0000000..70c4fb7
> > > --- /dev/null
> > > +++ b/src/thread/tss_set.c
> > > @@ -0,0 +1,13 @@
> > > +#include "pthread_impl.h"
> > > +#include <threads.h>
> > > +
> > > +int tss_set(tss_t k, void *x)
> > > +{
> > > +	struct pthread *self = __pthread_self();
> > > +	/* Avoid unnecessary COW */
> > > +	if (self->tsd[k] != x) {
> > > +		self->tsd[k] = x;
> > > +		self->tsd_used = 1;
> > > +	}
> > > +	return thrd_success;
> > > +}
> > 
> > As mentioned before, this is duplicate, and it could probably just be
> > an alias.
> 
> no, I don't think so, it has a different interface, there is no const

Well a wrapper would work, but the function is small/trivial enough
that a wrapper is not significantly smaller or simpler than just
duplicating it like you've done.

> > > +/* Roughly __syscall already returns the right thing: 0 if all went
> > > +   well or a negative error indication, otherwise. */
> > > +int thrd_sleep(const struct timespec *req, struct timespec *rem)
> > > +{
> > > +  int ret = __syscall(SYS_nanosleep, req, rem);
> > > +  // compile time comparison, constant propagated
> > > +  if (EINTR != 1) {
> > > +    switch (ret) {
> > > +    case 0: return 0;
> > > +      /* error described by POSIX:                                    */
> > > +      /* EINTR  The nanosleep() function was interrupted by a signal. */
> > > +      /* The C11 wording is:                                          */
> > > +      /* -1 if it has been interrupted by a signal                    */
> > > +    case -EINTR:  return -1;
> > > +      /* error described by POSIX */
> > > +    case -EINVAL: return -2;
> > > +      /* described for linux */
> > > +    case -EFAULT: return -3;
> > > +      /* No other error returns are specified. */
> > > +    default: return INT_MIN;
> > > +    }
> > > +  }
> > > +  // potentially a tail call
> > > +  return ret;
> > > +}
> > 
> > __syscall is almost never a "tail call" because it's rarely a call at
> > all; usually, it's inline asm. :-)
> 
> I'll remove that comment and the check for EINTR being 1.

OK.

> > But okay. My preference would be
> > not to do fancy remapping of error codes (especially not when the
> > codes will differ by arch, since this just encourages programs to
> > assume they're fixed and then break on the arch where they're
> > different, if any).
> 
> We have to be at least partially fancy, because C enforces -1 for the
> interrupt case without really being precise what that should mean. I
> just naively assumed that they mean the same as EINTR :)

I agree.

> For EINVAL, as discussed above, I'd expect that the C specification
> will be modified to also have the validity check for the req
> argument. POSIX has a "shall" here, so I'd like to keep the case well
> identified.

That seems likely, or they might make it undefined. C <3 UB...

> For the EFAULT case, I don't have a strong opinion, could be either
> way.

EFAULT really isn't an error the caller can expect to get. If there's
any userspace work aside from the syscall, it would be SIGSEGV
instead.

Is there a reason non-EINTR errors need to be mapped to distinct
values, though? I was thinking just 3 cases, 0, EINTR, and default,
would suffice.

> > > diff --git a/src/time/timespec_get.c b/src/time/timespec_get.c
> > > new file mode 100644
> > > index 0000000..bf78e5a
> > > --- /dev/null
> > > +++ b/src/time/timespec_get.c
> > > @@ -0,0 +1,9 @@
> > > +#include <time.h>
> > > +
> > > +int __clock_gettime(clockid_t clk, struct timespec *ts);
> > > +
> > > +/* the base argument is simply ignored, there is no other implemented
> > > +   value than TIME_UTC. */
> > > +int timespec_get(struct timespec * ts, int base) {
> > > +  return __clock_gettime(CLOCK_REALTIME, ts) < 0 ? 0 : base;
> > > +}
> > 
> > As mentioned in another email (off-list, IIRC), I'd like to reject
> > unknown base values so that programs built against a newer musl that
> > has other clock bases, but which get an older musl at runtime, will
> > see an error rather than silently using the wrong clock. Does this
> > make sense?
> 
> I makes perfect sense, but I don't recall you having that said as
> clearly.

Sorry.

> BTW, the choice of CLOCK_REALTIME, here, could be subject of
> debade. IIRC this clock doesn't exactly give the seconds since Unix
> Epoch, but that value minus some leapseconds. So it doesn't seem to
> fulfill the C specification.

With any luck, UTC might eventually be fixed to drop leap seconds, and
then it won't matter for future times (and having your clock set to a
past time is nonsense anyway). But I don't think C is strict on how
the clock corresponds to real-world time; at least it wasn't before
C11. And POSIX time explicitly excludes leap seconds.

> One could think of using CLOCK_MONOTONIC_RAW, instead, or
> CLOCK_BOOTTIME if it exists. But I am not sure if the bootime of the
> machine can be considered as an "epoch" in the sense of the C standard
> or even in plain English.

I think the "implementation defined" "epoch" should actually be
defined as something fixed, not something that varies. And for the
main intended use of the function (passing the timespec to
timedlock/timedwait functions), we need to match the clock the pthread
functions use if we're wrapping the pthread implementations, which
means using CLOCK_REALTIME.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.