musl - Re: C threads, v. 6.2

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20140828200029.GA12888@brightrain.aerifal.cx>
Date: Thu, 28 Aug 2014 16:00:29 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: C threads, v. 6.2

On Thu, Aug 28, 2014 at 09:28:09PM +0200, Jens Gustedt wrote:
> Hello,
> 
> Am Donnerstag, den 28.08.2014, 12:15 -0400 schrieb Rich Felker:
> > > Am Mittwoch, den 27.08.2014, 19:46 -0400 schrieb Rich Felker:
> > > > On Thu, Aug 28, 2014 at 12:11:45AM +0200, Jens Gustedt wrote:
> > > > > diff --git a/arch/arm/bits/alltypes.h.in b/arch/arm/bits/alltypes.h..in
> > > > > index 183c4c4..333d43f 100644
> > > > > --- a/arch/arm/bits/alltypes.h.in
> > > > > +++ b/arch/arm/bits/alltypes.h.in
> > > > > @@ -19,7 +19,7 @@ TYPEDEF long time_t;
> > > > >  TYPEDEF long suseconds_t;
> > > > >  
> > > > >  TYPEDEF struct { union { int __i[9]; unsigned __s[9]; } __u; } pthread_attr_t;
> > > > > -TYPEDEF struct { union { int __i[6]; volatile void *volatile __p[6]; } __u; } pthread_mutex_t;
> > > > > -TYPEDEF struct { union { int __i[12]; void *__p[12]; } __u; } pthread_cond_t;
> > > > > +TYPEDEF struct { union { int __i[6]; volatile void *volatile __p[6]; } __u; } __pthread_mutex_t;
> > > > > +TYPEDEF struct { union { int __i[12]; void *__p[12]; } __u; } __pthread_cond_t;
> > > > > [...]
> > > > > +TYPEDEF __pthread_cond_t pthread_cond_t;
> > > > > +TYPEDEF __pthread_cond_t cnd_t;
> > > > > +TYPEDEF __pthread_mutex_t pthread_mutex_t;
> > > > > +TYPEDEF __pthread_mutex_t mtx_t;
> > > > 
> > > > This changes the C++ ABI by changing the tag for the POSIX types, so
> > > > it can't be done this way. We could return to some serious language
> > > > lawyering about whether it's valid to access members via the 'wrong'
> > > > struct type when we have two identical struct types, or just assume
> > > > that's safe for now and do it.
> > > 
> > > I don't see your point, this doesn't change the tag name, no? (There
> > > is no tag name, here.) This is just typedeffing to the same type as
> > > before so this should be "as allowed" as it was before.
> > 
> > That's not how it works. In C++, there is no separate namespace for
> > struct tags versus types. If you define a struct without a tag but
> > using typedef, the typedef name becomes its effective struct tag,
> > which is what gets used for name mangling. If you later use typedef
> > again to make an alias for this type, it does not make a new struct
> > type with a new tag, just an alias for the existing one. Thus, your
> > patch as written changes the struct tags for C++ ABI from
> > "pthread_mutex_t" and "pthread_cond_t" to "__pthread_mutex_t" and
> > "__pthread_cond_t".
> 
> argh
> 
> at least it doesn't matter for the standard functions (they are `extern
> "C"`) but only for user functions with C++ interfaces.

Right, but it matters for all C++ code containing C++ functions that
use pthread_mutex_t* as an argument. And apparently there's a lot of
such code.

> Well, ok, so if you could come up with some better idea in the future,
> let me know.

I'm not even sure it's an issue. I've seen it argued that aliasing
rules don't even apply here because, when you access something like
m->_m_lock, that's not an "access" to the structure object/type but to
the individual member. If that's true, then as long as the structs
have identical layout, it should be valid to access the members via
either.

Also, what is the relationship between two identical struct or union
types without tags (i.e. the first member of pthread_mutex_t and the
first member of mtx_t, both of which are unions with no tag)?

> > > > > +#ifdef __GNUC__
> > > > > +#define _Nonnull(...) __attribute__((__nonnull__(__VA_ARGS__)))
> > > > > +#else
> > > > > +#define _Nonnull(...)
> > > > > +#endif
> > > > 
> > > > If we want to use attributes like this, it's a separate change from
> > > > adding threads. But I'm fairly much against doing it anyway since the
> > > > compiler already knows how to make these warnings for standard
> > > > functions, without help from the headers.
> > > 
> > > There will be a while, until compilers know these things for C11
> > > thread functions, I think.
> > > 
> > > BTw, I was tempted to do it the "C" way by using things like
> > > `[static 1]` instead of `*`, but then I thought you wouldn't like
> > > that.
> > 
> > Actually I like it much better, but I'm not sufficiently familiar with
> > it to know whether it's a conforming declaration. Is it formally
> > compatible with the standard prototype?
> 
> yes, C allows even for crufty things like
> 
> void f(int a[static volatile const 42]);
> 
> to be equivalent to
> 
> void f(int *volatile const a);
> 
> as a prototype. Interfaces can contain more information than just the
> prototype.

I was asking whether the use of static to mean "pointer to at least
this many elements" used for an argument in a function type resulted
in a distinct function type from the same without static.

> > If so, how does it allow the compiler to enforce anything? 
> 
> It doesn't enforce anything, it is more a hint at the same level as
> `restrict` or `register`. It is a prerequisite that the user of the
> function has to ensure.

So the following is NOT UB?

void foo(int [static 2]);

void bar()
{
	int x;
	foo(&x);
}

I'm guessing not. The same seems to apply to restrict too -- it has no
meaning on the caller side, so the compiler cannot automatically
assume UB when the restrict constraint is violated by the caller. It
can only be used when optimizing on the callee side. I suppose static
is the same -- but in the case of static, the only optimization that
seems to be possible on the callee side is assuming the pointer is
non-null; the length seems irrelevant.

> As nsz remarked in his reply, most older compilers don't do anything
> with it, they just ignore it, though the existence of the nonnull
> attribute shows that they would be easily capable of doing so.
> 
> The other inconvenience for `static 1` is C++. They haven't adopted
> it, so as such this would make the headers incompatible with C++. So
> also for this one we would need some preprocessor magic.
> 
> And then, also, it is ugly :(

Yes. Then let's just omit it for now.

> > > > > diff --git a/src/thread/mtx_unlock.c b/src/thread/mtx_unlock.c
> > > > > new file mode 100644
> > > > > index 0000000..d1721a6
> > > > > --- /dev/null
> > > > > +++ b/src/thread/mtx_unlock.c
> > > > > @@ -0,0 +1,11 @@
> > > > > +#include "pthread_impl.h"
> > > > > +#include <threads.h>
> > > > > +
> > > > > +int __pthread_mutex_unlock(pthread_mutex_t *);
> > > > > +
> > > > > +int (mtx_unlock)(mtx_t *mtx) {
> > > > > +	/* In the best of all worlds this is a tail call. */
> > > > > +	int ret = __pthread_mutex_unlock(mtx);
> > > > > +	/* In case of UB may also return EPERM. */
> > > > > +	return ret ? thrd_error : thrd_success;
> > > > > +}
> > > > 
> > > > For unlock, C11 makes it UB to unlock a mutex you don't own (even for
> > > > recursive type), so the tail call would be legal. I don't have a
> > > > strong opinion either way here.
> > > 
> > > I'll remove the comment, this is a left over. Tail call is still not
> > > possible, since moraly thrd_success is not 0 :)
> > > 
> > > But we could just return thrd_success unconditionnally.
> > 
> > Or: thrd_success ? thrd_success : ret;
> 
> ok, I'll go for that

OK.

> > > > > diff --git a/src/thread/pthread_setspecific.c b/src/thread/pthread_setspecific.c
> > > > > index 55e46a8..dd18b96 100644
> > > > > --- a/src/thread/pthread_setspecific.c
> > > > > +++ b/src/thread/pthread_setspecific.c
> > > > > @@ -1,6 +1,6 @@
> > > > >  #include "pthread_impl.h"
> > > > >  
> > > > > -int pthread_setspecific(pthread_key_t k, const void *x)
> > > > > +int __pthread_setspecific(pthread_key_t k, const void *x)
> > > > >  {
> > > > >  	struct pthread *self = __pthread_self();
> > > > >  	/* Avoid unnecessary COW */
> > > > > @@ -10,3 +10,5 @@ int pthread_setspecific(pthread_key_t k, const void *x)
> > > > >  	}
> > > > >  	return 0;
> > > > >  }
> > > > > +
> > > > > +weak_alias(__pthread_setspecific, pthread_setspecific);
> > > > 
> > > > It looks like you have a duplicate tss_set for this rather than using
> > > > the above, so either the above is a grauitous change, or the new
> > > > tss_set is duplicate code that should be removed. Or did I miss
> > > > something?
> > > 
> > > you are right, this should be omitted
> > > 
> > > (the tss_get code differs in the const qualification of the second argument)
> > 
> > In this case it's still possible to implement tss_set as a wrapper,
> 
> I am really allergic against casts, even more if they are somewhat
> hidden. So I'd do much to avoid that.

There's no cast here. void * converts implicitly to const void *.

> > > > Didn't we discuss before just passing a special attr pointer to
> > > > __pthread_create to effect C11-style creation?
> > > 
> > > I did that, and that version could still be revived. But I was not too
> > > happy with it, because it meant intruding into the existing pthread
> > > code. For this version, the pthread code is just reorganized in
> > > different TU, but otherwise there should be no change.
> > 
> > From a maintainability standpoint, having two subtly different
> > versions of the same complex code is a big cost. I wouldn't be
> > fundamentally opposed to a factoring that avoids this duplication
> > while allowing the POSIX-specific parts to be optimized out for pure
> > C11 use, but I think that would be a lot more invasive (mainly in
> > terms of potential for errors, possibly also for performance, but
> > unlikely since basically all the time is spent in mmap, mprotect, and
> > clone anyway) and would be something to look into at a later time, not
> > while trying to get the initial work on C11 threads in.
> > 
> > > Also I'd really like to advertise C threads as a lightweighted,
> > > stripped down thread implementation, so the short cuts (that are
> > > merely a fall off of the code separation, here) were politically
> > > welcome.
> > 
> > I don't think saving 200 bytes in a pure-C11 static-linked program
> > while adding 300 bytes to libc.so in code duplication (these numbers
> > are just guesses but sound about right) really makes it come across as
> > more lightweight.
> 
> I know. I'll have a look and try to factor these things out, such that
> we really can weigh the alternatives.

Can we look at this as a potential post-merge task? I'm skeptical that
it improves anything; saving maybe 100-200 bytes in the static-linked
C11-only case is probably not worth spreading code out over multiple
functions or files and making the flow of pthread_create less obvious.
I'm willing to look at it if you want to try anyway, but I don't think
it should be holding up getting C11 threads support added.

> > > > You already moved pthread_detach to the protected namespace, so this
> > > > seems to be duplicate code.
> > > 
> > > It is.
> > > 
> > > > And it's a place that has implementation
> > > > details (magic numbers like 2, which you could argue shouldn't exist
> > > > ;-)
> > > 
> > > I would !)
> > > 
> > > > so I think it's best to avoid spreading these details out over
> > > > more places where they could go out of sync, no?
> > > 
> > > yes, probably, the fact that pthread_detach called pthread_join
> > > somehow disturbed me. I'll reconsider
> > 
> > It's not so odd if you think about it -- when the thread to be
> > detached has already exited, you can't (implementation-wise) detach it
> > and leave it responsible for cleaning up itself, because it doesn't
> > exist anymore. Since you know it's exited though, joining and throwing
> > away the result is possible.
> 
> But thinking of it and looking into the patches I discovered my real
> motivation for doing so. (I should have better commented to remember
> things myself.) The thing is that using pthread_join under the hood is
> semantically wrong because of the different interfaces. Well we are
> only passing in a null pointer, here, but still the type is wrong, C
> threads "join" on a int* and not on a void**.

I see. But I don't think it matters. You're already marshalling the
int return value via a cast to void* in thrd_exit and thrd_join, and
in the case of detach, the result is being thrown away anyway. There's
nothing sketchy/UB going on here, while other places already encode
the assumption that void* can faithfully round-trip int values.

> > > BTW, the choice of CLOCK_REALTIME, here, could be subject of
> > > debade. IIRC this clock doesn't exactly give the seconds since Unix
> > > Epoch, but that value minus some leapseconds. So it doesn't seem to
> > > fulfill the C specification.
> > 
> > With any luck, UTC might eventually be fixed to drop leap seconds, and
> > then it won't matter for future times (and having your clock set to a
> > past time is nonsense anyway). But I don't think C is strict on how
> > the clock corresponds to real-world time; at least it wasn't before
> > C11. And POSIX time explicitly excludes leap seconds.
> > 
> > > One could think of using CLOCK_MONOTONIC_RAW, instead, or
> > > CLOCK_BOOTTIME if it exists. But I am not sure if the bootime of the
> > > machine can be considered as an "epoch" in the sense of the C standard
> > > or even in plain English.
> > 
> > I think the "implementation defined" "epoch" should actually be
> > defined as something fixed, not something that varies. And for the
> > main intended use of the function (passing the timespec to
> > timedlock/timedwait functions), we need to match the clock the pthread
> > functions use if we're wrapping the pthread implementations, which
> > means using CLOCK_REALTIME.
> 
> yes, right
> 
> I found another glitch with timespec_get, though. It shouldn't touch
> errno. So either we do some stiches around it to save errno and

Why? I see nowhere in C11 that it's specified not to touch errno. In
general, any standard library function can clobber errno as long as it
does not set it to 0, unless it's specifically documented not to
modify errno on success.

> restore it (for a case that probably never happens) or implement it
> directly from the vdso and the syscall. I already have done the later,
> and it looks sufficiently simple to be acceptable and not too much
> code duplication.

The latter is some fairly major code duplication, especially since we
have fallbacks for pre-SYS_clock_gettime kernels. Eventually I plan to
add emulation/fallback for at least CLOCK_MONOTONIC and
CLOCK_CPUTIME_CLOCKID too, and if we want to later add support for
more clock bases to timespec_get, using the same __clock_gettime
backend allows the fallback code to work there too. So I think it's
best not to try to make the vdsocall/syscall directly.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.