Date: Thu, 12 Nov 2020 22:37:28 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH v3] MT-fork (series)

Thanks for the detailed review! Replies inline below:

On Fri, Nov 13, 2020 at 12:26:55AM +0100, Szabolcs Nagy wrote:
> > +static ssize_t read_loop(int fd, void *p, size_t n)
> > +{
> > +	for (size_t i=0; i<n; ) {
> > +		ssize_t l = read(fd, (char *)p+i, n-i);
> > +		if (l<0) {
> > +			if (errno==EINTR) continue;
> > +			else return -1;
> > +		}
> > +		if (l==0) return i;
> > +		i += l;
> > +	}
> > +	return n;
> > +}
> > +
> >  static void *mmap_fixed(void *p, size_t n, int prot, int flags, int fd, off_t off)
> >  {
> >  	static int no_map_fixed;
> > @@ -1060,13 +1073,17 @@ static struct dso *load_library(const char *name, struct dso *needed_by)
> >  				snprintf(etc_ldso_path, sizeof etc_ldso_path,
> >  					"%.*s/etc/ld-musl-" LDSO_ARCH ".path",
> >  					(int)prefix_len, prefix);
> > -				FILE *f = fopen(etc_ldso_path, "rbe");
> > -				if (f) {
> > -					if (getdelim(&sys_path, (size_t[1]){0}, 0, f) <= 0) {
> > +				fd = open(etc_ldso_path, O_RDONLY|O_CLOEXEC);
> > +				if (fd>=0) {
> > +					size_t n = 0;
> > +					if (!fstat(fd, &st)) n = st.st_size;
> > +					if ((sys_path = malloc(n+1)))
> > +						sys_path[n] = 0;
> > +					if (!sys_path || read_loop(fd, sys_path, n)<0) {
> 
> should this handle the short read case?
> 
> i assume we only want to support atomic updates to
> the path file so there should not be a short read,
> but i think rejecting read_loop(,,n)!=n is safer.

We could reject != n, but races can still hand you a partial file even
when the return value equals n (because fstat only returned the size
of the partially written file). In fact the only way you'd get a
return value < n is if the file were truncated after the fstat;
normally you'd see the opposite, the file growing after the fstat. I'm
not opposed to changing it, but I don't think it gives any practical
improvement in safety. The only actually-safe way to change this file
is by atomic replacement.

> > diff --git a/src/malloc/lite_malloc.c b/src/malloc/lite_malloc.c
> > index f8931ba5..0f461617 100644
> > --- a/src/malloc/lite_malloc.c
> > +++ b/src/malloc/lite_malloc.c
> > @@ -100,4 +100,16 @@ static void *__simple_malloc(size_t n)
> >  	return p;
> >  }
> >  
> > -weak_alias(__simple_malloc, malloc);
> > +weak_alias(__simple_malloc, __libc_malloc_impl);
> > +
> > +void *__libc_malloc(size_t n)
> > +{
> > +	return __libc_malloc_impl(n);
> > +}
> > +
> > +static void *default_malloc(size_t n)
> > +{
> > +	return __libc_malloc_impl(n);
> > +}
> > +
> > +weak_alias(default_malloc, malloc);
> 
> 
> maybe i'm missing something but i thought it would be enough to do
> 
> weak_alias(__simple_malloc, __libc_malloc);
> static void *default_malloc(size_t n)
> {
> 	return __libc_malloc(n);
> }
> weak_alias(default_malloc, malloc);
> 
> here and have strong __libc_malloc symbol in the malloc implementation.

That's what I did at first, and it works, but it keeps the property of
depending on the order of files in the archive, which was undesirable.
Previously I left it alone because it avoided jumping through an
additional entry point, but now that makes no difference for the
public malloc, and the libc-internal one is not at all
performance-relevant. So we're just spending a few bytes of size here
to avoid depending on the link-order hack, and I think that's a
reasonable trade.

> > diff --git a/ldso/dynlink.c b/ldso/dynlink.c
> > index 61714f40..6b868c84 100644
> > --- a/ldso/dynlink.c
> > +++ b/ldso/dynlink.c
> > @@ -20,6 +20,7 @@
> >  #include <semaphore.h>
> >  #include <sys/membarrier.h>
> >  #include "pthread_impl.h"
> > +#include "fork_impl.h"
> >  #include "libc.h"
> >  #include "dynlink.h"
> >  
> > @@ -1426,6 +1427,17 @@ void __libc_exit_fini()
> >  	}
> >  }
> >  
> > +void __ldso_atfork(int who)
> > +{
> > +	if (who<0) {
> > +		pthread_rwlock_wrlock(&lock);
> > +		pthread_mutex_lock(&init_fini_lock);
> > +	} else {
> > +		pthread_mutex_unlock(&init_fini_lock);
> > +		pthread_rwlock_unlock(&lock);
> > +	}
> > +}
> > +
> >  static struct dso **queue_ctors(struct dso *dso)
> >  {
> >  	size_t cnt, qpos, spos, i;
> > @@ -1484,6 +1496,13 @@ static struct dso **queue_ctors(struct dso *dso)
> >  	}
> >  	queue[qpos] = 0;
> >  	for (i=0; i<qpos; i++) queue[i]->mark = 0;
> > +	for (i=0; i<qpos; i++)
> > +		if (queue[i]->ctor_visitor && queue[i]->ctor_visitor->tid < 0) {
> > +			error("State of %s is inconsistent due to multithreaded fork\n",
> > +				queue[i]->name);
> > +			free(queue);
> > +			if (runtime) longjmp(*rtld_fail, 1);
> > +		}
> 
> 
> hm since fork takes the init_fini_lock i guess
> the ctors could be finished in the child if
> necessary. or is there some problem with that?
> 
> otherwise the patch looks good to me.

The init_fini_lock is not held across the running of ctors/dtors, only
to protect access to the queue. Otherwise it would break concurrent
dlopen, recursive dlopen, etc. (which can be delayed arbitrarily long
by ctor execution). In fact, if the lock worked like that and fork
took the lock, fork would also be delayed arbitrarily long, and you
could not fork from ctors.

The underlying issue this code is addressing is that, if a ctor was
running in another thread at the time of fork, it will never finish in
the child, but it also can't be restarted, because then parts of it
would be executed twice. We discussed this on IRC before, and the only
conclusion I could come to was that the DSO is permanently unusable.

Maybe it would make sense to strip the affected DSO, and all DSOs that
depend on it, from the loaded list, so they'd be reloaded from the
filesystem the next time they're dlopened (with the partly loaded old
mapping just abandoned). I'm not sure how bad that would be in terms
of unwanted side effects.

Note however that, with the way I implemented it here, you can avoid
the "perma-dead DSO" problem just by taking a lock around your own
calls to dlopen, and installing an atfork handler that takes that
lock. This is basically "opting into" the above "fork delayed
arbitrarily long" behavior I mentioned.

Rich
