Date: Fri, 8 Aug 2014 11:38:57 +0300
From: Timo Teras <timo.teras@....fi>
To: Rich Felker <dalias@...c.org>
Cc: musl@...ts.openwall.com
Subject: Re: [PATCH] private futex support

On Thu, 26 Jun 2014 14:48:03 -0400
Rich Felker <dalias@...c.org> wrote:

> Here is a semi-proposed patch for private futex support, one of the
> items on the roadmap for the next release cycle. However I've been
> unable to demonstrate it yielding better performance than non-private
> futex, and since it adds (small, but nonzero) complexity and worse
> behavior on outdated (pre-2.6.22) kernels, I'm a bit hesitant to

It seems the patch is not fully complete; at least pthread_cond_t is
not fully using private futexes yet.

Before testing, I added the private flag in:

1) pthread_cond_broadcast.c, for the FUTEX_REQUEUE call (this can
likely be added unconditionally, as pshared condition variables have a
separate code path)

2) pthread_cond_timedwait.c for the __timedwait call (though this
probably needs to be based on the pshared attribute)

My changes are:

diff --git a/src/thread/pthread_cond_broadcast.c b/src/thread/pthread_cond_broadcast.c
index 0901daf..4327a0e 100644
--- a/src/thread/pthread_cond_broadcast.c
+++ b/src/thread/pthread_cond_broadcast.c
@@ -27,7 +27,7 @@ int pthread_cond_broadcast(pthread_cond_t *c)
 
 	/* Perform the futex requeue, waking one waiter unless we know
 	 * that the calling thread holds the mutex. */
-	__syscall(SYS_futex, &c->_c_seq, FUTEX_REQUEUE,
+	__syscall(SYS_futex, &c->_c_seq, 128 | FUTEX_REQUEUE,
 		!m->_m_type || (m->_m_lock&INT_MAX)!=__pthread_self()->tid,
 		INT_MAX, &m->_m_lock);
 
diff --git a/src/thread/pthread_cond_timedwait.c b/src/thread/pthread_cond_timedwait.c
index 99d62cc..73c1781 100644
--- a/src/thread/pthread_cond_timedwait.c
+++ b/src/thread/pthread_cond_timedwait.c
@@ -64,7 +64,7 @@ int pthread_cond_timedwait(pthread_cond_t *restrict c, pthread_mutex_t *restrict
 
 	pthread_mutex_unlock(m);
 
-	do e = __timedwait(&c->_c_seq, seq, c->_c_clock, ts, cleanup, &cm, 0);
+	do e = __timedwait(&c->_c_seq, seq, c->_c_clock, ts, cleanup, &cm, 1);
 	while (c->_c_seq == seq && (!e || e==EINTR));
 	if (e == EINTR) e = 0;
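
As an aside, the 128 in the first hunk is just the value of
FUTEX_PRIVATE_FLAG from <linux/futex.h>; in a final version of the
patch it would presumably be spelled with the named constant (or
whatever internal define musl ends up using), roughly:

	__syscall(SYS_futex, &c->_c_seq, FUTEX_REQUEUE | FUTEX_PRIVATE_FLAG,
		!m->_m_type || (m->_m_lock&INT_MAX)!=__pthread_self()->tid,
		INT_MAX, &m->_m_lock);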
 
It seems that things worked even without these changes, but everything
was a lot slower; I'm not sure why. The code looks buggy anyway,
because FUTEX_REQUEUE was transferring waiters to &m->_m_lock, an
address that other futex calls were already treating as private, which
causes a mismatch.
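
To illustrate the kind of mismatch I mean, here is a small standalone
sketch (raw syscalls, not musl code): as far as I understand, the
kernel keys private and non-private futex operations differently even
for the same address, so a waiter queued without FUTEX_PRIVATE_FLAG is
simply never found by a private wake and times out instead:

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <pthread.h>
#include <stdio.h>
#include <errno.h>
#include <time.h>

static int word = 0;

static long futex(int *uaddr, int op, int val, const struct timespec *to)
{
	return syscall(SYS_futex, uaddr, op, val, to, NULL, 0);
}

static void *waiter(void *arg)
{
	struct timespec to = { .tv_sec = 2 };
	(void) arg;
	/* wait in the shared namespace (no FUTEX_PRIVATE_FLAG) */
	if (futex(&word, FUTEX_WAIT, 0, &to) < 0 && errno == ETIMEDOUT)
		puts("timed out: private wake never reached shared waiter");
	return NULL;
}

int main(void)
{
	pthread_t t;
	pthread_create(&t, NULL, waiter, NULL);
	sleep(1);
	/* wake in the private namespace, on the very same address */
	futex(&word, FUTEX_WAKE | FUTEX_PRIVATE_FLAG, 1, NULL);
	pthread_join(t, NULL);
	return 0;
}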

> actually commit it. If anyone is interested in this feature, please
> see if you can find some examples that demonstrate that it measurably
> improves performance.

And running my simple test case, in which two threads wake each other
up using a condition variable, does show a noticeable performance
speedup from private futexes. See the end of this mail for the code.

The low and high numbers from a few test runs on musl git
4fe57cad709fdfb377060, without and with the futex patch, are as
follows:

Non-private futexes:

~/privfutex $ time ~/oss/musl/lib/libc.so  ./test
count=2516417
real	0m 2.00s
user	0m 1.68s
sys	0m 2.30s

~/privfutex $ time ~/oss/musl/lib/libc.so  ./test
count=2679381
real	0m 2.00s
user	0m 1.59s
sys	0m 2.39s

Private futexes:

~/privfutex $ time ~/oss/musl/lib/libc.so  ./test
count=3839470
real	0m 2.00s
user	0m 1.68s
sys	0m 1.98s

~/privfutex $ time ~/oss/musl/lib/libc.so  ./test
count=5350852
real	0m 2.00s
user	0m 1.66s
sys	0m 2.32s


You can see a clear reduction in sys time, and up to double the
throughput of wait/wake operations.
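
One way to double-check which namespace is actually used is to run the
test under strace and look at the futex operations, e.g. something
like:

	strace -f -e trace=futex ~/oss/musl/lib/libc.so ./test

With the patch applied the calls should show up as the _PRIVATE
variants (FUTEX_WAIT_PRIVATE, FUTEX_WAKE_PRIVATE and so on); without
it they do not.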

So I suspect your test case was not measuring the right thing. Private
futexes speed up only certain kinds of loads, and this type of
pthread_cond_t usage is probably the pattern that benefits most.

Please reconsider adding this after addressing the deficiencies noted
at the beginning.

Thanks,
Timo

And the test case code:

#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
/* run holds the id of the thread that should currently wait (0 means
 * stop); count is the number of iterations completed */
static volatile long run = 1, count = 0;

static void *func(void *arg)
{
	long my_id = (long) arg;

	do {
		pthread_mutex_lock(&mtx);
		/* sleep while it is this thread's turn to wait */
		while (run == my_id)
			pthread_cond_wait(&cond, &mtx);
		count++;
		/* unless told to stop, make the other thread the one
		 * that waits next, then wake up any waiter */
		if (run) run = (my_id == 1) ? 2 : 1;
		pthread_cond_broadcast(&cond);
		pthread_mutex_unlock(&mtx);
	} while (run);

	return NULL;
}

int main(void)
{
	void *ret;
	pthread_t t1, t2;

	pthread_create(&t1, NULL, func, (void*) 1);
	pthread_create(&t2, NULL, func, (void*) 2);

	/* let the threads run for two seconds */
	sleep(2);

	/* tell both threads to stop and wake them up */
	pthread_mutex_lock(&mtx);
	run = 0;
	pthread_cond_broadcast(&cond);
	pthread_mutex_unlock(&mtx);

	pthread_join(t1, &ret);
	pthread_join(t2, &ret);

	printf("count=%ld\n", count);

	return 0;
}
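
For completeness, this is an ordinary pthread program; something along
these lines should reproduce the runs above (the exact paths of course
depend on the setup):

	musl-gcc -O2 -o test test.c -lpthread
	time ~/oss/musl/lib/libc.so ./test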
