Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date: Tue, 9 Jan 2018 12:41:08 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 4/7] separate the fast parts of __lock and
 __unlock into a .h file that may be used by other TU

On Wed, Jan 03, 2018 at 02:17:12PM +0100, Jens Gustedt wrote:
> This provides two interfaces __lock_fast and __unlock_fast that are both
> "static inline" and that should result in a better integration of the
> lock in place. The slow path of the lock algorithm remains centralized,
> here adding the overhead of a function call is not a big deal.
> 
> This must only be used directly by a TU that encapsulates all LOCK and
> UNLOCK calls of a particular lock object.
> ---
>  src/internal/__lock.h | 21 +++++++++++++++++++++
>  src/internal/libc.h   |  1 +
>  src/thread/__lock.c   | 20 +++++---------------
>  3 files changed, 27 insertions(+), 15 deletions(-)
>  create mode 100644 src/internal/__lock.h
> 
> diff --git a/src/internal/__lock.h b/src/internal/__lock.h
> new file mode 100644
> index 00000000..f16d6176
> --- /dev/null
> +++ b/src/internal/__lock.h
> @@ -0,0 +1,21 @@
> +#include "pthread_impl.h"
> +
> +static inline void __lock_fast(volatile int *l)
> +{
> +	extern void __lock_slow(volatile int*, int);
> +	if (!libc.threads_minus_1) return;
> +	/* fast path: INT_MIN for the lock, +1 for the congestion */
> +	int current = a_cas(l, 0, INT_MIN + 1);
> +	if (!current) return;
> +	__lock_slow(l, current);
> +}

I think I'd like to hold off on this one for now until we study the
impact. Burying the libc.threads_minus_1 check inside an external
function was actually something of a trick to prevent the caller from
needing a GOT pointer; on archs where loading the GOT pointer is a
pain, putting the load in the caller may badly un-shrinkwrap the
function.

In practice, that probably doesn't work correctly right now because
too many internal functions are missing hidden attribute, and too many
of the affected functions call public functions (which the compiler
was able to assume didn't go thru PLT when we had the vis.h hack, but
that's off by default now because it was a bad idea). So figuring out
whether this sort of shrinkwrap-assistance is still salvagable or just
needs to be thrown out is an open question..

> +static inline void __unlock_fast(volatile int *l)
> +{
> +	/* Check l[0] to see if we are multi-threaded. */
> +	if (l[0] < 0) {
> +		if (a_fetch_add(l, -(INT_MIN + 1)) != (INT_MIN + 1)) {
> +			__wake(l, 1, 1);
> +		}
> +	}
> +}

This one probably already makes sense to inline.

> diff --git a/src/internal/libc.h b/src/internal/libc.h
> index 5e145183..a594d0c5 100644
> --- a/src/internal/libc.h
> +++ b/src/internal/libc.h
> @@ -47,6 +47,7 @@ extern size_t __sysinfo ATTR_LIBC_VISIBILITY;
>  extern char *__progname, *__progname_full;
>  
>  /* Designed to avoid any overhead in non-threaded processes */
> +void __lock_slow(volatile int *, int) ATTR_LIBC_VISIBILITY;
>  void __lock(volatile int *) ATTR_LIBC_VISIBILITY;
>  void __unlock(volatile int *) ATTR_LIBC_VISIBILITY;
>  int __lockfile(FILE *) ATTR_LIBC_VISIBILITY;
> diff --git a/src/thread/__lock.c b/src/thread/__lock.c
> index 45557c88..a3d8d4d0 100644
> --- a/src/thread/__lock.c
> +++ b/src/thread/__lock.c
> @@ -1,4 +1,5 @@
>  #include "pthread_impl.h"
> +#include "__lock.h"
>  
>  /* This lock primitive combines a flag (in the sign bit) and a
>   * congestion count (= threads inside the critical section, CS) in a
> @@ -16,12 +17,11 @@
>   * with INT_MIN as a lock flag.
>   */
>  
> -void __lock(volatile int *l)
> +weak_alias(__lock_fast, __lock);
> +weak_alias(__unlock_fast, __unlock);
> +
> +void __lock_slow(volatile int *l, int current)
>  {
> -	if (!libc.threads_minus_1) return;
> -	/* fast path: INT_MIN for the lock, +1 for the congestion */
> -	int current = a_cas(l, 0, INT_MIN + 1);
> -	if (!current) return;
>  	/* A first spin loop, for medium congestion. */
>  	for (unsigned i = 0; i < 10; ++i) {
>  		if (current < 0) current -= INT_MIN + 1;
> @@ -48,13 +48,3 @@ void __lock(volatile int *l)
>  		current = val;
>  	}
>  }
> -
> -void __unlock(volatile int *l)
> -{
> -	/* Check l[0] to see if we are multi-threaded. */
> -	if (l[0] < 0) {
> -		if (a_fetch_add(l, -(INT_MIN + 1)) != (INT_MIN + 1)) {
> -			__wake(l, 1, 1);
> -		}
> -	}
> -}
> -- 
> 2.15.1

I'm missing the mechanism by which the new __lock_fast and
__unlock_fast ever get inlined. __lock.h does not seem to be included
anywhere. In order to make this work right, some refactoring
discussion of what should be exposed by libc.h vs pthread_impl.h
probably needs to be discussed. I think the factoring right now is
fairly wrong; lock and futex primitives aren't really part of the
"pthread implementation" but general stuff libc needs internally.

Rich

Powered by blists - more mailing lists

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.