Date: Thu, 18 May 2023 09:29:05 -0400
From: Rich Felker <>
To: 847567161 <>, musl <>
Subject: Re: Re: Question: Why musl call a_barrier in __pthread_once?

On Thu, May 18, 2023 at 02:23:06PM +0200, Szabolcs Nagy wrote:
> * 847567161 <> [2023-05-18 10:49:44 +0800]:
> > > There is an alternate algorithm for pthread_once that doesn't require
> > > a barrier in the common case, which I've considered implementing. But
> > > it does need efficient access to thread-local storage. At one time,
> > > this was a kinda bad assumption (especially legacy mips is horribly
> > > slow at TLS) but nowadays it's probably the right choice to make, and
> > > we should check that out again...
> > 
> > 1. Can we move the dmb after we get the value of control? Like this:
> > 
> > int __pthread_once(pthread_once_t *control, void (*init)(void))
> > {
> >     /* Return immediately if init finished before, but ensure that
> >     * effects of the init routine are visible to the caller. */
> >     if (*(volatile int *)control == 2) {
> >         // a_barrier();
> >         return 0;
> >     }
> writes in init may not be visible when *control==2, without
> the barrier. (there are many explanations on the web why
> double-checked locking is wrong without an acquire barrier,
> that's the same issue if you are interested in the details)
> > 2. Can we use 'ldar' instead of dmb here? I see musl
> > already uses 'stlxr' in a_sc. Like this:
> > 
> > static inline int load(volatile int *p)
> > {
> > 	int v;
> > 	__asm__ __volatile__ ("ldar %w0,%1" : "=r"(v) : "Q"(*p));
> > 	return v;
> > }
> > 
> > if (load((volatile int *)control) == 2) {
> >     return 0;
> > }
> i think acquire ordering is enough because posix does not
> require pthread_once to synchronize memory, but musl does
> not have an acquire barrier/load, so it uses a_barrier.

POSIX does require this. It's specified where Memory Synchronization
is defined:

    "The pthread_once() function shall synchronize memory for the
    first call in each thread for a given pthread_once_t object."
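
For comparison, the acquire-load fast path discussed in the quoted text
can be sketched in portable C11 rather than inline asm. This is only an
illustration under stated assumptions: struct once_acq, ONCE_ACQ_INIT,
once_acq() and the demo helpers are hypothetical names, not musl code,
and the sketch ignores cancellation and recursive calls from init().
Whether acquire ordering alone satisfies the "synchronize memory"
wording quoted above is exactly what is in question here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical once object for illustration; not musl's pthread_once_t. */
struct once_acq {
	atomic_int state;      /* 0 = not run, 2 = done (mirrors the quoted code) */
	pthread_mutex_t lock;  /* serializes the slow path */
};

#define ONCE_ACQ_INIT { 0, PTHREAD_MUTEX_INITIALIZER }

static int once_acq(struct once_acq *o, void (*init)(void))
{
	/* Fast path: this acquire load pairs with the release store below,
	 * so everything init() wrote is visible once we observe 2. */
	if (atomic_load_explicit(&o->state, memory_order_acquire) == 2)
		return 0;

	/* Slow path: the mutex orders competing first callers. */
	pthread_mutex_lock(&o->lock);
	if (atomic_load_explicit(&o->state, memory_order_relaxed) != 2) {
		init();
		atomic_store_explicit(&o->state, 2, memory_order_release);
	}
	pthread_mutex_unlock(&o->lock);
	return 0;
}

static int acq_calls;
static void acq_init(void) { acq_calls++; }

/* Calls once_acq() twice and reports how many times init ran. */
static int once_acq_demo(void)
{
	static struct once_acq o = ONCE_ACQ_INIT;
	once_acq(&o, acq_init);
	once_acq(&o, acq_init);
	assert(atomic_load(&o.state) == 2);
	return acq_calls;
}
```

On AArch64, compilers typically lower the memory_order_acquire load to
ldar, which is the instruction proposed in the quoted question.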

> it is probably not worth optimizing the memory order since
> we know there is an algorithm that does not need a barrier
> in the common case.

Arguably the above might make the barrier-free algorithm invalid for
pthread_once, but I'm not sure if the lack of "synchronize memory"
property in this case would be observable. It probably is with an
intentional construct trying to observe it. There may be some way to
salvage this with a second thread-local counter to account for
gratuitous extra synchronization needed.
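
One way a TLS-based, barrier-free fast path could look is sketched
below. This is an assumption about the general shape of such an
algorithm (a global epoch plus a per-thread "last epoch synchronized
with" counter), not the specific design referred to above and not musl
code. All names (epoch_lock, global_epoch, seen_epoch, struct once_tls,
once_tls) are hypothetical. The sketch ignores epoch wraparound,
cancellation, and init() calling once_tls() recursively, and it does
not attempt the per-object "first call in each thread" synchronization
discussed above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

static pthread_mutex_t epoch_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned global_epoch;              /* protected by epoch_lock */
static _Thread_local unsigned seen_epoch;  /* last epoch this thread synchronized with */

/* Hypothetical once object: 0 means init has not finished yet. */
struct once_tls { atomic_uint done_epoch; };

static int once_tls(struct once_tls *o, void (*init)(void))
{
	unsigned e = atomic_load_explicit(&o->done_epoch, memory_order_relaxed);

	/* Fast path, no barrier: this thread already took epoch_lock at or
	 * after epoch e, so the unlock that published e happens-before us. */
	if (e != 0 && e <= seen_epoch)
		return 0;

	/* Slow path: the mutex provides the acquire/release ordering. */
	pthread_mutex_lock(&epoch_lock);
	if (atomic_load_explicit(&o->done_epoch, memory_order_relaxed) == 0) {
		init();
		atomic_store_explicit(&o->done_epoch, ++global_epoch,
		                      memory_order_relaxed);
	}
	seen_epoch = global_epoch;  /* remember how far we have synchronized */
	pthread_mutex_unlock(&epoch_lock);
	return 0;
}

static int tls_calls;
static void tls_init(void) { tls_calls++; }

/* Calls once_tls() twice and reports how many times init ran. */
static int once_tls_demo(void)
{
	static struct once_tls o;
	once_tls(&o, tls_init);
	assert(seen_epoch >= atomic_load(&o.done_epoch));
	once_tls(&o, tls_init);  /* takes the barrier-free fast path */
	return tls_calls;
}
```

Note that a thread whose seen_epoch is already recent enough skips
synchronization even on its first call for a given object, which is the
case where the POSIX wording might be observable.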

Of course call_once is exempt from any such requirements (also exempt
from cancellation shenanigans) and is probably the optimal thing for
programs to use. If needed we can make call_once have a different,
more optimal implementation than pthread_once.
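
For reference, a minimal sketch of one-time initialization through the
C11 interface. The names cfg_once, cfg_value, cfg_inits, cfg_init and
get_cfg are hypothetical, chosen only for illustration; call_once,
once_flag and ONCE_FLAG_INIT come from <threads.h>.

```c
#include <assert.h>
#include <threads.h>

static once_flag cfg_once = ONCE_FLAG_INIT;
static int cfg_value;
static int cfg_inits;

/* Runs at most once, no matter how many threads call get_cfg(). */
static void cfg_init(void)
{
	cfg_value = 42;
	cfg_inits++;
}

static int get_cfg(void)
{
	/* Completion of the effective call synchronizes with every
	 * later call_once() on the same flag, so cfg_value is visible. */
	call_once(&cfg_once, cfg_init);
	assert(cfg_inits == 1);
	return cfg_value;
}
```

Unlike pthread_once, call_once returns no value and is not a
cancellation point, so callers have nothing to check after the call.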

We should probably also file an issue for POSIX to relax the
requirements on pthread_once here, if they're actually a hindrance to
doing this right.

