Date: Thu, 18 May 2023 10:15:36 -0400
From: Jeffrey Walton <noloader@...il.com>
To: musl@...ts.openwall.com
Cc: 847567161 <847567161@...com>
Subject: Re: Re: Question:Why musl call a_barrier in __pthread_once?

On Thu, May 18, 2023 at 9:29 AM Rich Felker <dalias@...c.org> wrote:
>
> On Thu, May 18, 2023 at 02:23:06PM +0200, Szabolcs Nagy wrote:
> > * 847567161 <847567161@...com> [2023-05-18 10:49:44 +0800]:
> > > > There is an alternate algorithm for pthread_once that doesn't require
> > > > a barrier in the common case, which I've considered implementing. But
> > > > it does need efficient access to thread-local storage. At one time,
> > > > this was a kinda bad assumption (especially legacy mips is horribly
> > > > slow at TLS) but nowadays it's probably the right choice to make, and
> > > > we should check that out again...
> > >
> > > 1. Can we move the dmb to after we read the value of control? Like this:
> > >
> > > int __pthread_once(pthread_once_t *control, void (*init)(void))
> > > {
> > >     /* Return immediately if init finished before, but ensure that
> > >     * effects of the init routine are visible to the caller. */
> > >     if (*(volatile int *)control == 2) {
> > >         // a_barrier();
> > >         return 0;
> > >     }
> >
> > writes in init may not be visible when *control==2, without
> > the barrier. (there are many explanations on the web why
> > double-checked locking is wrong without an acquire barrier,
> > that's the same issue if you are interested in the details)
> >
> > > 2. Can we use 'ldar' instead of dmb here? I see musl
> > > already uses 'stlxr' in a_sc. Like this:
> > >
> > > static inline int load(volatile int *p)
> > > {
> > >     int v;
> > >     __asm__ __volatile__ ("ldar %w0,%1" : "=r"(v) : "Q"(*p));
> > >     return v;
> > > }
> > >
> > > if (load((volatile int *)control) == 2) {
> > >     return 0;
> > > }
> >
> > i think acquire ordering is enough because posix does not
> > require pthread_once to synchronize memory, but musl does
> > not have an acquire barrier/load, so it uses a_barrier.
>
> POSIX does require this. It's specified where Memory Synchronization
> is defined,
> https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_12
>
>     "The pthread_once() function shall synchronize memory for the
>     first call in each thread for a given pthread_once_t object."
>
> > it is probably not worth optimizing the memory order since
> > we know there is an algorithm that does not need a barrier
> > in the common case.
>
> Arguably the above might make the barrier-free algorithm invalid for
> pthread_once, but I'm not sure if the lack of "synchronize memory"
> property in this case would be observable. It probably is with an
> intentional construct trying to observe it. There may be some way to
> salvage this with a second thread-local counter to account for
> gratuitous extra synchronization needed.
>
> Of course call_once is exempt from any such requirements (also exempt
> from cancellation shenanigans) and is probably the optimal thing for
> programs to use. If needed we can make call_once have a different,
> more optimal implementation than pthread_once.

Be careful of call_once.

Several years ago I cut over to C++11's call_once. The problem was, it
only worked reliably on 32-bit and 64-bit Intel platforms. It was a
disaster on AArch64, PowerPC and SPARC. I had to back it out.

The problems happened back when GCC 6 and 7 were popular. The problem
was due to something sideways in glibc.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146

If you want a call_once-like initialization then rely on N2660:
Dynamic Initialization and Destruction with Concurrency.
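
A minimal sketch of what relying on N2660 might look like, assuming a
C++11 compiler with thread-safe local statics enabled (the default in
GCC and Clang; -fno-threadsafe-statics turns it off). The names Config,
get_config, run_demo and init_calls are hypothetical and exist only for
illustration; nothing here is taken from musl or glibc.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical counter, used only to observe how often the
// initializer runs.
static std::atomic<int> init_calls{0};

// Hypothetical object that needs one-time initialization.
struct Config {
    int value;
    Config() : value(42) {
        init_calls.fetch_add(1, std::memory_order_relaxed);
    }
};

// Under C++11 (N2660), the initializer of a function-local static
// runs exactly once, even when several threads reach the declaration
// concurrently; the other threads wait until it completes.
Config& get_config() {
    static Config cfg;
    return cfg;
}

// Calls get_config() from nthreads threads and returns the number of
// times the Config constructor ran.
int run_demo(int nthreads) {
    std::vector<std::thread> threads;
    for (int i = 0; i < nthreads; ++i)
        threads.emplace_back([] { (void)get_config(); });
    for (auto& t : threads)
        t.join();
    return init_calls.load();
}
```

Calling run_demo(8) should report a single constructor call. Whether
this is preferable to call_once or pthread_once on a given platform
still depends on the compiler and runtime in use.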

Jeff
