Date: Mon, 21 Sep 2020 09:09:27 -0500
From: <sidneym@...eaurora.org>
To: "'Rich Felker'" <dalias@...c.org>
Cc: <musl@...ts.openwall.com>
Subject: RE: Hexagon DSP support



> -----Original Message-----
> From: 'Rich Felker' <dalias@...c.org>
> Sent: Sunday, September 20, 2020 12:17 PM
> To: sidneym@...eaurora.org
> Cc: musl@...ts.openwall.com
> Subject: Re: [musl] Hexagon DSP support
> 
> On Sun, Sep 20, 2020 at 08:12:47AM -0500, sidneym@...eaurora.org wrote:
> > > > > > [...]
> > > > > > +#define a_barrier a_barrier
> > > > > > +static inline void a_barrier() {
> > > > > > +	__asm__ __volatile__ ("barrier" ::: "memory"); }
> > > > >
> > > > > Is the barrier implied in memw_locked? If not, there need to be
> > > > > explicit barriers in all the atomic functions.
> > > >
> > > > Yes, if there is any memory access on the reserved address the
> > > > reservation is lost and the predicate is false.
> > >
> > > > That's not what a barrier means. The question is whether it orders
> > > > all access to *other* memory, not the address with the reservation
> > > > on it.
> > > In other words, musl's a_*() atomics need to be full seq_cst model
> > > operations, not relaxed atomics.
> >
> > Per our spec:
> > "Threads in the Hexagon processor follow a sequentially consistent
> > memory model at a packet granularity. Threads interleave their memory
> > operations with one another in an arbitrary but fair manner. This
> > results in a consistent program order that is globally observable by
> > all threads in the same order."
> 
> Can you clarify or provide a reference for what 'packet granularity'
> means? If there's actually a full builtin seq_cst order I don't see what
> the barrier instruction exists for to begin with.
> 

Packet granularity is like instruction granularity: every operation within
a packet happens in parallel.  The exception is that a packet can contain
dual stores, and those happen in a prescribed order.  A packet has 4 slots,
but dual stores must be in slots 0 and 1, and the store in slot 1 happens
before the store in slot 0.  Slot 0 holds the instruction at the highest
address in the packet, so the stores occur in the same order in which they
appear when you disassemble the code.
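
As a rough illustration of the dual-store rule, here is a toy model in plain
C.  It is not Hexagon code: the struct and function names are hypothetical,
and it only mirrors the ordering described above (slot 1 commits before
slot 0).

```c
#include <stdint.h>

/* Hypothetical model of a store occupying one packet slot. */
struct slot_store {
    int valid;        /* nonzero if this slot holds a store */
    uint32_t *addr;   /* destination of the store */
    uint32_t value;   /* value to be written */
};

/* Commit the stores of one packet in the prescribed order:
 * slot[1] first, then slot[0]. Index 0 models slot 0. */
static void commit_dual_stores(const struct slot_store slot[2])
{
    for (int i = 1; i >= 0; i--) {
        if (slot[i].valid)
            *slot[i].addr = slot[i].value;
    }
}
```

In this model, if both slots store to the same location, the slot 0 value is
the one left in memory, because it is committed last.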

The barrier instruction is used for ordering between a thread and external
memory.  All observers in the "global shared domain" see the store once the
barrier has finished.
"For devices external to the Hexagon processor, the processor follows a
weakly-ordered memory model. Explicit synchronization is required to ensure
order between memory accesses."
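
To make the external-device case concrete, here is a minimal sketch of
where the barrier would go.  The mailbox layout and the names ext_barrier,
ext_mailbox and ext_publish are hypothetical and not part of the patch.  The
__hexagon__ check assumes the compiler predefines that macro for Hexagon
targets; on any other target the sketch falls back to the GCC/Clang builtin
__sync_synchronize() so that it still builds.

```c
#include <stdint.h>

/* Full barrier: the Hexagon "barrier" instruction when building for
 * Hexagon (assumes __hexagon__ is predefined), otherwise a compiler
 * builtin so the sketch compiles on a host machine. */
static inline void ext_barrier(void)
{
#if defined(__hexagon__)
    __asm__ __volatile__ ("barrier" ::: "memory");
#else
    __sync_synchronize();
#endif
}

/* Hypothetical interface shared with an external device: a payload
 * word and a doorbell word that the device polls. */
struct ext_mailbox {
    volatile uint32_t payload;
    volatile uint32_t doorbell;
};

/* Publish a value to the external observer. */
static void ext_publish(struct ext_mailbox *mb, uint32_t value)
{
    mb->payload = value;  /* 1. write the data */
    ext_barrier();        /* 2. order it ahead of the doorbell write */
    mb->doorbell = 1;     /* 3. only then signal the device */
}
```

Without step 2, the weakly-ordered model quoted above gives an external
device no guarantee that it sees the payload before the doorbell.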


> Rich
