musl - Re: [PATCH] replace a mfence instruction by an xchg instruction

Message-ID: <20150816155807.GP31018@brightrain.aerifal.cx>
Date: Sun, 16 Aug 2015 11:58:07 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] replace a mfence instruction by an xchg
 instruction

On Sun, Aug 16, 2015 at 05:50:21PM +0200, Jens Gustedt wrote:
> > See page 330, http://www.intel.com/Assets/en_US/PDF/manual/253668.pdf
> > 
> > So mfence seems to be weaker than lock-prefixed instructions in terms
> > of the ordering it imposes (lock-prefixed instructions forbid
> > reordering and also have a total ordering across all cores).
> 
> Yes, it says so on page 8-26 that the fences are definitely not
> serializing instructions.
> 
> (But what I tried to show in my previous mail still holds, the
> instruction latency itself plays a big part in the efficiency of these
> instructions.)

I wasn't trying to contradict anything you've said, just expressing
the absurdity of mfence being slower than lock-prefixed instructions,
since it's a strictly-weaker operation.
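
For illustration only, here is a rough sketch of the two store variants
being compared: a plain store followed by mfence, and a store done with
xchg, which is implicitly locked when it has a memory operand. The
function names are hypothetical and this is not code from musl or from
the patch; the non-x86 branches are a fallback added just so the sketch
compiles elsewhere.

```c
/* Illustrative sketch only; names are hypothetical, not musl's. */

/* Variant 1: plain store, then an explicit mfence. */
static inline void store_then_mfence(volatile int *p, int x)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__ ("movl %1, %0 ; mfence"
		: "=m"(*p) : "r"(x) : "memory");
#else
	*p = x;
	__sync_synchronize(); /* full-barrier fallback (assumption) */
#endif
}

/* Variant 2: xchg with a memory operand, which carries an implicit
 * lock prefix, so no separate fence instruction is issued. */
static inline void store_with_xchg(volatile int *p, int x)
{
#if defined(__x86_64__) || defined(__i386__)
	__asm__ __volatile__ ("xchgl %0, %1"
		: "+r"(x), "+m"(*p) : : "memory");
#else
	(void)__sync_lock_test_and_set(p, x);
	__sync_synchronize(); /* full-barrier fallback (assumption) */
#endif
}
```

Both variants leave the same value in *p; which one is cheaper on a
given CPU is the kind of thing that has to be measured.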

> I read all of that as:
> 
>  - mfence can be used to achieve acq_rel ordering
>  - none of the fences can be used to achieve seq_cst ordering

By this you mean that only lock-prefixed instructions impose a total
order across all cores?

> Wasn't the idea that all atomic.h functions implement sequential
> consistency?

Yes, that's the intent, but I don't want to introduce 'major'
performance regressions fixing 'minor' failures to be seq_cst if
there's no observable misbehavior in the code using them. Still it
would be nice to know whether such failures still exist, and if so
where, so we can eventually clean this up.

Rich
