musl - Re: Deduplicating atomics written in terms of CAS



Message-ID: <20150517162854.GN17573@brightrain.aerifal.cx>
Date: Sun, 17 May 2015 12:28:54 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: Deduplicating atomics written in terms of CAS

On Sun, May 17, 2015 at 09:37:19AM +0200, Jens Gustedt wrote:
> Am Sonntag, den 17.05.2015, 02:14 -0400 schrieb Rich Felker:
> > - a_and_64/a_or_64 (malloc only; these are misnamed too)
> 
> I should have checked the use before my last mail. They are
> definitely misnamed.
> 
> Both uses of them look ok concerning atomicity, only one of the a_and
> or a_or calls triggers.
> 
> The only object (mal.binmap) to which this is applied is in fact
> volatile, so it must actually be reloaded every time it is used.
> 
> But in line 352 the code uses another assumption, then, that 64 bit
> loads always are atomic. I don't see why this should hold in general.

I don't think there's such an assumption. The only assumption is that
each bit is read exactly the number of times it would be on the
abstract machine, so that we can't observe inconsistent values for the
same object. Lack of any heavy synchronization around reading the mask
may result in failure to see some changes or seeing them out of order,
but it doesn't matter: If a bin is wrongly seen as non-empty, locking
and attempting to unbin from it will fail. If it is wrongly seen as
empty, the worst that can happen is that a larger chunk gets split
instead of a smaller one being used to satisfy the allocation, which
is less optimal now but would have been optimal an instant earlier.
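
A minimal sketch of the pattern described above, not musl's actual
malloc code: the mask is read once without synchronization, and the
chosen bin is rechecked under its lock, so a stale bit costs only a
retry. All names here (mal_sketch, bin_sketch, take_chunk, put_chunk)
are hypothetical, and C11 atomic_flag stands in for the real locks.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NBINS 64

/* Hypothetical stand-in for a malloc bin: a lock plus a free-chunk count. */
struct bin_sketch {
    atomic_flag lock;
    int nfree;
};

static struct {
    volatile uint64_t binmap;   /* bit i set: bin i is believed non-empty */
    struct bin_sketch bins[NBINS];
} mal_sketch;

static void sketch_init(void)
{
    mal_sketch.binmap = 0;
    for (int i = 0; i < NBINS; i++) {
        atomic_flag_clear(&mal_sketch.bins[i].lock);
        mal_sketch.bins[i].nfree = 0;
    }
}

static void lock_bin(int i)
{
    while (atomic_flag_test_and_set(&mal_sketch.bins[i].lock))
        ; /* spin until the flag was previously clear */
}

static void unlock_bin(int i)
{
    atomic_flag_clear(&mal_sketch.bins[i].lock);
}

/* Index of the lowest set bit; the caller guarantees mask != 0. */
static int first_set(uint64_t mask)
{
    int i = 0;
    while (!(mask & 1)) { mask >>= 1; i++; }
    return i;
}

/* Add one free chunk to bin i. The plain |= below is NOT atomic; it is
 * adequate only for a single-threaded demonstration. */
static void put_chunk(int i)
{
    lock_bin(i);
    mal_sketch.bins[i].nfree++;
    mal_sketch.binmap |= 1ULL << i;
    unlock_bin(i);
}

/* Take a chunk from the smallest non-empty bin >= min; -1 if none. */
static int take_chunk(int min)
{
    for (;;) {
        /* One unsynchronized read of the mask; it may already be stale. */
        uint64_t mask = mal_sketch.binmap & -(1ULL << min);
        if (!mask) return -1;
        int i = first_set(mask);
        lock_bin(i);
        if (mal_sketch.bins[i].nfree > 0) {
            /* The bit was accurate: unbin one chunk under the lock. */
            if (--mal_sketch.bins[i].nfree == 0)
                mal_sketch.binmap &= ~(1ULL << i); /* non-atomic, demo only */
            unlock_bin(i);
            return i;
        }
        /* Wrongly seen as non-empty: unbinning fails, try a larger bin. */
        unlock_bin(i);
        min = i + 1;
        if (min >= NBINS) return -1;
    }
}
```

In this sketch a bit that is wrongly set costs one lock/unlock and a
retry, and a bit that is wrongly clear just makes the search land on a
larger bin; neither case returns a chunk that is not there.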

Of course it's an open question whether the complex atomics and
fine-grained locking in malloc help or hurt performance more on
average. I'd really like to measure this at some point. Overhauling
malloc to try to get significantly better multi-threaded performance
without the fragmentation-optimality sacrifices other mallocs make is
a long-term goal I have open.

> We already have a similar assumption for 32 bit int all over the
> place, and I am not too happy with such a "silent" assumption. For 64
> bit, this assumption looks wrong to me.

I agree I wouldn't be happy with such an assumption, but I don't think
it's being made here.

> I would be much happier using explicit atomic types and atomic load
> functions or macros everywhere. For normal builds these could be dummy
> types made to resolve to the actual code that we have now. But this
> would allow hardening builds that check for consistency of
> all atomic accesses.

There is no way to do an atomic 64-bit load on most of the archs we
support. So trying to make it explicit wouldn't help.
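
To illustrate what is achievable without a 64-bit atomic load, here is
a sketch, again not musl's code, that treats the mask as two
independently atomic 32-bit halves. The names (mask64_sketch, mask_or,
mask_and, mask_load) are hypothetical and C11 <stdatomic.h> is an
assumption of the example. The load is two separate 32-bit loads, so
the 64-bit value as a whole is not read atomically, but each bit is
read exactly once.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 64-bit mask kept as two independently atomic halves. */
struct mask64_sketch {
    _Atomic uint32_t half[2];   /* half[0]: bits 0..31, half[1]: bits 32..63 */
};

/* Set bits: each half is updated atomically, the pair is not. */
static void mask_or(struct mask64_sketch *m, uint64_t v)
{
    uint32_t lo = (uint32_t)v, hi = (uint32_t)(v >> 32);
    if (lo) atomic_fetch_or(&m->half[0], lo);
    if (hi) atomic_fetch_or(&m->half[1], hi);
}

/* Clear bits: a half is skipped when its part of v is all ones. */
static void mask_and(struct mask64_sketch *m, uint64_t v)
{
    uint32_t lo = (uint32_t)v, hi = (uint32_t)(v >> 32);
    if (lo != UINT32_MAX) atomic_fetch_and(&m->half[0], lo);
    if (hi != UINT32_MAX) atomic_fetch_and(&m->half[1], hi);
}

/* Two 32-bit loads: the halves may come from different moments in
 * time, but no bit is read twice. */
static uint64_t mask_load(struct mask64_sketch *m)
{
    uint64_t lo = atomic_load(&m->half[0]);
    uint64_t hi = atomic_load(&m->half[1]);
    return lo | (hi << 32);
}
```

When a caller sets or clears a single bit, only one of the two halves
is touched, so that update is atomic with respect to the bit that
changes.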

Rich
