Date: Thu, 30 Jul 2015 14:37:13 +0300 (MSK)
From: Alexander Monakov <amonakov@...ras.ru>
To: musl@...ts.openwall.com
Subject: Re: New optimized normal-type mutex?

On Thu, 30 Jul 2015, Jens Gustedt wrote:
> Am Donnerstag, den 30.07.2015, 12:36 +0300 schrieb Alexander Monakov:
> > That sounds like your testcase simulates a load where you'd be better off with
> > a spinlock in the first place, no?
> 
> Hm, this is not a "testcase" in the sense that this is the real code
> that I'd like to use for the generic atomic lock-full stuff. My test
> is just using this atomic lock-full thing, with a lot of threads that
> use the same head of a "lock-free" FIFO implementation. There the
> inner part in the critical section is just memcpy of some bytes. For
> reasonable uses of atomics this should be about 16 to 32 bytes that
> are copied.
> 
> So this is really a use case that I consider important, and that I
> would like to see implemented with similar performance.

I acknowledge that that seems like an important case, but you have not
addressed my main point.  With so little work in the critical section, it does
not make sense to me that you would use something like a normal-type futex-y
mutex.  Even a call/return to grab it gives you some overhead.  I'd expect you
would use a fully inlined spinlock acquisition/release around the memory copy.
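
A minimal sketch of what a fully inlined spinlock around a small copy might look like, assuming C11 <stdatomic.h>. The struct locked_slot type, the 32-byte payload and the spin_lock/slot_write/slot_read helpers are hypothetical names, not code from either message. Waiters poll with plain loads and retry the atomic exchange only when the lock looks free (test-and-test-and-set).

```c
#include <stdatomic.h>
#include <string.h>

/* Hypothetical payload: 32 bytes, the upper end of the 16-32 byte
   copies mentioned in the quoted message. */
struct payload { unsigned char bytes[32]; };

struct locked_slot {
    atomic_int lock;            /* 0 = free, 1 = held */
    struct payload data;
};

static inline void spin_lock(atomic_int *l)
{
    for (;;) {
        /* One atomic exchange attempts the acquisition... */
        if (!atomic_exchange_explicit(l, 1, memory_order_acquire))
            return;
        /* ...and on failure we poll with plain loads, so waiting
           cores only read the cache line instead of writing it. */
        while (atomic_load_explicit(l, memory_order_relaxed)) {
            /* keep polling */
        }
    }
}

static inline void spin_unlock(atomic_int *l)
{
    atomic_store_explicit(l, 0, memory_order_release);
}

/* The whole critical section is one short memcpy, so everything here
   can be inlined into the caller: no call/return, no futex syscall. */
static inline void slot_write(struct locked_slot *s, const struct payload *in)
{
    spin_lock(&s->lock);
    memcpy(&s->data, in, sizeof *in);
    spin_unlock(&s->lock);
}

static inline void slot_read(struct locked_slot *s, struct payload *out)
{
    spin_lock(&s->lock);
    memcpy(out, &s->data, sizeof *out);
    spin_unlock(&s->lock);
}
```

Because every helper is static inline, a compiler can fold the whole acquire/copy/release sequence into the caller, which avoids even the call/return overhead. There is no sleeping fallback, so this only makes sense while the critical section stays this short.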

> 
> (I haven't yet thought about making this into a full-fledged mutex;
> implementing timed versions certainly needs some thought.)
> 
> > Have you tried simulating a load that does some non-trivial work between
> > lock/unlock, making a spinlock a poor fit?
> 
> No. But I am not sure that there is such a case :)

There appears to be some miscommunication here, and the smiley does not help.
"such a case" would be copying 32KB in the critical section, for example.
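
To simulate that kind of load, a benchmark can make the amount of work done under the lock a parameter. The sketch below uses pthread_mutex_t as a stand-in for whichever lock is being measured. WORK_BYTES, bench_thread and run_bench are hypothetical names, and the 32 KiB default follows the example above.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <string.h>
#include <time.h>

/* Hypothetical knob: bytes copied while holding the lock. Vary it to
   move between "spinlock territory" and "sleeping mutex territory". */
#define WORK_BYTES (32 * 1024)

static pthread_mutex_t bench_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned char src_buf[WORK_BYTES], dst_buf[WORK_BYTES];
static long bench_counter;          /* protected by bench_lock */

struct bench_arg { int iters; };

static void *bench_thread(void *p)
{
    const struct bench_arg *a = p;
    for (int i = 0; i < a->iters; i++) {
        pthread_mutex_lock(&bench_lock);
        memcpy(dst_buf, src_buf, WORK_BYTES);   /* non-trivial critical section */
        bench_counter++;
        pthread_mutex_unlock(&bench_lock);
    }
    return 0;
}

/* Runs nthreads contending threads (capped at 64) and returns the
   elapsed wall-clock time in seconds. */
static double run_bench(int nthreads, int iters)
{
    pthread_t t[64];
    struct bench_arg a = { iters };
    struct timespec t0, t1;

    if (nthreads > 64)
        nthreads = 64;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&t[i], 0, bench_thread, &a);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], 0);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Timing run_bench with a small WORK_BYTES (say 32) and then with 32 KiB, at the same thread count, would show whether a lock tuned for tiny critical sections still behaves well when holders stay inside for a long time.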
 
> With this idea that the counter doesn't change once the thread is
> inside the lock-acquisition loop, there is much less noise on the lock
> value. This has two benefits. First the accesses in the loop are
> mainly reads, to see if there has been a change, no writes. So the bus
> pressure should be reduced. And second, because there are fewer writes
> in total, other threads that are inside the same loop perceive less
> perturbation, and the futex has a good chance to succeed.
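
The quoted message does not spell out the lock-word layout, so the following is only one hypothetical reading of the scheme. It assumes Linux futexes called through syscall(SYS_futex, ...). Bit 31 is the lock flag, and the low bits count threads that hold or want the lock. A contender increments the count once on entry, then attempts a CAS only when the flag is clear and sleeps in futex_wait otherwise. cnt_lock, cnt_unlock and LOCKED are illustrative names.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical layout: bit 31 = "locked" flag, bits 0-30 = number of
   threads that hold or are trying to acquire the lock. */
#define LOCKED 0x80000000u

static void futex_wait(atomic_uint *w, unsigned val)
{
    /* Sleeps only if *w still equals val; returns at once otherwise. */
    syscall(SYS_futex, w, FUTEX_WAIT_PRIVATE, val, NULL, NULL, 0);
}

static void futex_wake(atomic_uint *w, int n)
{
    syscall(SYS_futex, w, FUTEX_WAKE_PRIVATE, n, NULL, NULL, 0);
}

static void cnt_lock(atomic_uint *w)
{
    unsigned v = 0;
    /* Fast path: nobody around, take the lock and count ourselves. */
    if (atomic_compare_exchange_strong_explicit(w, &v, LOCKED | 1,
            memory_order_acquire, memory_order_relaxed))
        return;
    /* Slow path: register once; our share of the count is not touched
       again until we release the lock. */
    v = atomic_fetch_add_explicit(w, 1, memory_order_relaxed) + 1;
    for (;;) {
        if (!(v & LOCKED)) {
            /* Write only when the lock looks free: set the flag and
               leave the count alone. On failure v is refreshed. */
            if (atomic_compare_exchange_weak_explicit(w, &v, v | LOCKED,
                    memory_order_acquire, memory_order_relaxed))
                return;
            continue;
        }
        futex_wait(w, v);
        v = atomic_load_explicit(w, memory_order_relaxed);
    }
}

static void cnt_unlock(atomic_uint *w)
{
    /* Clear the flag and drop the holder's count in one atomic step. */
    unsigned old = atomic_fetch_sub_explicit(w, LOCKED | 1,
                                             memory_order_release);
    if ((old & ~LOCKED) > 1)    /* other registered contenders remain */
        futex_wake(w, 1);
}
```

Under this reading a waiter writes the lock word only when it sees the flag clear; its count contribution changes once on entry and once at release. Whether that actually yields less cache-line traffic than a conventional mutex would have to be measured.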

I think spinning every time you're about to enter futex_wait helps if you
expect critical sections to be as small as your spin period.  Otherwise, it's
not obviously an improvement.  I think normally you spin prior to the very
first atomic operation, in anticipation that you can proceed via the fast
path.  Your spin scheme is different.
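
For contrast, here is a minimal sketch of the more usual placement. A thread spins with plain loads before its first atomic read-modify-write, and once it falls into the slow path it sleeps on the futex without spinning again. It assumes a variant of the three-state futex mutex (0 free, 1 locked, 2 locked with possible waiters) from Ulrich Drepper's "Futexes Are Tricky". SPIN_COUNT, sp_lock and sp_unlock are hypothetical names, and the spin count is an arbitrary placeholder.

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical tuning knob: read-only polls before the first CAS. */
#define SPIN_COUNT 100

/* States: 0 = free, 1 = held, 2 = held and possibly contended. */

static void sp_lock(atomic_int *w)
{
    /* Spin first, with loads only, hoping the holder leaves soon so the
       fast-path CAS below succeeds. */
    for (int i = 0; i < SPIN_COUNT &&
                    atomic_load_explicit(w, memory_order_relaxed); i++) {
        /* keep polling */
    }
    int expected = 0;
    if (atomic_compare_exchange_strong_explicit(w, &expected, 1,
            memory_order_acquire, memory_order_relaxed))
        return;                                     /* fast path */
    /* Slow path: mark the lock contended and sleep until the exchange
       observes 0, meaning we now own it (in state 2). */
    while (atomic_exchange_explicit(w, 2, memory_order_acquire) != 0)
        syscall(SYS_futex, w, FUTEX_WAIT_PRIVATE, 2, NULL, NULL, 0);
}

static void sp_unlock(atomic_int *w)
{
    /* Only pay for a wake syscall if someone may be sleeping. */
    if (atomic_exchange_explicit(w, 0, memory_order_release) == 2)
        syscall(SYS_futex, w, FUTEX_WAKE_PRIVATE, 1, NULL, NULL, 0);
}
```

Here spinning happens at most once per acquisition, ahead of the fast path; after that each wakeup costs one exchange. Spinning again before every futex_wait pays off only when critical sections fit inside the spin window.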

Alexander
