Date: Tue, 18 Sep 2018 15:23:30 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH 5/7] use the new lock algorithm for malloc

On Tue, Jan 09, 2018 at 02:26:44PM -0500, Rich Felker wrote:
> On Tue, Jan 09, 2018 at 07:58:51PM +0100, Jens Gustedt wrote:
> > Hello Rich,
> > 
> > On Tue, 9 Jan 2018 12:42:34 -0500 Rich Felker <dalias@...c.org> wrote:
> > 
> > > On Wed, Jan 03, 2018 at 02:17:12PM +0100, Jens Gustedt wrote:
> > > > Malloc used a specialized lock implementation in many places. Now
> > > > that we have a generic lock that has the desired properties, we
> > > > should just use this, instead of this multitude of very similar
> > > > lock mechanisms.
> > > > ---
> > > >  src/malloc/malloc.c | 38 +++++++++++++-------------------------
> > > >  1 file changed, 13 insertions(+), 25 deletions(-)
> > > > 
> > > > diff --git a/src/malloc/malloc.c b/src/malloc/malloc.c
> > > > index 9e05e1d6..6c667a5a 100644
> > > > --- a/src/malloc/malloc.c
> > > > +++ b/src/malloc/malloc.c
> > > > @@ -13,6 +13,8 @@
> > > >  #define inline inline __attribute__((always_inline))
> > > >  #endif
> > > >  
> > > > +#include "__lock.h"
> > > > +  
> > > 
> > > Ah, I see -- maybe you deemed malloc to be the only place where
> > > inlining for the sake of speed made sense? That's probably true.
> > 
> > Yes, and also I was trying to be conservative. Previously, the lock
> > functions for malloc resided in the same TU, so they were probably
> > inlined most of the time.
> 
> Yes, and that was done because (at least at the time) it made a
> significant empirical difference. So I suspect it makes sense to do
> the same still. I've queued your patches 1-3 for inclusion in my next
> push unless I see any major problem. I might try to get the rest
> included too but being that I'm behind on this release cycle we'll
> see..
> 
> Thanks for all your work on this and patience. :)

I'm just coming back to look at this, and I can't get the new lock to
perform comparably well to the current one, much less better, in
malloc. I suspect the benefit of just being able to do a store and
relaxed read on x86 for the unlock is too great to beat. Note that I
just fixed a bug related to this on powerpc64 in commit
12817793301398241b6cb00c740f0d3ca41076e9 and I expect the performance
properties might be reversed on non-x86 archs.
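
For concreteness, here's a rough C11 sketch of the shape of the two
unlock paths (the real code uses musl's a_* atomics and a futex wake
rather than stdatomic, and has extra handling for the single-threaded
case, so this is only an approximation, not the code in either tree):

#include <limits.h>
#include <stdatomic.h>

/* stand-in for the futex-based wake in the real code */
static void wake_one(void *addr) { (void)addr; }

/* Old malloc lock: lk[0] is the lock flag, lk[1] is a waiter count
 * maintained by the wait path. Release is a store of zero plus a
 * relaxed load of the waiter count -- no atomic read-modify-write. */
void old_unlock(_Atomic int lk[2])
{
	/* the real a_store is a stronger, barrier-ful store */
	atomic_store_explicit(&lk[0], 0, memory_order_release);
	if (atomic_load_explicit(&lk[1], memory_order_relaxed))
		wake_one(lk);
}

/* New generic lock: a single int holds the lock bit (the sign) and
 * the congestion count, so release needs an atomic read-modify-write
 * to learn whether a waiter has to be woken. */
void new_unlock(_Atomic int *l)
{
	if (atomic_fetch_add(l, -(INT_MIN+1)) != (INT_MIN+1))
		wake_one(l);
}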

I did have to hack it in since the patch from this series no longer
directly applies, and I just did it inline as a test, but I don't
think I did anything wrong there; it's attached for reference.

I'm also attaching the (very old) malloc_stress.c I used to measure. I
noticed the greatest differences running it with test #3 and 4 threads
(./malloc_stress 3 4), where 4 is the number of cores.
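
The attachment isn't inlined here; purely as an illustration of the
kind of measurement (this is not the attached file), a much-reduced
harness in the same general spirit would be something along the lines
of:

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustration only, not the attached malloc_stress.c: each thread
 * performs a fixed number of malloc/free pairs of varying size so
 * that the allocator lock is contended across all threads. */

#define ITERS 1000000

static void *worker(void *arg)
{
	unsigned seed = (unsigned)(uintptr_t)arg;
	for (int i = 0; i < ITERS; i++) {
		seed = seed*1103515245 + 12345;   /* tiny LCG for sizes */
		void *p = malloc(16 + (seed>>16) % 4096);
		if (!p) abort();
		*(volatile char *)p = 0;          /* touch the allocation */
		free(p);
	}
	return 0;
}

int main(int argc, char **argv)
{
	int n = argc > 1 ? atoi(argv[1]) : 4;
	if (n < 1 || n > 64) return 1;
	pthread_t t[64];
	for (int i = 0; i < n; i++)
		pthread_create(&t[i], 0, worker, (void *)(uintptr_t)(i+1));
	for (int i = 0; i < n; i++)
		pthread_join(t[i], 0);
	return 0;
}

Built against each lock variant and timed at different thread counts,
that gives a rough idea of the contention being measured.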

Rich

View attachment "malloc-newlock.diff" of type "text/plain" (783 bytes)

View attachment "malloc_stress.c" of type "text/plain" (3770 bytes)
