musl - Re: [PATCH] fix atexit when it is called from an atexit handler

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150723195853.GW1173@brightrain.aerifal.cx>
Date: Thu, 23 Jul 2015 15:58:53 -0400
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCH] fix atexit when it is called from an atexit
 handler

On Thu, Jul 23, 2015 at 09:19:13AM +0200, Jens Gustedt wrote:
> Am Donnerstag, den 23.07.2015, 07:54 +0200 schrieb Jens Gustedt:
> > I'd think this could be done even simpler by running the handlers in
> > batches. ...
> 
> I checked, the strict ordering constraints for the handlers don't
> allow for such a strategy, so forget it.

Indeed, this is not possible.

> Looking at the code, I am a bit uncomfortable with the idea of a
> potentially unlimited number of calloc calls during exit. I think it

Is there a specific reason? This will only happen in pathological code
that keeps registering atexit handlers, and eventually the allocations
will fail and therefore atexit will fail, which it's allowed to do.
The only thing pathological that happens is that memory usage keeps
increasing even if you've balanced the running and registering of
atexit handlers so that the total number registered always stays
bounded, but we're still talking about extreme abuse of the atexit
API, and I think optimizing this case without an argument that it's
worthwhile would probably be premature optimization.

> would be good if we could restrict the maximal number of successful
> calls to atexit during exit to 32. A calloc-free strategy could be to
> save head to a tmp a the beginning of processing and to provide a
> `struct fl` table on the stack of __funcs_on_exit.

I'm not sure how this would be better. It would be more predictable,
but could also probably break some excessive but "valid" use (like
a huge chain of ctors getting called from an atexit handler and all
registering dtors).

> > A similar improvement could be done for at_quick_exit. There I think
> > it makes not much sense accepting new handlers during processing. So
> > the strategy could just be
> > 
> >   (i) Take the lock and save count in a tmp
> >   (ii) set count to 32
> >   (iii) unlock
> >   (iv) do the processing
> > 
> > But looking at it, I think that one can even be more improved. We
> > could just use count as atomic, and so we wouldn't need a lock.
> > I can prepare a patch for that, if you like.
> 
> Since this is fairly simple, I just did it. Please find a new version
> attached. (too much changes, a patch makes no sense.)  This compiles,
> but I have no code to test this.

The current program in musl is not to treat replacements of locks by
an atomics as an improvement, but rather the opposite. Atomics should
not be used unless they're needed for semantic correctness (e.g. in
situations where AS-safety is required and could not be obtained with
locks, or at least not without expensive signal masking if locks were
to be used) or a compelling performance reason, and atexit is surely
not a performance bottleneck. See these recent commits that have
removed or limited the use of atomics:

63c188ec42e76ff768e81f6b65b11c68fc43351e replace atomics with locks in locale-setting code
68630b55c0c7219fe9df70dc28ffbf9efc8021d8 eliminate costly tricks to avoid TLS access for current locale state
6de071a0be00ec2ff08af3c89c7caaa20f1044d7 eliminate atomics in syslog setlogmask function
fd850de7524d8b62cd1d340417b665ed9427adae fix type error (arch-dependent) in new aio code
4e8a3561652ebcda6a126b3162fc545573889dc4 overhaul aio implementation for correctness

The reasoning behind this approach is that direct use of atomics is a
high entry barrier to understanding the code, a major risk of bugs
from subtle errors, and precludes the code using atomics from
benefitting from potential future improvements to synchronization
primitives (like weak memory order, lock elision, etc.).

Certainly the implementations of synchronization primitives
themselves, and some other low-level code (like dynamic TLS setup)
must keep using atomics, and future malloc improvements may benefit
from direct atomic usage, but I'd like to get rid of most/all other
uses.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.