musl - Re: Re:Re: [timer] timer_delete function async problem

Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <20200217151527.GV1663@brightrain.aerifal.cx>
Date: Mon, 17 Feb 2020 10:15:27 -0500
From: Rich Felker <dalias@...c.org>
To: zuotina <zuotingyang@....com>
Cc: musl@...ts.openwall.com
Subject: Re: Re:Re: [timer] timer_delete function async problem

On Mon, Feb 17, 2020 at 03:12:03PM +0800, zuotina wrote:
> The pseudo-code example program as follows. There are two processes A and B.
> process A:
> void timer_handler(union sigval arg)
> {
> pctx = (struct ctx *)arg.sival_ptr;
> if (pctx == NULL)
> return;
> // Here may be scheduled to timer_release of thread B.
> // When scheduled again, the pctx was illeagal， so panic.
> lock(pctx->lock);
> /* do something with pctx */
> unlock(pctx->lock);
> }
> void create_timer_function()
> {
> struct sigevent sig;
> sig.sigev_notify = SIGEV_THREAD;
> sig.sigev_value.sival_ptr = pctx; 
> sig.sigev_notify_function = timer_handler;
> timer_create(CLOCK_REALTIME, &sig, &pctx->timerid);
> }
> 
> 
> process B:
> void timer_release()
> {
> lock(pctx->lock);
> timer_delete(pctx->timerid);
> unlock(pctx->lock);
> // here will do something...
> free(pctx);
> }
> 
> 
> After the call to timer_delete is made but before the signal is sent, 
> a new handler which is not already-running will be coming at uncertain time.
> I've tried synchronization and mutual between 'timer_release' and 'timer_handler', 
> neither can be resolved. 
> How to solve the panic in the example, hope to give some suggestions.

As written, the code has use-after-free, independent of what
timer_delete does. It's always possible that the timer thread is at
the entry point of timer_handler when thread B calls timer_delete.
Then thread B frees an object which the timer thread is accessing.
Nothing timer_delete could do could prevent this situation from
arising.

At first glance, it seems like this is a fundamental design flaw in
the SIGEV_THREAD timer API -- there's no way to determine whether
there's still (at least) one thread that will run, since the state of
having started but not executed any application code in the handler
yet is not observable/distinguishable from not having run at all. If
this analysis is correct, then it seems like the timer handler thread
must use at least one object of permanent lifetime (i.e. that's never
freed) to synchronize and determine if it can access other objects.

If you can assume (as is true in the musl implementation, but doesn't
seem to be guaranteed) that at most one timer handler thread for a
given timer is runnable at once, then you can simply have the timer
handler thread be responsible for deleting the timer and freeing the
associated data object based on a flag that another thread set inside
the object (under appropriate locking).

If you can't assume this, you can make it true by not using
auto-rearming timers (i.e. no it_interval) and having the timer thread
re-arm itself (it_value) every time it runs.

None of this is nice, and suggests that SIGEV_THREAD timers (and the
POSIX timers API in general) is poorly designed and difficult to use
safely. Really, the right solution here is just making your own thread
that calls clock_nanosleep (probably with TIMER_ABSTIME) in a loop and
runs the code you want each time the sleep finishes. This is far
easier to get right and doesn't have complex lifetime issues to deal
with.

Rich
Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.