Date: Sat, 21 Feb 2015 22:24:53 -0500
From: Rich Felker <dalias@...c.org>
To: musl@...ts.openwall.com
Subject: Masked cancellation mode draft

Masked cancellation mode -- PTHREAD_CANCEL_MASKED

Background

POSIX thread cancellation provides an exception-like model for letting
threads process cancellation requests and clean up their state in
preparation for exit. Unfortunately, this model is completely foreign
to the C language and requires anti-idiomatic techniques like stuffing
all local state into a structure to make it available to cleanup
routines. It also makes it impossible to construct cancellable
primitives that, upon receiving the cancellation request, need to back
out the operation in progress without actually acting on cancellation,
because the caller needs to see their return. An example is
pthread_cond_wait when it discovers after receiving the cancellation
request that it already consumed a signal; in this case it must return
and leave the cancellation request pending. While, as part of the
implementation, pthread_cond_wait can use various hacks to satisfy
this requirement, the current standard for cancellation leaves
applications with no way to construct custom primitives with the same
property.
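
To make the "stuff all local state into a structure" point concrete,
here is a minimal sketch of the conventional cleanup-handler style. The
names job_state, job_cleanup and run_job are made up for illustration;
only pthread_cleanup_push and pthread_cleanup_pop are standard
interfaces.

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Everything the cleanup routine needs must live in one structure,
 * because the handler cannot see the caller's local variables. */
struct job_state {
    int fd;        /* descriptor to close, or -1 */
    char *buf;     /* heap buffer to free, or NULL */
    int cleaned;   /* set by the handler; for demonstration only */
};

static void job_cleanup(void *arg)
{
    struct job_state *st = arg;
    if (st->fd >= 0) close(st->fd);
    free(st->buf);
    st->fd = -1;
    st->buf = NULL;
    st->cleaned = 1;
}

/* Hypothetical worker: returns 0 on success, -1 if allocation fails. */
int run_job(struct job_state *st)
{
    st->fd = -1;
    st->cleaned = 0;
    st->buf = malloc(4096);
    if (!st->buf) return -1;

    pthread_cleanup_push(job_cleanup, st);
    /* ... calls to cancellation points would go here ... */
    pthread_cleanup_pop(1);   /* nonzero: also run the handler on the normal path */
    return 0;
}
```

Note that the handler runs outside the normal control flow of run_job,
so it has no way to hand a return value back to run_job's caller.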

In addition, POSIX thread cancellation in its current state is
incompatible with third-party library code which was not specifically
written to be cancelable. If a thread acts on a cancellation request
during a call to library code which was not written to be
cancellation-aware, any data the library code was operating on may be
left in an inconsistent state. Locks may be left locked, and
resources, including file descriptors and allocated memory, may leak
or may have dangling references left behind after they are freed.
Thus, a thread calling such library code must either ensure that it is
never the target of cancellation requests or that it blocks
cancellation during library calls. This of course defeats one of the
most important use cases for cancellation: stopping an asynchronous
query operation (network connection, database query, etc.) whose
results are no longer needed and which is stuck in a blocking
operation.


Adapting Cancellation to Idiomatic C

Well-written C functions check the return value of any function call
which can fail and properly back out partially completed work and
return their failure status to their caller. The new MASKED mode
allows this existing idiomatic error handling pattern to process
cancellation requests.

When the cancellation state is set to MASKED, the first cancellation
point (other than close, which is special) called with cancellation
pending, or which has a cancellation request arrive while it's
blocking, returns with an error of ECANCELED, and sets the
cancellation state to DISABLE.

Even code which was not specifically written to be cancellation-aware
is compatible with this behavior. As long as it is responding to
errors, it will see the error, but will have the full repertoire of
standard functions available to use while cleaning up and returning
after the error. If the error is ignored, cancellation will be
delayed, but the behavior is no worse than what could already happen
from ignoring errors.
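
As a rough sketch of the intended usage, under the behavior described
in this draft: the function below handles cancellation through its
ordinary error path. The name read_record and the choice of EIO for a
short read are illustrative assumptions, and the #ifndef block lets the
sketch build on implementations without MASKED (where ECANCELED is
simply never reported).

```c
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumption: fall back to DISABLE where MASKED is not provided. */
#ifndef PTHREAD_CANCEL_MASKED
#define PTHREAD_CANCEL_MASKED PTHREAD_CANCEL_DISABLE
#endif

/* Hypothetical helper: read exactly len bytes from fd into a new buffer.
 * Returns the buffer, or NULL with errno set on any failure. */
char *read_record(int fd, size_t len)
{
    int old_state, ignored;
    char *buf = malloc(len);
    if (!buf) return NULL;

    pthread_setcancelstate(PTHREAD_CANCEL_MASKED, &old_state);

    /* In MASKED mode, per the draft, read may fail with ECANCELED,
     * after which the state is DISABLE for the rest of this region. */
    ssize_t n = read(fd, buf, len);
    int saved_errno = errno;

    pthread_setcancelstate(old_state, &ignored);

    if (n != (ssize_t)len) {
        /* One error path covers I/O errors, short reads and ECANCELED. */
        free(buf);
        errno = n < 0 ? saved_errno : EIO;
        return NULL;
    }
    return buf;
}
```

A caller that cares can compare errno against ECANCELED after a NULL
return; a caller that does not care still cleans up correctly.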


Design Choices

One-off or sticky failure: One obvious question when returning an
error to report cancellation is whether only the first cancellation
point, or all calls to cancellation points, should fail with errors.
The one-off approach was chosen mainly because it's the most
compatible with existing library code, which may need to call other
functions which are cancellation points in its error paths. 

Exempting close: While close is a cancellation point, it's rare for
applications to check for errors from close, and when they do check
they often mishandle them. But more importantly, POSIX (with pending
Austin Group interpretations applied) requires that the fd be released
when close fails with an error other than EINTR, and also requires
that close not release the fd when acting on cancellation. These
requirements are mutually contradictory if close is to return an error
of ECANCELED, and are best resolved by simply suppressing close's
status as a cancellation point in MASKED cancellation mode.

Choice of error code: ECANCELED was chosen because it semantically
matches cancellation and because it was not otherwise used as a
standard error code for any interfaces which are cancellation points.
EINTR was also a good candidate since side effects on cancellation are
specified to match side effects on EINTR, but using EINTR would
prevent applications from differentiating interruption by a signal
from cancellation and would thereby violate the POSIX requirement that
implementation-defined error conditions not alias standardized errors.

Consuming cancellation request vs disabling: There are two potential
ways to achieve one-off failure. One is clearing the pending
cancellation request when reporting the error. The other is setting
the state to DISABLE. While the ability to clear pending cancellation
requests would be highly desirable in itself, it potentially increases
the implementation burden (including the complexity of synchronizing
such consumption/clearing with threads sending cancellation requests)
and yields worse default behavior: code wanting to leave the
cancellation request pending when restoring the default cancellation
state would have to re-raise it via pthread_cancel(pthread_self()).
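
The following sketch illustrates the behavior that results from the
DISABLE approach: a request that arrives inside the masked region is
still pending once the saved state is restored, with no re-raise
needed. The names worker and demo_pending_survives are hypothetical,
the self-cancel merely stands in for a request from another thread, and
the #ifndef fallback is assumed so that the sketch also builds where
MASKED is missing.

```c
#include <pthread.h>

/* Assumption: fall back to DISABLE where MASKED is not provided. */
#ifndef PTHREAD_CANCEL_MASKED
#define PTHREAD_CANCEL_MASKED PTHREAD_CANCEL_DISABLE
#endif

static void *worker(void *arg)
{
    int *reached_end = arg;
    int old_state, ignored;

    pthread_setcancelstate(PTHREAD_CANCEL_MASKED, &old_state);

    /* Stand-in for a request sent by another thread. */
    pthread_cancel(pthread_self());

    /* ... masked work; this sketch calls no cancellation point here ... */

    pthread_setcancelstate(old_state, &ignored);

    /* The request was not consumed, so the first cancellation point
     * after the restore acts on it. */
    pthread_testcancel();

    *reached_end = 1;   /* not expected to run */
    return 0;
}

/* Returns 1 if the worker was cancelled after restoring its state,
 * 0 if it ran to completion, -1 if thread setup failed. */
int demo_pending_survives(void)
{
    pthread_t t;
    void *res;
    int reached_end = 0;

    if (pthread_create(&t, 0, worker, &reached_end)) return -1;
    if (pthread_join(t, &res)) return -1;
    return res == PTHREAD_CANCELED && !reached_end;
}
```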

State vs type: PTHREAD_CANCEL_MASKED is defined as a new cancellation
state rather than a type. This is for two main reasons:

1. The existing types represent times at which the implementation is
   permitted to act on cancellation, while the existing states
   represent whether acting on cancellation is permitted at all. In
   the new MASKED mode, cancellation is never acted upon. Its pending
   status or arrival is merely made available to the application via
   new error conditions in functions which are cancellation points.

2. The intended usage is simpler with a state than with a type. Since
   the first cancellation point to report failure switches the state
   to DISABLE, the caller would need to save and restore both state
   and type if MASKED were a type. By being a state, the cost of
   saving and restoring the mode is minimized.

Graceful fallback: By defining a new state macro rather than
completely new interfaces, applications can gracefully fall back to
disabling cancellation on implementations which lack MASKED
cancellation state with the following:

    #ifndef PTHREAD_CANCEL_MASKED
    #define PTHREAD_CANCEL_MASKED PTHREAD_CANCEL_DISABLE
    #endif

No other changes are needed. Any error-checking code that treats
ECANCELED as special will simply be a dead code path, since that error
will never be returned for cancellation on such implementations.


Implementation

In signal-based implementations of cancellation, the desired behavior
is easily achieved simply by having the signal handler replace the
saved program counter in its ucontext_t, which necessarily contains an
address in the critical range between the pre-syscall cancellation check
and the syscall instruction, with the address of code that returns an
ECANCELED error and resets the cancellation state to DISABLE.
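
The redirection step can be modeled portably as below. This is only a
toy model, not musl's code: saved_context stands in for the
architecture-specific program counter slot inside ucontext_t,
cancel_region and redirect_if_interruptible are made-up names, and
treating the range as half-open (end just past the syscall
instruction) is an assumption.

```c
#include <stdint.h>

/* Hypothetical stand-in for the part of ucontext_t that matters here. */
struct saved_context {
    uintptr_t pc;              /* saved program counter */
};

/* Hypothetical description of the critical range. */
struct cancel_region {
    uintptr_t start;           /* pre-syscall cancellation check */
    uintptr_t end;             /* assumed: just past the syscall instruction */
    uintptr_t masked_return;   /* code returning ECANCELED and setting DISABLE */
};

/* Model of what the signal handler does in MASKED mode.
 * Returns 1 if the saved pc was redirected, 0 if it was left alone. */
int redirect_if_interruptible(struct saved_context *ctx,
                              const struct cancel_region *r)
{
    if (ctx->pc < r->start || ctx->pc >= r->end)
        return 0;              /* outside the range: leave the context alone */
    ctx->pc = r->masked_return;
    return 1;
}
```

When the handler returns, execution would resume at masked_return
instead of entering or re-entering the syscall.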



Stability and Status

Presently all of the above is an experimental interface in musl libc
that should not be used in production code (outside of libc itself).
Details of the behavior and/or public interfaces may change based on
feedback and experience gained from use in musl and experimental use
by users.
