Date: Thu, 10 Mar 2016 11:57:54 +0100
From: Ingo Molnar <mingo@...nel.org>
To: Andy Lutomirski <luto@...capital.net>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
	Andy Lutomirski <luto@...nel.org>,
	the arch/x86 maintainers <x86@...nel.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
	Borislav Petkov <bp@...en8.de>,
	"musl@...ts.openwall.com" <musl@...ts.openwall.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <a.p.zijlstra@...llo.nl>
Subject: Re: Re: [RFC PATCH] x86/vdso/32: Add AT_SYSINFO cancellation
 helpers


* Andy Lutomirski <luto@...capital.net> wrote:

> Let me try to summarize my understanding of the semantics.
> 
> Thread A sends thread B a signal.  Thread B wants to ignore the signal and defer 
> handling unless it's either in a particular syscall and returns -EINTR or unless 
> the thread is about to do the syscall.

s/the syscall/an interruptible syscall/

The fundamental intention is to essentially allow the asynchronous killing 
(cancellation) of pthread threads without corrupting user-space data structures 
such as malloc() state.

There's a long list of system calls in pthreads(7) that must be cancellation 
points, plus an even longer list of system calls and libc APIs that may be 
cancellation points.
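To make "cancellation point" concrete, here is a minimal sketch assuming a POSIX system with pthreads (the helper names `blocked_reader` and `cancel_blocked_reader` are made up for illustration). A thread blocks in read() on a pipe that never receives data, and pthread_cancel() still terminates it, because read() is one of the required cancellation points. How the request is physically delivered (a signal, on the implementations discussed here) is an implementation detail.

```c
#include <assert.h>
#include <pthread.h>
#include <unistd.h>

/* Thread body: blocks in read(), a required cancellation point, on a
 * pipe whose write end stays open but is never written to. */
static void *blocked_reader(void *arg)
{
    int fd = *(int *)arg;
    char c;

    /* Deferred is already the default type; set it explicitly for clarity. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, NULL);

    /* Blocks forever unless the thread is cancelled here. */
    ssize_t n = read(fd, &c, 1);
    (void)n;
    return NULL;
}

/* Hypothetical helper: starts a reader thread, cancels it and reports
 * whether pthread_join() saw PTHREAD_CANCELED (1), 0 if not, -1 on error. */
static int cancel_blocked_reader(void)
{
    int fds[2];
    pthread_t t;
    void *status = NULL;

    if (pipe(fds) != 0)
        return -1;
    if (pthread_create(&t, NULL, blocked_reader, &fds[0]) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Request cancellation; if the thread has not reached read() yet, the
     * request stays pending and is acted on when it does. */
    pthread_cancel(t);
    pthread_join(t, &status);

    close(fds[0]);
    close(fds[1]);
    return status == PTHREAD_CANCELED;
}
```

Note that the cancel may well be sent before the thread has entered read(); the implementation must still act on it once the thread gets there, which is exactly the window this thread is about.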

On glibc, signal 32 (the first RT signal) is used as the cancellation signal.

But I guess you knew all this already!

So my original thinking was this:

  | What surprises me is why Musl even bothers with trying to detect system calls 
  | that are about to be executed. Cancellation is a fundamentally polling-type 
  | API: a very small 2-3 instruction window to 'miss' the current system call 
  | has no practical latency effect - so why does it even attempt to detect that 
  | RIP range? Why doesn't Musl just check the cancellation flag (activated by 
  | signal 32) and be content? Am I misunderstanding something about it?

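... and when I wrote that up I realized the detail that I missed: it's a 
problematic race if the thread starts a long-lived blocking system call (such as 
accept()) shortly after the cancellation signal has been sent. The flag check has 
already passed, so the thread blocks with the cancellation pending and nothing 
left to wake it up.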
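The race can be illustrated with a deliberately simplified, single-threaded toy model (not real libc code; every name below is hypothetical). The "signal" is delivered by calling the handler at one of three points, and the naive wrapper only polls a flag before entering the blocking call. The model assumes the handler is installed without SA_RESTART, so a signal that arrives while the thread is blocked makes the call fail with -EINTR.

```c
#include <assert.h>
#include <stdbool.h>

/* Where the modelled cancellation signal arrives. */
enum arrival {
    BEFORE_CHECK,   /* before the wrapper polls the cancel flag      */
    AFTER_CHECK,    /* flag already polled, syscall not yet entered  */
    DURING_SYSCALL  /* thread already blocked in the kernel          */
};

struct thread_state {
    bool cancel_flag;   /* set by the modelled signal handler       */
    bool in_syscall;    /* true while "blocked" in accept()         */
    bool interrupted;   /* the blocking call returned -EINTR        */
};

/* Flag-only handler: record the request; a blocked call gets -EINTR. */
static void handler(struct thread_state *t)
{
    t->cancel_flag = true;
    if (t->in_syscall)
        t->interrupted = true;
}

/* Runs the naive "poll the flag, then block" wrapper with the signal
 * arriving at `when`. Returns true if cancellation is acted on, false
 * if the thread stays blocked with the request still pending. */
static bool naive_wrapper(enum arrival when)
{
    struct thread_state t = { false, false, false };

    if (when == BEFORE_CHECK)
        handler(&t);
    if (t.cancel_flag)
        return true;            /* the poll sees the flag and bails out */

    if (when == AFTER_CHECK)
        handler(&t);            /* the 2-3 instruction window */

    t.in_syscall = true;        /* enter the blocking accept() */
    if (when == DURING_SYSCALL)
        handler(&t);

    if (t.interrupted)
        return true;            /* woken by -EINTR, flag seen afterwards */
    return false;               /* blocked until a client connects */
}
```

Only the AFTER_CHECK case loses the request, and no amount of extra flag polling in the wrapper closes that window; the handler itself has to notice where the thread was interrupted.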

So the signal-32 handler _has_ to check the RIP and make sure that the system call 
is not about to be executed - cancellation might be delayed indefinitely 
otherwise. It's essentially needed for correctness.

Linus's suggestion to allow system calls to be more interruptible via a new SA_ 
flag also makes sense, but that is a latency improvement change - while the aspect 
I was wondering about was a fundamental correctness detail.

So I withdraw my objection regarding AT_SYSINFO cancellation helpers. User-space 
needs to have a signal-atomic way to prevent system calls from being started after 
a cancellation signal has been received.

Thanks,

	Ingo

