Date: Sun, 10 Jun 2012 12:33:59 -0400
From: Rich Felker <dalias@...ifal.cx>
To: musl@...ts.openwall.com
Subject: Re: Re: Vision for new platform

On Sun, Jun 10, 2012 at 11:51:25PM +0800, orc wrote:
> > I don't think you're getting the issue at hand. Suppose you want to be
> > able to automatically bring down a particular daemon -- perhaps to
> > restart it with completely new configuration or to switch to a new
> > version of it. This could happen as part of an automated upgrade
> > process or under manual admin control.
> 
> 'Automated' often becomes the source of problems, if this automated
> subsystem is not engineered properly. If we want a daemon that will be
> responsible for other daemons' status and will start and stop them
> automatically based on the admin's decision, then it must be
> well-engineered and tested in many types of situations first.

Without "automated", how do you intend for non-technical users to
upgrade important system components when their old version has a
critical vulnerability? Even if the system has a technically qualified
admin, nobody wants to go manually upgrading/restarting daemons on
tens, hundreds, or thousands of boxes...

I agree automation is a huge source of problems, but I think they're
fundamental problems you can't just pretend don't exist.

> > (even one run by a user as opposed to by root with a
> > separate config file and running on a separate port)
> 
> Killing processes based on uid/gid and cmdline can be achieved with
> pkill already,

No, it cannot. Before you can solve any of these problems you must
understand that you can't use resource handles that belong to another
process which could invalidate them behind your back. This is a core
principle of concurrent programming, and pids are such a resource.

As an aside, I used to really dislike the push towards multi-threaded
programming because concurrency is error-prone and hard to get right.
Then I realized that basically all unix systems programming is
concurrent programming, just disguised to look safe...

> > to killing
> > unrelated processes (by scanning /proc or reading a pid file, then
> > subsequently killing the pid which might now belong to a different
> > process).
> 
> Again, pkill is much better than the "traditional"
> "kill $(cat /var/run/daemon.pid)" that most init scripts use today
> (Am I right?)

No. The pkill approach is the "doing things as stupid as killing any
instance of the daemon" in my text you quoted.

At least with pid files, you know the pid you kill _at one time_
belonged to the daemon you wanted to kill. With pkill, you'll pick up
completely independent instances of the same program binary.

> > If daemons really didn't exit unexpectedly, the only race condition in
> > pid-based approaches to lifetime management would be races between
> > multiple scripted administrative actions (e.g. 2 admins trying to down
> > the daemon at the same time) which could be fixed by locking at the
> > script level.
> 
> Hm, for me that situation sounds a bit strange: either the script will
> exit with 'daemon already stopped' or it will send an additional signal

No. It will send a TERM/KILL signal to a new process that happens to
have the same PID as the already-killed daemon. If you get lucky, no
such new process exists, but that's called "getting lucky", which has
no place in robust systems.
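
As a rough illustration of why a pid read from a file proves nothing,
here is a minimal Python sketch. The helper names read_pidfile and
pid_exists are hypothetical, made up for this example. The existence
check only says that some process currently holds the number, not that
it is the daemon that wrote the file.

```python
import os


def read_pidfile(path):
    """Return the integer pid recorded in a pid file."""
    with open(path) as f:
        return int(f.read().strip())


def pid_exists(pid):
    """Return True if *some* process currently has this pid.

    Signal 0 performs only the existence/permission check. It cannot
    tell us whether the process is the daemon that wrote the pid file
    or an unrelated process that was later assigned the same number.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # a process exists, but it belongs to another user
    return True
```

If a stale pid file happens to contain the pid of any live process,
pid_exists(read_pidfile(path)) returns True, and a following kill would
hit that process. Even when the check is accurate at the moment it runs,
the answer can become wrong before the kill is sent.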

> I partially agree with approach that such daemon for monitoring status
> of other daemons should be developed, but I think this daemon should
> control only critical processes for admin, such as:

My view is this:

1. On a hobbyist or fully self-maintained system where you're willing
to manually do all the work of upgrading/restarting things, or on
certain embedded systems where reboot-on-upgrade is acceptable or
where you're sure you won't need security updates (because the system
does not interact with potentially-dangerous inputs), just start all
the daemons from your init script with no management and be done with
it. Components should not be designed in ways that _preclude_ this
ultra-simple setup.

2. On everything else, use your choice of robust daemon management
tool that starts daemons as direct children and therefore can observe
their death and/or intentionally kill them without any race
conditions.
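
To make option 2 concrete, here is a minimal Python sketch of the
direct-child idea. It is not any particular tool's implementation, and
start_daemon and stop_daemon are hypothetical names. It assumes nothing
else in the process waits on the child and that SIGCHLD is not ignored,
so an exited child stays unreaped (and its pid stays reserved) until
this code collects it.

```python
import subprocess


def start_daemon(argv):
    """Start the daemon as a direct child of the calling process."""
    return subprocess.Popen(argv)


def stop_daemon(proc, timeout=5.0):
    """Terminate a direct child, reap it, and return its exit status.

    Because we are the parent, the child's pid cannot be recycled until
    we wait on it, so the signal cannot land on an unrelated process.
    """
    if proc.poll() is not None:
        return proc.returncode  # already exited and reaped by us
    proc.terminate()  # sends SIGTERM
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate to SIGKILL
        return proc.wait()
```

The parent learns of the daemon's death through wait, so no pid file or
process-table scan is needed. A negative return value means the child
was terminated by that signal number.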

Rich
