Date: Fri, 06 Dec 2013 23:29:33 +0000
From: Laurent Bercot <ska-dietlibc@...rnet.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCHv2] Add support for leap seconds in zoneinfo files

On 06/12/2013 11:38, Raphael Cohn wrote:

> I'm not sure I've completely followed all the ramifications here.
>  Why should one have to restart processes on 1,000s of nodes
>  for something like this? 5 9s contracts are hard enough to satisfy
>  at the best of times, without adding another 30 - 120 seconds of
>  restart impact, degraded performance and failover for large jobs.
>  In the sorts of sites we operate in, there might be just one admin
>  for the entire estate, and they're busy enough, and probably laden
>  down with more special knowledge than they can recall as it is, to
>  have to deal with something like this as well...

  The recent rate of leap second announcements has been about one
every three years. Firmware, and even hardware, evolves faster than that.
In three years, you probably have had to update your production image at
least once. Including new leap second tables in it would basically have
been free.
  And even if you have to do a restart *specifically* for this thing,
if you are dealing with thousands of nodes and need to provide five nines,
then you have good enough automation to reboot your machines one after
the other by snapping your fingers and without making a dent in the
quality of your service.


>  From this point of view, a system update of packages and data files
>  should be accommodated automatically. Isn't that what a file watch is for?
>  Actually, I'd argue the same for any change to any file of configuration
>  data used by a library. There's no way a userspace app is going to know
>  with certainty what files its underlying linked libs are using or how.

  It's a question of balancing complexity. File watches and such can be
used, but:
  - it's hard, it implies more code in the libc, and probably more run-time
resource usage.
  - it's a leaky abstraction: user applications have to avoid caching state,
because that state could change under their feet - which is precisely what
I'm trying to avoid here.
  - this point depends on what configuration data we're talking about, but
as far as leap seconds are concerned, that data is not very dynamic. It's
almost static. Do your daemons really have three years of uptime? Random
hardware crashes will be more of an annoyance than this.
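
  To make the first two points concrete, here is a minimal sketch of the
simplest possible form of file watch: polling the file's modification time
with stat(). This is an illustration under assumptions, not musl code and
not what the patch does; the struct and function names are hypothetical,
and a real implementation might use inotify instead.

```c
#include <sys/stat.h>
#include <time.h>

/* Hypothetical bookkeeping for one watched data file (not musl code). */
struct watched_file {
    const char *path;  /* file to watch */
    time_t mtime;      /* modification time seen at the last check */
    int seen;          /* 0 until the first successful check */
};

/*
 * Returns 1 if the caller should (re)load the file: first check, or the
 * modification time changed. Returns 0 if unchanged, -1 if stat() fails.
 * Caveat: st_mtime has one-second granularity, so a rewrite within the
 * same second as the previous check can be missed.
 */
static int watched_file_needs_reload(struct watched_file *w)
{
    struct stat st;

    if (stat(w->path, &st) != 0)
        return -1;
    if (w->seen && st.st_mtime == w->mtime)
        return 0;
    w->mtime = st.st_mtime;
    w->seen = 1;
    return 1;
}
```

  Even in this reduced form, the check costs a system call each time it
runs, and anything the application derived from the old table is silently
stale once the function returns 1.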

  Restarting a process is cheap and easy, should be a standard tool in your
toolbox, and your workflow should make use of it when it's needed. You
don't get high availability by forbidding your processes to die: you get
high availability by making sure it's not a problem when they die.


> Date and time math has been got wrong in every system
> (...)
> Personally, I think apps should just use a monotonic source of seconds
>  from an epoch, and use a well-developed third party lib dedicated to
>  the problem if they need date math (eg Joda time in Java).

  I absolutely agree with you on the first part. I disagree on the second
part. Dealing with time shouldn't be a burden on the application - devs
have other things to think about, and experience shows that most of them
won't care, they'll just use the primitives provided by the system. So,
the system should do the right thing, i.e. provide something that works
no matter how applications are using it. Here, it means providing a
linear CLOCK_REALTIME, because people use it as if it were CLOCK_MONOTONIC.
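
  For contrast, this is roughly what measuring an interval looks like when
an application does use the clock meant for it. It is only a sketch using
the POSIX clock_gettime() call; the helper names clock_ns and elapsed_ns
are made up for the example. Code that passes CLOCK_REALTIME here instead
is the kind of usage that a non-linear CLOCK_REALTIME breaks.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* expose clock_gettime() and CLOCK_MONOTONIC */
#endif
#include <stdint.h>
#include <time.h>

/* Read the given clock and return its value in nanoseconds, or -1 on error. */
static int64_t clock_ns(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Nanoseconds elapsed since start_ns, where start_ns came from
 * clock_ns(CLOCK_MONOTONIC). Not affected by wall-clock adjustments. */
static int64_t elapsed_ns(int64_t start_ns)
{
    return clock_ns(CLOCK_MONOTONIC) - start_ns;
}
```

  A caller would take start = clock_ns(CLOCK_MONOTONIC) before the work and
call elapsed_ns(start) after it.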

-- 
  Laurent
