Message-ID: <52A25DDD.70400@skarnet.org>
Date: Fri, 06 Dec 2013 23:29:33 +0000
From: Laurent Bercot <ska-dietlibc@...rnet.org>
To: musl@...ts.openwall.com
Subject: Re: [PATCHv2] Add support for leap seconds in zoneinfo files

On 06/12/2013 11:38, Raphael Cohn wrote:
> I'm not sure I've completely followed all the ramifications here.
> Why should one have to restart processes on 1,000s of nodes
> for something like this? 5 9s contracts are hard enough to satisfy
> at the best of times, without adding another 30 - 120 seconds of
> restart impact, degraded performance and failover for large jobs.
> In the sorts of sites we operate in, there might be just one admin
> for the entire estate, and they're busy enough, and probably laden
> down with more special knowledge than they can recall as it is, to
> have to deal with something like this as well...

The recent rate of leap seconds announcements has been about one every
three years. Firmware, and even hardware, evolves faster than that. In
three years, you probably have had to update your production image at
least once. Including new leap second tables into it would basically
have been free.

And even if you have to do a restart *specifically* for this thing, if
you are dealing with thousands of nodes and need to provide five nines,
then you have good enough automation to reboot your machines one after
the other by snapping your fingers and without making a dent in the
quality of your service.

> From this point of view, a system update of packages and data files
> should be accommodated automatically. Isn't that what a file watch is for?
> Actually, I'd argue the same for any change to any file of configuration
> data used by a library. There's no way an userspace app is going to know
> with certainty what files its underlying linked libs are using or how.

It's a question of balancing complexity.
File watches and such can be used, but:
- it's hard, it implies more code in the libc, and probably more
  run-time resource usage.
- it's a leaky abstraction: user applications have to avoid caching
  state, because that state could change under their feet - which is
  precisely what I'm trying to avoid here.
- this point depends on what configuration data we're talking about,
  but as far as leap seconds are concerned, that data is not very
  dynamic. It's almost static. Do your daemons really have three years
  of uptime ? Random hardware crashes will be more of an annoyance
  than this.

Restarting a process is cheap and easy, should be a standard tool in
your toolbox, and your workflow should make use of it when it's needed.
You don't get high availability by forbidding your processes to die:
you get high availability by making sure it's not a problem when they
die.

> Date and time match has been got wrong in every system
> (...)
> Personally, I think apps should just use a monotonic source of seconds
> from an epoch, and use a well-developed third party lib dedicated to
> the problem if they need date math (eg Joda time in Java).

I absolutely agree with you on the first part.
I disagree on the second part. Dealing with time shouldn't be a burden
on the application - devs have other things to think about, and
experience shows that most of them won't care, they'll just use the
primitives provided by the system. So, the system should do the right
thing, i.e. provide something that works no matter how applications are
using it. Here, it means providing a linear CLOCK_REALTIME, because
people use it as if it were CLOCK_MONOTONIC.

--
Laurent