Does Uptime still matter?
When I started working as a sysadmin (about 10 years ago) there was an obsession with uptime. Everyone considered it the greatest sign that you were doing a good job as a sysadmin if you were able ‘to keep the machine running’ for a long time. Looking back, I believe this was mainly because there were not so many systems in place at that time, and everything was in its early days: we were running Linux kernel 2.x, we had some ‘fancy’ Pentiums as super servers, we were doing fancy BGP exchanges with Cisco 3600 routers, and most of our clients were using dial-up lines to connect. Uff… those were fun times.
Anyway, we didn’t have failover systems implemented, nor did we have fancy monitoring and reporting on all possible things, like the ones we started to implement as the business came to depend on those systems more and more. During that time, any sysadmin I knew would show how good he was based on the uptime he was able to reach on one of his ‘core’ servers. When we had to reboot for something (hardware upgrade, failure, etc.) it was a tragedy, as we were losing ‘the uptime’.
Now, after all this time, I have realized that I don’t care about this at all. I work with completely different systems that are mostly redundant; most of the time, taking down a machine just means scheduling downtime in Nagios so it doesn’t trigger alerts, and the system as a whole is not affected because failover takes place immediately. I am no longer looking at the uptime of one machine, but at the uptime and reliability of the system the machine belongs to. This is why I stopped and wondered when, doing a consulting job and looking at a client’s servers for the first time (he had only 3 machines, independent of each other), I saw this (this is just a copy/paste):
srv01:~# uptime
20:38:31 up 1119 days, 47 min, 1 user, load average: 0.76, 0.59, 0.68
This means a little over 3 years (since around February 2006)… Wow… I said to myself: the old sysadmin didn’t care about kernel updates (@2.6.15). Then I thought… hmm… maybe he was at least doing application upgrades; but looking at MySQL (@4.1.15, by the way), it had been up for 1043 days and 11:52 hrs. Should I be impressed? Or disappointed about a poor job and lack of interest in system maintenance and upgrades from the previous admin? I was disappointed, of course…
I then realized that I was judging this situation through my recent work and experience. So I asked myself: “does uptime matter or not?” My answer today would be that of course it does matter, regardless of whether we are looking at one individual system or a bigger setup that is fully redundant. Still, I would never sacrifice security and application updates for it. We still need maintenance windows in which to keep the systems updated, secured and in good shape, even if this means rebooting them from time to time to fix kernel bugs.
I am interested to hear your opinion… What do you think? Does uptime still matter?
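The "little over 3 years" figure can be checked with a few lines of Python. This is only a sketch: the `parse_uptime` helper is a hypothetical name, and it assumes the two common `uptime` layouts ("N days, M min" and "N days, H:MM"), not every variant the tool can print.

```python
import re
from datetime import timedelta

def parse_uptime(line: str) -> timedelta:
    """Extract the 'up N days, M min' or 'up N days, H:MM' part of an uptime line."""
    m = re.search(r"up\s+(\d+)\s+days?,\s+(?:(\d+):(\d+)|(\d+)\s+min)", line)
    if m is None:
        raise ValueError(f"unrecognised uptime line: {line!r}")
    days = int(m.group(1))
    if m.group(4) is not None:
        # 'M min' form: less than one hour past a whole number of days
        hours, minutes = 0, int(m.group(4))
    else:
        # 'H:MM' form
        hours, minutes = int(m.group(2)), int(m.group(3))
    return timedelta(days=days, hours=hours, minutes=minutes)

line = "20:38:31 up 1119 days, 47 min, 1 user, load average: 0.76, 0.59, 0.68"
up = parse_uptime(line)
years = up.days / 365.25  # average year length, good enough for a rough figure
print(f"{up.days} days is about {years:.2f} years")
```

The same function can be fed the output of `uptime` collected from several machines to spot the ones that have not been rebooted in a long time.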

7th April 2009, 15:04
I think uptime still matters, just not in the way it used to. I think of it more as: how long have the services the machine provides been up, and have they been stable? I don’t count “uptime” as the number of days that have gone by since I last rebooted because of a kernel update.
7th April 2009, 16:01
Great article. I recently retired a couple of DNS servers that were running FreeBSD and terribly behind in patches. Their uptime was around 1200 days too. I think you’re spot on when you mention that uptime for a system matters a lot more than uptime for servers. We run Microsoft IIS servers behind load balancers. Recently, I had a chance to see the percentage of time during which the servers passed their health checks (http check) and noticed each server was at least 99.999% up. I then started to think of how the server uptime didn’t matter half as much as the load balancer’s uptime, which in this case was 100%. We could put all the servers we wanted out there but their reliability is almost pointless if the device in front of them isn’t equally or more reliable.
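A rough way to put numbers on this, assuming failures are independent (which real outages often are not): a pool of servers is up when at least one member is up, and the whole service is up only when both the load balancer and the pool are up. The function names and the 99.9% load balancer figure below are hypothetical, used only for illustration.

```python
def pool_availability(members):
    """Availability of a pool that is up when at least one member is up."""
    p_all_down = 1.0
    for a in members:
        p_all_down *= (1.0 - a)
    return 1.0 - p_all_down

def chain_availability(components):
    """Availability of components that must all be up at the same time."""
    p = 1.0
    for a in components:
        p *= a
    return p

def downtime_minutes_per_year(a):
    """Expected downtime per 365-day year for availability a (0..1)."""
    return (1.0 - a) * 365 * 24 * 60

servers = [0.99999, 0.99999]   # two web servers at "five nines" each
load_balancer = 0.999          # hypothetical, less reliable front end

pool = pool_availability(servers)
service = chain_availability([load_balancer, pool])

print(f"pool:    {pool:.10f}")
print(f"service: {service:.10f}")
print(f"five nines allows {downtime_minutes_per_year(0.99999):.2f} min/year of downtime")
```

With these inputs the service can never be more available than the load balancer itself, no matter how many servers are added behind it.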
7th April 2009, 18:56
I think there’s a balancing act between uptime and updating. As with any tech/geek/nerd domain, there are fanatics that ignore the balance and choose one over the other at all costs.
7th April 2009, 18:56
It all depends on the situation: if the MySQL server was an internal server doing fine, perhaps not upgrading was a good way to keep the application running on it stable.
Upgrades sometimes break a system or can have a lot of impact on the workings of an application. I know of several instances where we upgraded systems and they just stopped working (yes, read the upgrade documentation); the most notorious was the firmware upgrade on our SAN. It was a terrible experience.
7th April 2009, 19:10
Virtual servers can be live-migrated to other hardware nodes so that the original hardware node can be upgraded and rebooted, and then migrated back to the upgraded hardware node (live, of course). So you can have unlimited uptime: the virtual server never stops running.
7th April 2009, 19:22
I’d agree that the uptime of individual devices or servers is largely irrelevant now that clustering, load balancing and live-migrated VMs are widely used. The availability of a system is what matters, not the uptime of a component.
@Stephan: the virtual server still needs patching/maintenance periodically (security), so it can’t run forever unless you ignore security. I expect the uptime of any individual device or server to be no more than 1.5 to 2x the vendor patch cycle (45-60 days for Windows, 120-180 days for Oracle). Any longer than that is a potential problem, as patches are not current.
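That rule of thumb is easy to turn into a check. The sketch below uses the day ranges quoted above as-is; the table and function names are hypothetical, and uptime is only a proxy for patches that actually require a reboot.

```python
# Acceptable uptime ranges in days (1.5x to 2x the vendor patch cycle),
# copied from the figures quoted in the comment above.
PATCH_WINDOW_DAYS = {
    "windows": (45, 60),
    "oracle": (120, 180),
}

def patch_status(platform: str, uptime_days: int) -> str:
    """Classify a machine by how its uptime compares to the patch window."""
    low, high = PATCH_WINDOW_DAYS[platform]
    if uptime_days <= low:
        return "ok"        # rebooted within 1.5x the patch cycle
    if uptime_days <= high:
        return "due"       # between 1.5x and 2x: schedule maintenance
    return "overdue"       # beyond 2x: patches are very likely not current

for platform, days in [("windows", 30), ("oracle", 150), ("windows", 1119)]:
    print(platform, days, patch_status(platform, days))
```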
7th April 2009, 19:45
@Michael: Yes, if uptime of services is counted. I mean only the uptime of the system. Our virtual servers run on the kernel of the hardware node, so a kernel upgrade of the hardware node needs no restart of the virtual server. Of course, restarting upgraded services like MySQL or Apache on virtual servers is no different from doing so on hardware servers.
I understand that this is not possible with some virtualization technologies (like Xen or VMware). But the solution that we use (OpenVZ) does not need a reboot of the virtual server after a kernel upgrade.
8th April 2009, 07:34
Yeah, uptime sometimes scares me too.
One of our servers has 1205 days of uptime, running MySQL too (no cluster, no failover). What nobody knows is what happens when the machine reboots. Fortunately we already have a plan for the reboot, with a secondary machine as backup.
But I don’t think that uptime is important in most situations… counting availability instead is far more important. And clustering/load balancing is not always the solution… high availability does not mean high continuous availability… switching users between nodes can be noticeable to the user if the applications do not share all state information, like sessions etc.
8th April 2009, 08:22
[...] I confess that in a way it is nice to see that service X has been running for a great many days. But does anyone really care that you haven’t rebooted the server in so long? This entry was written by Javier Aroche, published on 8 April 2009 at 2:22 am, [...]
8th April 2009, 09:13
100% agree with you Marius, what really matters is service uptime, not system uptime. And as you say, a “low” uptime is a good thing to have because it means you’re actually updating your system (which is also more important than raw uptime).
8th April 2009, 15:29
It doesn’t matter, and you’ve identified precisely why: most shops can now down machines without downing their services. High availability is reasonably commonplace, so machine uptime has lost its meaning.
Although, like most people, I’ve been guilty of having an outdated legacy machine on my books before. It carried an uptime in years that nobody dared touch. As luck would have it, it worked in our favour when the Linux vmsplice() exploits did the rounds, because the kernel was old enough not to be affected by the bug. Thankfully it’s gone now.
9th April 2009, 07:01
[...] “Should I be impressed? or disappointed about a poor job and lack of interest in system maintenance and upgrades from the previous admin? I was disappointed of course…” http://www.ducea.com/2009/04/07/do… [...]
15th April 2009, 12:33
[...] of MDLOG:/sysadmin, the Journal of a Linux Sysadmin, recently pondered the way that the uptime/downtime calculation has changed over the past ten [...]
18th May 2009, 13:55
[...] Did -Uptime- ever matter? As a sysadmin, I confess that in a way it is nice to see that service X has been running for a great many days. But does anyone really care that you haven’t rebooted the server in so long? [...]