Comment on GitHub Outages Since Microslop Acquisition
p03locke@lemmy.dbzer0.com 1 week agoFive nines means that you need people at their desks in shifts ready to start fixing something the moment there’s a problem
No, it means you don’t have outages. Ever.
Five-nines is something like 7 minutes of downtime throughout the entire year. At best, you might have automated failover systems that require tiny outages. No human involving, though, unless you’re deal with some major breakage that would have killed the five-nines commitment that year, anyway.
It’s takes a human something like 5-10 minutes just to get out of bed and figure out the situation, anyway.
merc@sh.itjust.works 1 week ago
No, that’s infinite nines, which isn’t possible.
Yes, you have automated failover systems. But, if something happens which causes those systems to fail over, you need to immediately investigate what happened and why. Even at four nines you have automatic failover, redundant system, hot spares, etc. But, you accept that sometimes not everything will work as planned and you’ll need to fix something. Five nines is just that and more.
Right, which is why I said that four nines is your realistic maximum if you’re going to have people on call who aren’t actually at their desks. To get better than four nines you need to have around the clock coverage with people at their desks so when a system breaks you have eyes on it in something like 30s.
p03locke@lemmy.dbzer0.com 1 week ago
It’s not impossible. Large reliable websites do it all the time. It’s call 100% uptime.
Sure, it’s measured per year, and sometimes they have some outage that breaks the record. But, it is possible to have 100% uptime throughout the year.
merc@sh.itjust.works 1 week ago
No, no website does it. There is no such thing as 100% uptime. If it happens, great, but I can guarantee you that no website even aims for 5 nines of uptime.
Google is the benchmark for website availability and in 2022 they had an outage that lasted an hour, meaning they didn’t meet 4 nines for the year.
If you miss your SLO target for the year, then you missed your SLO target. If you’re down for 60 minutes but fine for the other 11 months, 29 days and 23 hours, you still missed your yearly SLO.
p03locke@lemmy.dbzer0.com 1 week ago
In 2022. In the other years, they had 100% uptime.
Also, yes, there are plenty of clients that ask for five-nines. Is it realistic? Probably not. But, they definitely ask.
I understand how SLO targets work. If somebody is asking for a five-nines as an SLO, they are basically asking for 100% uptime, because there is no such thing as a “five minute outage”, especially not one that is fixable without total automation.
Again, a human hasn’t even gotten paged and out of bed in 5 minutes time.