Comment on GitHub Outages Since Microslop Acquisition

merc@sh.itjust.works ⁨1⁩ ⁨week⁩ ago

I’ve worked on services with 5 nines of availability (i.e. 99.999% available, less than 5 minutes of downtime allowed per year). I’ve more frequently worked on ones with 4 nines, where you’re allowed almost an hour of downtime per year. GitHub is now barely maintaining 2 nines. That’s just embarrassing.

Each “nine” you add is much more difficult. To get four nines you need people on call who can start working on a problem within 5 minutes and fix it within a few more minutes, and you can only get those calls once every couple of months. Five nines means that you need people at their desks in shifts ready to start fixing something the moment there’s a problem because it would take too long for someone on-call to get their computer out, connect and authenticate. It requires warm backup systems that are sitting idle but ready to take over fully at a moment’s notice.

A two nines system is allowed to be down for 100x as long as a four nines system, and 1000x as long as a five nines system. It’s almost 15 minutes of downtime allowed per day, compared to about 15 minutes every 3 months for a four-nines system. Gamers wouldn’t even put up with a two-nines system for a video game. It’s absurd to allow that for a critical piece of infrastructure for software.

source
Sort:hotnewtop