GitHub Outages Since Microslop Acquisition

⁨688⁩ ⁨likes⁩

Submitted ⁨⁨2⁩ ⁨months⁩ ago⁩ by ⁨0x0@lemmy.zip⁩ to ⁨technology@lemmy.zip⁩

https://lemmy.zip/pictrs/image/30b98e80-7e91-4d12-b1b7-f6223d364626.avif


Comments


9point6@lemmy.world ⁨2⁩ ⁨months⁩ ago
Image

- Damage@feddit.it ⁨2⁩ ⁨months⁩ ago
  I see one nine
  
  - huquad@lemmy.ml ⁨2⁩ ⁨months⁩ ago
    Microsoft never promised where the nines would be
    
  - caseyweederman@lemmy.ca ⁨2⁩ ⁨months⁩ ago
    I see six
    
- AnUnusualRelic@lemmy.world ⁨2⁩ ⁨months⁩ ago
  Lies! 89.98% has two nines in it!
  
- raspberriesareyummy@lemmy.world ⁨2⁩ ⁨months⁩ ago
  Thank you, that is much more helpful than OP's graph
  
frank@sopuli.xyz ⁨2⁩ ⁨months⁩ ago
Move slow and break shit

- InvalidName2@lemmy.zip ⁨2⁩ ⁨months⁩ ago
  It’s the best of both worsts.
  
DahGangalang@infosec.pub ⁨2⁩ ⁨months⁩ ago
Obv a gross-looking chart, but I am bothered that the left-hand scale is trimmed off. I expect those are 10% increments, but wouldn’t be shocked if the original was like 99.0, 98.0, 97.0, etc.

- raspberriesareyummy@lemmy.world ⁨2⁩ ⁨months⁩ ago
  Thank you! I was thinking “it can’t just be me that’s bothered”
  
merc@sh.itjust.works ⁨2⁩ ⁨months⁩ ago
I’ve worked on services with 5 nines of availability (i.e. 99.999% available, a little over 5 minutes of downtime allowed per year). I’ve more frequently worked on ones with 4 nines, where you’re allowed almost an hour of downtime per year. GitHub is now barely maintaining 2 nines. That’s just embarrassing.

Each “nine” you add is much more difficult. To get four nines you need people on call who can start working on a problem within 5 minutes and fix it within a few more minutes, and you can only get those calls once every couple of months. Five nines means that you need people at their desks in shifts ready to start fixing something the moment there’s a problem because it would take too long for someone on-call to get their computer out, connect and authenticate. It requires warm backup systems that are sitting idle but ready to take over fully at a moment’s notice.
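
As a rough sanity check on those numbers, here is a minimal Python sketch of the downtime budget arithmetic. The 10-minute incident length (about 5 minutes to respond plus a few more to fix) is an assumption based on the figures above, and the function names are made up for illustration.

```python
# Average year length in minutes (365.25 days), used as the budget period.
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_budget_minutes(nines: int) -> float:
    """Yearly downtime allowed at N nines (e.g. 4 -> 99.99% available)."""
    unavailability = 10 ** -nines  # 4 nines -> 0.0001
    return MINUTES_PER_YEAR * unavailability


def incidents_per_year(nines: int, minutes_per_incident: float) -> float:
    """How many incidents of a given length fit into the yearly budget."""
    return downtime_budget_minutes(nines) / minutes_per_incident


# Assumed incident length: 5 minutes to respond plus 5 minutes to fix.
ASSUMED_INCIDENT_MINUTES = 10

for n in (2, 3, 4, 5):
    budget = downtime_budget_minutes(n)
    count = incidents_per_year(n, ASSUMED_INCIDENT_MINUTES)
    print(f"{n} nines: {budget:.1f} min/year, room for {count:.1f} incidents")
```

Under those assumptions, four nines leaves room for roughly five such incidents a year, or one every couple of months, while a single 10-minute incident already uses about twice the whole yearly budget at five nines.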

A two nines system is allowed to be down for 100x as long as a four nines system, and 1000x as long as a five nines system. It’s almost 15 minutes of downtime allowed per day, compared to about 13 minutes every 3 months for a four-nines system. Gamers wouldn’t even put up with a two-nines system for a video game. It’s absurd to allow that for a critical piece of infrastructure for software.
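
The per-day and per-quarter comparison can be reproduced with the same kind of arithmetic. A small Python sketch, assuming a 90-day quarter and a hypothetical helper name:

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float) -> float:
    """Minutes of downtime permitted over a period at the given availability."""
    period_minutes = period_days * 24 * 60
    return period_minutes * (100.0 - availability_percent) / 100.0


# Two nines measured per day, four and five nines per 90-day quarter (assumed).
two_nines_per_day = allowed_downtime_minutes(99.0, 1)
four_nines_per_quarter = allowed_downtime_minutes(99.99, 90)
five_nines_per_quarter = allowed_downtime_minutes(99.999, 90)

# Ratios of allowed downtime over the same period (one year here).
ratio_vs_four = allowed_downtime_minutes(99.0, 365) / allowed_downtime_minutes(99.99, 365)
ratio_vs_five = allowed_downtime_minutes(99.0, 365) / allowed_downtime_minutes(99.999, 365)

print(f"two nines:  {two_nines_per_day:.1f} min per day")
print(f"four nines: {four_nines_per_quarter:.1f} min per quarter")
print(f"five nines: {five_nines_per_quarter:.1f} min per quarter")
print(f"two vs four nines: {ratio_vs_four:.0f}x, two vs five nines: {ratio_vs_five:.0f}x")
```

That works out to 14.4 minutes a day at two nines against roughly 13 minutes per 90-day quarter at four nines, with the 100x and 1000x ratios falling straight out of the unavailability fractions.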

- p03locke@lemmy.dbzer0.com ⁨2⁩ ⁨months⁩ ago
  
  Five nines means that you need people at their desks in shifts ready to start fixing something the moment there’s a problem
  
  No, it means you don’t have outages. Ever.
  
  Five-nines is something like 7 minutes of downtime throughout the entire year. At best, you might have automated failover systems that require tiny outages. No human involvement, though, unless you’re dealing with some major breakage that would have killed the five-nines commitment that year, anyway.
  
  It takes a human something like 5-10 minutes just to get out of bed and figure out the situation, anyway.
  
  - merc@sh.itjust.works ⁨2⁩ ⁨months⁩ ago
    
    No, it means you don’t have outages. Ever.
    
    No, that’s infinite nines, which isn’t possible.
    
    Five-nines is something like 7 minutes of downtime throughout the entire year. At best, you might have automated failover systems that require tiny outages. No human involvement, though, unless you’re dealing with some major breakage that would have killed the five-nines commitment that year, anyway.
    
    Yes, you have automated failover systems. But if something happens which causes those systems to fail over, you need to immediately investigate what happened and why. Even at four nines you have automatic failover, redundant systems, hot spares, etc. But you accept that sometimes not everything will work as planned and you’ll need to fix something. Five nines is just that and more.
    
    It takes a human something like 5-10 minutes just to get out of bed and figure out the situation, anyway.
    
    Right, which is why I said that four nines is your realistic maximum if you’re going to have people on call who aren’t actually at their desks. To get better than four nines you need to have around the clock coverage with people at their desks so when a system breaks you have eyes on it in something like 30s.
    
- Waraugh@lemmy.dbzer0.com ⁨2⁩ ⁨months⁩ ago
  I’m used to environments where they expect five nines, get 3-4 nines, and fund for 1 nine.
  
- HrabiaVulpes@europe.pub ⁨2⁩ ⁨months⁩ ago
  I call bullshit on “Gamers wouldn’t put up with a two-nines system for a video game”
  
  Elder Scrolls Online has a weekly scheduled outage of about 8 hours. Every Monday. Players have been complaining about it for years, but the game is still popular.
  
  - merc@sh.itjust.works ⁨2⁩ ⁨months⁩ ago
    How often is it offline outside the maintenance windows?
    
    Yeah, maintenance windows are annoying, but they don’t really count when describing the availability of a system. Many government systems are only available during normal business hours. That means they’re offline for 16 hours per day. What matters is how available they are when they’re supposed to be working.
    
    For Elder Scrolls, two nines would mean that the game was allowed to be down for more than an hour a week outside of those maintenance windows. Or, if measured by quarters, which is more typical, the game would still have those maintenance windows, but, in addition, it might be down for a full day once per quarter.
    
    Basically, the 8-hour window every Monday is a trade-off so that the rest of the week is uninterrupted. They probably manage three nines the rest of the week by shifting any serious maintenance into the weekly downtime.
    
    And, as for the game being “still popular”, one site says that there are currently 7199 players in Elder Scrolls but more than 161k in World of Warcraft. It could be that part of the reason that World of Warcraft is more popular is that it doesn’t have 8 hour maintenance windows every week, but it does often have 2+ hour windows. The number of players who are willing to put up with 8 hour maintenance windows every week seems pretty small.
    
raspberriesareyummy@lemmy.world ⁨2⁩ ⁨months⁩ ago
Nothing to make a point like snipping off the y-axis scaling.

I hate Microslop like any person with > 2 brain cells, but that graph is useless - all visible y-entries end in a 0 - might as well be 99.990, 99.980, 99.970, …

- Jordan117@lemmy.world ⁨2⁩ ⁨months⁩ ago
  It’s just Xitter’s image viewer cropping it automatically; the original upload has it.
  
  - prenatal_confusion@feddit.org ⁨2⁩ ⁨months⁩ ago
    It is still bad practice to select a narrow window of an axis like this and show a difference that seems massive relative to what is shown but isn’t that significant when we can see it in relation to the whole.
    
    Graph 101
    
k0e3@lemmy.ca ⁨2⁩ ⁨months⁩ ago
Surely they could just Copilot their way out of this mess lmao

- tja@sh.itjust.works ⁨2⁩ ⁨months⁩ ago
  They are trying ^^
  
JordanZ@lemmy.world ⁨2⁩ ⁨months⁩ ago
[deleted]
- BleatingZombie@lemmy.world ⁨2⁩ ⁨months⁩ ago
  I think you’re right, which is funny because now I don’t trust Azure either
  
paris@lemmy.blahaj.zone ⁨2⁩ ⁨months⁩ ago
damrnelson.github.io/github-historical-uptime/

A lot of this is GitHub Actions alone, but a lot of it isn’t. I also don’t know how well GitHub tracked outages before the Microsoft acquisition. It’s entirely possible the graph looks so bad because they only took outage tracking seriously after being acquired. I don’t know.

Further related discussion on Hacker News: news.ycombinator.com/item?id=47925317

possiblylinux127@lemmy.zip ⁨2⁩ ⁨months⁩ ago
It is impressive how badly Microsoft is fumbling the bag

- 0x0@lemmy.zip ⁨2⁩ ⁨months⁩ ago
  Thank Bog there are Codeberg and forĝejo.
  
Safeguard@beehaw.org ⁨2⁩ ⁨months⁩ ago
Is that real? Because that… Makes it real clear…

MBech@feddit.dk ⁨2⁩ ⁨months⁩ ago
How does this correspond with growth? I imagine having 100% uptime is much harder the bigger a platform is, so did GitHub grow a lot in the same period?

I’m not questioning whether or not Microsoft has issues, I just find it relevant whether or not they very suddenly saw a 2000% increase in server usage or something.

- jatone@lemmy.dbzer0.com ⁨2⁩ ⁨months⁩ ago
  I imagine having 100% uptime is much harder the bigger a platform is, so did Github grow a lot in the same period?
  
  it’s not. there are scale points where, once you hit a critical number, you need to re-architect your backend: 1k, 10k, 1mil, etc. these usually vary based on your app, but they’re usually exponential, so once you hit the higher levels it takes much longer to reach the next level.
  
  on top of that, by the higher tiers you usually have proper backpressure and signals being sent to the frontend systems to dynamically manage the load generated, so suddenly uptime is much easier.
  
  when you see large repeated failures like this, the cause is almost always corporate causing issues:
  
  reducing engineering budget.
  
  not listening to the engineering department on product decisions. (see the recent product manager AI-generated commit that got merged and caused a mild uproar over 'co authored by copilot')
  
  rushing nonsense out before it’s ready.
  
bagsy@lemmy.world ⁨2⁩ ⁨months⁩ ago
But the payment processing service has 9 nines of uptime…

SocialistVibes01@lemmy.ml ⁨2⁩ ⁨months⁩ ago
How many of those outages were due to AI training?

ServantOfRa@lemmy.blahaj.zone ⁨2⁩ ⁨months⁩ ago
Remember when mSlop bought HotMail? Same shit, different decade.

bitjunkie@lemmy.world ⁨2⁩ ⁨months⁩ ago
That’s just fucking disgraceful.

- possiblylinux127@lemmy.zip ⁨2⁩ ⁨months⁩ ago
  You should see what they are doing to Minecraft
  
  - bitjunkie@lemmy.world ⁨2⁩ ⁨months⁩ ago
    Unfortunately I have, my kid is absolutely fucking obsessed with it
    