Science & Technology
Facebook outage caused by a single mistake; has huge implications
2021-10-06
[9to5mac] Yesterday’s Facebook outage – which took down Facebook Messenger, Instagram, and WhatsApp as well as the main service – resulted from a mistake by the company’s own network engineers.

The mistake led to all of Facebook’s services being inaccessible, with one analogy likening it to a failure in the “air traffic control” services for network traffic …

We reported yesterday on the massive failure.

It’s not just you: Facebook, Instagram, and WhatsApp are all currently down for users around the world. We’re seeing error messages on all three services across iOS applications as well as on the web. Users are being greeted with error messages such as: “Sorry, something went wrong,” “5xx Server Error,” and more.

The outage is affecting every Facebook-owned platform, according to data on Downdetector and Twitter. This includes Instagram, Facebook, WhatsApp, and Facebook Messenger […] While some Facebook, Instagram, and WhatsApp outages only affect certain geographic regions, the services are down worldwide today.

It gradually appeared that the problem might relate to DNS – the domain name servers that tell devices which IP addresses to use to access services – but it was unclear what exactly had happened, and whether this was an external hack, malicious action by an insider, or a catastrophic mistake.

Facebook has now admitted in a blog post that it was a mistake.

Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication. This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt.

It took a long time to resolve the problem because the inaccessible systems included the servers and tools engineers would normally use to solve the problem remotely. Reports suggest that lower-level employees had to gain physical access to the data centers, and then rely on step-by-step instructions from more senior engineers in order to undo the mistake. Complicating this, the networks being unavailable meant that Facebook’s door access systems were also offline, physically preventing access.
Read the rest at the link
The Times of Israel adds:
After an almost unprecedented six-hour global outage, Facebook restored its services and those of WhatsApp and Instagram on Monday and blamed the fiasco on configuration changes it made to the routers that coordinate network traffic between its data centers.

“This disruption to network traffic had a cascading effect on the way our data centers communicate, bringing our services to a halt,” Facebook vice president of infrastructure Santosh Janardhan said in a post.
Posted by: badanov

#6  Always wanted to replace DNS with GPS, altitude, a random # and provider network.
Posted by: 3dc   2021-10-06 20:29  

#5  
the networks being unavailable meant that Facebook’s door access systems were also offline, physically preventing access.

A truly dumbass network "feature"
Posted by: Bubba Lover of the Faeries8843   2021-10-06 14:16  

#4  Facebook slammed for promoting 1619 Project content: 'Utterly irresponsible'
Posted by: Skidmark   2021-10-06 07:49  

#3  Mistakes were made....billions lost by the boss...move along nothing happening.
Posted by: Joluling Gleque7445   2021-10-06 06:08  

#2  A 'mistake' - that's what they're going with now?
Posted by: Raj   2021-10-06 01:45  

#1  Yeah, DNS is the glue that holds everything together. But nobody pays any attention to it. People let domain names lapse all the time. "Who knew we had to renew it!"
Posted by: Blinky Pholuling8616   2021-10-06 00:30  
