On October 4, 2021, at 15:39 UTC, the social network Facebook and its subsidiaries, Messenger, Instagram, WhatsApp, Mapillary, and Oculus, became globally unavailable for a period of six to seven hours. The outage also prevented anyone trying to use "Log in with Facebook" from accessing third-party sites.
During the outage, many users flocked to Twitter, Discord, Signal, and Telegram, resulting in disruptions on these apps' servers. The outage was caused by the loss of IP routes to the Facebook Domain Name System (DNS) servers, which were all self-hosted at the time. Border Gateway Protocol (BGP) routing was restored for the affected prefixes at about 21:50 UTC, and DNS services began to be available again at 22:05 UTC. Application-layer services were then gradually restored to Facebook, Instagram, and WhatsApp over the following hour, and service was generally restored for users by 22:50 UTC.
Security experts identified the problem as a BGP withdrawal of the IP address prefixes in which Facebook's DNS servers were hosted, making it impossible for users to resolve Facebook and related domain names and to reach its services. Effects were visible globally; for example, Swiss Internet service provider Init7 recorded a massive drop in internet traffic to the Facebook servers after the BGP change.
Cloudflare reported that at 15:39 UTC, Facebook made a significant number of BGP updates, including the withdrawal of routes to the IP prefixes, which included all of their authoritative nameservers. This made Facebook's DNS servers unreachable from the Internet. By 15:50 UTC, Facebook's domains had expired from the caches in all major public resolvers. A little before 21:00 UTC, Facebook resumed announcing BGP updates, with Facebook's domain name becoming resolvable again at 21:05 UTC.
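The delay between the route withdrawal and the point at which the domains disappeared from resolver caches follows from how DNS caching works: a resolver keeps answering from its cache until the record's time-to-live (TTL) runs out, and only then needs the authoritative nameservers. The following is a minimal sketch of that behaviour, not a real DNS implementation; the `CachingResolver` class, the 300-second TTL, the domain, and the address are all illustrative assumptions.

```python
from typing import Dict, Optional, Tuple


class CachingResolver:
    """Toy recursive resolver with a TTL cache (hypothetical, for illustration)."""

    def __init__(self) -> None:
        # name -> (address, expiry time in seconds)
        self._cache: Dict[str, Tuple[str, int]] = {}

    def resolve(self, name: str, now: int,
                authoritative: Optional[Dict[str, str]], ttl: int = 300) -> str:
        cached = self._cache.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]  # record still fresh: answer from cache
        if authoritative is None:
            return "SERVFAIL"  # nameservers unreachable and nothing cached
        address = authoritative[name]
        self._cache[name] = (address, now + ttl)
        return address


zone = {"example.com": "192.0.2.10"}  # assumed zone data
resolver = CachingResolver()

before = resolver.resolve("example.com", now=0, authoritative=zone)
# Routes withdrawn: the authoritative servers can no longer be reached.
during_ttl = resolver.resolve("example.com", now=100, authoritative=None)
after_ttl = resolver.resolve("example.com", now=400, authoritative=None)
print(before, during_ttl, after_ttl)
```

In this sketch the name keeps resolving for as long as the cached record is fresh and fails once the TTL has elapsed, which mirrors the short gap between 15:39 and 15:50 UTC described above.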
On October 5, Facebook's engineering team published a blog post explaining the cause of the outage. During maintenance, a command was run to assess the global backbone capacity, and that command accidentally disconnected all of Facebook's data centers. While Facebook's DNS servers ran on a separate network, they were designed to withdraw their BGP routes if they could not connect to Facebook's data centers, making it impossible for the rest of the internet to connect to Facebook.
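The failure mode described in the blog post can be illustrated with a small simulation. This is a sketch under stated assumptions, not Facebook's actual implementation: the `DnsNode` class, the site names, and the documentation-range prefixes are hypothetical. Each node announces its prefix only while it can reach the data centers, which is a sensible safeguard when a single site fails but removes every route when the backbone fails everywhere at once.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class DnsNode:
    """Hypothetical DNS site that announces one prefix via BGP."""
    name: str
    prefix: str
    announced: bool = True

    def health_check(self, can_reach_data_centers: bool) -> None:
        # Announce the prefix only while the data centers are reachable;
        # otherwise withdraw it so traffic goes to a healthy site.
        self.announced = can_reach_data_centers


def reachable_prefixes(nodes: Iterable[DnsNode]) -> Set[str]:
    """Return the prefixes that at least one node still announces."""
    return {node.prefix for node in nodes if node.announced}


nodes = [
    DnsNode("site-a", "192.0.2.0/24"),
    DnsNode("site-b", "192.0.2.0/24"),
    DnsNode("site-c", "198.51.100.0/24"),
]

# One site loses connectivity: the other sites keep both prefixes routable.
nodes[0].health_check(can_reach_data_centers=False)
partial = reachable_prefixes(nodes)

# The backbone goes down everywhere: every node withdraws at the same time.
for node in nodes:
    node.health_check(can_reach_data_centers=False)
total = reachable_prefixes(nodes)

print(sorted(partial))
print(sorted(total))
```

With a single failed site both prefixes remain announced, but once every node fails its health check the set of announced prefixes is empty, so no route to the DNS servers remains.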
Facebook gradually returned after a team got access to server computers at the Santa Clara, California, data center and reset them. By about 22:45 UTC, Facebook and related services were generally available again.
The outage cut off Facebook's internal communications, preventing employees from sending or receiving external emails, accessing the corporate directory, and authenticating to some Google Docs and Zoom services. The New York Times reported that employees were unable to access buildings and conference rooms with their security badges. The site Downdetector, which monitors network outages, recorded over 10 million problem reports – the largest number for an incident to date. Steve Gibson, a security researcher, said that a "Routine BGP update went wrong", locking out the "people with remote access" who could have fixed the mistake, while the people with physical access to the servers did not have authorization to fix it.
The Google Public DNS service slowed down as a result of the outage, and users of Gmail, TikTok, and Snapchat also experienced slowdowns. CNBC reported that the outage was the worst experienced by Facebook since 2008. During the day of the outage, shares in the company dropped by nearly 5% and Facebook CEO Mark Zuckerberg's wealth fell by more than $6 billion. According to a report produced by Fortune and Snopes, Facebook lost at least $60 million in advertising revenue.
The outage had a major impact on people in the developing world, who depend on Facebook's "Free Basics" program, affecting communication, business and humanitarian work.
Facebook's Chief Technology Officer Mike Schroepfer wrote an apology after the downtime had extended to several hours, saying, "Teams are working as fast as possible to debug and restore as fast as possible."
U.S. Representative Alexandria Ocasio-Cortez tweeted about the outage, asking people to share "evidence-based" stories on Twitter, mocking Facebook's reputation for spreading factually questionable content. Twitter and Reddit also commented on the outage from their official Twitter accounts.
Users on both Twitter and Telegram reported a slowdown in response times, believed to be caused by people who normally use Facebook services switching to those platforms.
Some media outlets highlighted that the outage coincided with Frances Haugen's testimony, although the two events were unrelated.