12/9/2016 Provo Data Center Outage
On Friday, December 9th, at approximately 4:00 pm (ET), we experienced a network outage at our Provo Data Center in Utah. This disrupted service for Virtual Private Server (VPS) and Dedicated customers, as well as some shared customers served out of that facility.
After any significant event that affects our customers, we conduct an extensive examination to understand the root cause and develop a course of action to improve our systems and procedures. To that end, we want to provide a synopsis of what occurred and reassure you that we are working diligently to mitigate and prevent future outages.
Here’s what happened: In an effort to address network instability affecting a small segment of customers, we applied a configuration change that impacted a core router in an unintended manner. Specifically, applying a traffic filter on the router towards an aggregation switch resulted in an unexpected network response. This triggered a spanning tree loop and resulted in the loss of gateway response for this critical network segment. We immediately reverted the change, but the instability remained. We then worked through every segment of the network to identify the loop while attempting to minimize disruption. Our engineering team and the service vendor wrote and applied the proper Bridge Protocol Data Unit (BPDU) filters, and service was methodically restored.
Network instability lasted approximately 16 hours for some customers while the servers/switches were restored to normal functionality. While we attempted to keep customers apprised as information became available, we realize we need to do more.
So, here’s what’s happening: We are modifying the current network architecture to prevent or reduce the impact of any particular network event or device failure by improving Virtual Local Area Network (VLAN) segmentation. Part of the strategy includes creating smaller subsegments of the network so we can isolate problems and manage infrastructure. We are also investing in switch upgrades to support packet filtering throughout our entire infrastructure and, most importantly, dedicating significant financial resources to hiring additional industry experts to join our team.
Outages disrupt your life and your business. We understand, and we take our responsibility to you very seriously.
Please allow me to take this opportunity to thank you for your business and provide my personal assurance that we are dedicated to meeting our commitment to you.
President & Chief Operating Officer, Endurance International Group