The Altoona data center is cooled by 100 percent outside air and powered by 100 percent renewable energy, the latter thanks to a wind energy project in nearby Wellsburg, Iowa.
This is the fastest we’ve ever completed a first building at one of our sites, and we owe a lot of that to the people of Iowa. More than 950,000 hours have already been logged in the construction of the facility, and we have an average of 450 people — 80 percent of them from central Iowa — here every day, constructing a second data-center building on the site. As you may have heard, we like to move fast at Facebook — and we are grateful to everyone who’s helped us get to this point. We’re proud to call you our neighbors and our friends, and to be a part of the community here in Altoona.
We’re also proud of what we’ve built together — one of the most advanced, efficient and sustainable data centers on the planet. Like our other data centers, Altoona is cooled by 100 percent outside air, and it features the latest in hyperefficient Open Compute Project gear. Altoona is also our first data center to take advantage of our innovative new networking fabric, which will help us scale much more efficiently as more and more people connect on Facebook around the world. And last but not least, Altoona will be powered by 100 percent renewable energy (as tracked by renewable energy certificates), thanks to the new wind project we worked with MidAmerican Energy to develop in nearby Wellsburg, Iowa. In addition to bringing more jobs and investment to the area, Wellsburg adds 140 megawatts of new renewable energy to the grid in Iowa — more than what our data center will require for the foreseeable future.
Facebook said its new data-center fabric takes a novel, “disaggregated” approach to networking, in which the data-center network is broken up into small, identical server pods with uniform connectivity. This modular design, the company added, allows for a dramatic increase in network capacity and scalability, which is especially important for the social network’s large volumes of “machine-to-machine” traffic.
Alexey Andreyev offered more details on the company’s new networking fabric in a post on its engineering blog.
Our previous data-center networks were built using clusters. A cluster is a large unit of deployment, involving hundreds of server cabinets with top-of-rack switches aggregated on a set of large, high-radix cluster switches. More than three years ago, we developed a reliable layer-three “four-post” architecture, offering 3+1 cluster switch redundancy and 10 times the capacity of our previous cluster designs. But as effective as it was in our early data center builds, the cluster-focused architecture has its limitations.
For our next-generation data-center network design, we challenged ourselves to make the entire data center building one high-performance network, instead of a hierarchically oversubscribed system of clusters. We also wanted a clear and easy path for rapid network deployment and performance scalability without ripping out or customizing massive previous infrastructures every time we need to build more capacity.
To achieve this, we took a disaggregated approach: Instead of the large devices and clusters, we broke the network up into small identical units — server pods — and created uniform high-performance connectivity between all pods in the data center.
There is nothing particularly special about a pod — it’s just like a layer-three micro-cluster. The pod is not defined by any hard physical properties; it is simply a standard “unit of network” on our new fabric. Each pod is served by a set of four devices that we call fabric switches, maintaining the advantages of our current 3+1 four-post architecture for server rack TOR uplinks, and scalable beyond that if needed. Each TOR currently has 4 x 40G uplinks, providing 160G total bandwidth capacity for a rack of 10G-connected servers.
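The uplink arithmetic above can be sketched in a few lines. The figures (four 40G uplinks per TOR) come from the post; the helper name is our own.

```python
# Sketch of the per-rack uplink capacity described above.
# 4 x 40G uplinks per top-of-rack switch, as given in the post.

TOR_UPLINKS = 4          # uplinks from each top-of-rack switch
UPLINK_SPEED_GBPS = 40   # 40G links in this first fabric iteration

def rack_uplink_capacity_gbps() -> int:
    """Total uplink bandwidth available to one rack of 10G-connected servers."""
    return TOR_UPLINKS * UPLINK_SPEED_GBPS

print(rack_uplink_capacity_gbps())  # 160
```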
What’s different is the much smaller size of our new unit — each pod has only 48 server racks, and this form factor is always the same for all pods. It’s an efficient building block that fits nicely into various data-center floor plans, and it requires only basic mid-size switches to aggregate the TORs. The smaller port density of the fabric switches makes their internal architecture very simple, modular, and robust, and there are several easy-to-find options available from multiple sources.
Another notable difference is how the pods are connected together to form a data-center network. For each downlink port to a TOR, we are reserving an equal amount of uplink capacity on the pod’s fabric switches, which allows us to scale the network performance up to statistically non-blocking.
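A minimal sketch of that equal reservation, using the pod dimensions given in the post (48 racks, 160G of TOR uplinks per rack); the helper names are ours, not Facebook's:

```python
# Sketch of the 1:1 reservation described above: for every gigabit of
# TOR-facing (downlink) capacity, the pod's fabric switches reserve an
# equal amount of uplink capacity toward the rest of the fabric.
# Pod dimensions are the ones given in the post; names are ours.

RACKS_PER_POD = 48
RACK_UPLINK_GBPS = 4 * 40  # 4 x 40G TOR uplinks = 160G per rack

def pod_downlink_gbps() -> int:
    """TOR-facing capacity across a full 48-rack pod."""
    return RACKS_PER_POD * RACK_UPLINK_GBPS

def oversubscription_ratio(downlink_gbps: float, uplink_gbps: float) -> float:
    """Downlink-to-uplink ratio; 1.0 means statistically non-blocking."""
    return downlink_gbps / uplink_gbps

print(pod_downlink_gbps())                                # 7680
print(oversubscription_ratio(pod_downlink_gbps(), 7680))  # 1.0
```

For contrast, a classic oversubscribed cluster aggregation might reserve only a quarter of that uplink capacity, giving a 4:1 ratio.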
A large fabric network — which has a more complex topology and a greater number of devices and interconnects — is definitely not the kind of environment that can be realistically configured and operated in a manual way. But the uniformity of the topology helps enable better programmability, and we can use software-based approaches to introduce more automation and more modularity to the network.
To automate the fabric, we’ve adjusted our thinking to be more “top-down” — holistic network logic first, then individual devices and components second — abstracting from individual platform specifics and operating with large numbers of similar components at once. We’ve made our tools capable of dealing with different fabric topologies and form factors, creating a modular solution that can adapt to different-size data centers.
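As a toy illustration of that top-down approach, one might describe the whole fabric once and derive per-device facts from it. The data model and names below are hypothetical, not Facebook's actual tooling.

```python
# Hypothetical "top-down" generator: one holistic topology description,
# from which every device name (and, in real tooling, its configuration)
# is derived. This is our sketch, not Facebook's automation.

from itertools import product

def fabric_switch_names(pods: int, switches_per_pod: int = 4) -> list[str]:
    """Enumerate every fabric switch in the building from two numbers."""
    return [f"pod{p}-fsw{s}"
            for p, s in product(range(pods), range(switches_per_pod))]

switches = fabric_switch_names(pods=3)
print(len(switches))   # 12
print(switches[0])     # pod0-fsw0
```

The point of the sketch is the direction of derivation: the fabric is defined first, and individual devices fall out of that definition, rather than being configured one by one.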
In dealing with some of the largest-scale networks in the world, the Facebook network engineering team has learned to embrace the “keep it simple, stupid” principle. By nature, the systems we are working with can be large and complex, but we strive to keep their components as basic and robust as possible, and we reduce the operational complexity through design and automation.
Our new fabric was not an exception to this approach. Despite the large scale and complex-looking topology, it is a very modular system, with lots of repetitive elements. It’s easy to automate and deploy, and it’s simpler to operate than a smaller collection of customized clusters.
Fabric offers a multitude of equal paths between any points on the network, making individual circuits and devices unimportant; such a network can survive multiple simultaneous component failures with no impact. Smaller and simpler devices mean easier troubleshooting. The automation that fabric required us to create and improve made it faster to deploy than our previous data-center networks, despite the increased number of boxes and links.
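The failure-tolerance claim comes down to simple proportionality: with many equal paths, losing some of them removes capacity, not connectivity. A minimal sketch, with illustrative path counts:

```python
# With n equal paths between two pods, losing k paths removes only k/n of
# the capacity while connectivity survives. Path counts are illustrative,
# not from the post.

def surviving_capacity_fraction(total_paths: int, failed_paths: int) -> float:
    """Fraction of pod-to-pod capacity remaining after some paths fail."""
    if not 0 <= failed_paths <= total_paths:
        raise ValueError("failed_paths must be between 0 and total_paths")
    return (total_paths - failed_paths) / total_paths

print(surviving_capacity_fraction(4, 1))  # 0.75
```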
Our modular design and component sizing allow us to use the same mid-size switch hardware platforms for all roles in the network — fabric switches, spine switches, and edges — making them simple “Lego-style” building blocks that we can procure from multiple sources.
With smaller device port densities and minimized FIB (forwarding information base) and control-plane needs, what started as our first all-40G network will be quickly upgradable to 100G and beyond in the not-so-distant future, while leveraging the same infrastructure and fiber plant. With the first iteration of the fabric in our Altoona data center, we have already achieved a 10 times increase in intra-building network capacity compared with the equivalent cluster design, and we can easily grow to over 50 times within the same port speeds.
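The headroom figures work out as follows. The 10x and 50x numbers are from the post; the 100G line is our own back-of-the-envelope extrapolation, assuming capacity scales linearly with port speed.

```python
# Capacity headroom normalized to the equivalent cluster design.
# 10x and 50x are the post's figures; the 100G line is our own
# linear extrapolation, not a claim from the post.

FIRST_ITERATION = 10.0   # Altoona fabric vs. equivalent cluster design
MAX_AT_40G = 50.0        # achievable growth within the same port speeds

print(MAX_AT_40G / FIRST_ITERATION)   # 5.0   (growth left at 40G)
print(MAX_AT_40G * (100 / 40))        # 125.0 (if links move to 100G)
```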