Data Center Redundancy Explained

Written by Drew Leonard

November 24

In today’s fast-moving information landscape, organizations can’t afford to have their essential systems go dark when they need them most. The need for greater reliability has pushed many companies away from insufficient on-premises data solutions and into more robust colocation and cloud infrastructure. This higher level of reliability is made possible by data center architecture that puts a heavy emphasis on redundancy.

What Is Data Center Redundancy?

Data center infrastructure is designed to be highly resilient so that it can withstand disruptions that could affect business continuity and data availability. That means key backup systems are in place to prevent downtime during unexpected events such as equipment failures and power outages, and to keep systems running during planned maintenance.

Redundancy is critically important for colocation data centers that operate under strict service level agreements (SLAs) guaranteeing customers a minimum level of uptime. The same principle applies to cloud providers, which may not host customer hardware but are still committed to keeping workloads and applications available. When a data center lacks proper redundancy, it experiences more downtime, which quickly translates into financial losses through SLA payouts and customer churn, as well as a damaged reputation.

How Data Centers Measure Redundancy

Whether they provide colocation or cloud computing services, data centers use a common notation to describe infrastructure redundancy. To understand how it works, it helps to begin with its basic unit, represented by the variable “N.”

What Does “N” Mean?

When referring to data center redundancy, “N” represents the amount of infrastructure needed to operate all critical systems at full capacity. Although it can refer to different parts of the infrastructure, such as cooling units or uninterruptible power supplies (UPS), N always stands for the number of that particular component the facility needs to operate as intended. (Note: N on its own means no redundancy. For power, this is a single-corded approach with no backup built in; if there is a utility outage or a critical system failure, customers’ systems will go down.)

To give a very simplified example, imagine a data center needs four A/C units to provide adequate cooling for its equipment. In this case, N would be four: the baseline capacity needed to maintain the proper operating temperature range in the facility. A data center with only N cooling infrastructure would have no redundancy to protect it from failure. If a cooling unit were to fail, there would be no standby unit or spare cooling capacity to cover for it, and temperatures in the facility would rise.
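The arithmetic behind this is simple enough to sketch in a few lines of Python. This is an illustration only: it assumes every unit has identical capacity and that “enough capacity” just means the number of running units is at least N. The function names are hypothetical, not part of any real tool.

```python
def failures_tolerated(installed: int, required: int) -> int:
    """Number of units that can fail before running capacity drops below N."""
    return max(installed - required, 0)


def capacity_ok(installed: int, required: int, failed: int) -> bool:
    """True if the units still running are enough to meet N."""
    return installed - failed >= required


N = 4          # A/C units needed to hold the operating temperature range
installed = N  # an "N" design installs exactly what is needed, with no spares

print(failures_tolerated(installed, N))     # 0
print(capacity_ok(installed, N, failed=1))  # False: temperatures would start to climb
```

Setting `installed = N + 1` in the same sketch gives one tolerated failure, which is the N+1 case described next.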

What Does N+1 Redundancy Mean?

Building on this logic, N+1 means that a facility has everything it needs to run at full capacity plus one additional component to serve as a backup. In the previous example, N+1 cooling infrastructure would mean that our hypothetical data center has five A/C units (remember, N=4 in this example). If one of the units failed for any reason, the spare would take over until the failed unit could be repaired. Likewise, if one unit had to be taken offline for maintenance, the spare could cover for it and keep the system running without any interruption in service. This ability to service equipment without an outage is known as concurrent maintainability.

In practice, this example would be a bit more complicated. Many data center designs scale spare capacity with the size of the facility, typically adding one extra unit for every four that are actually needed. Scaled upward, this means a facility that requires 16 A/C units and features N+1 cooling redundancy would actually have 20 A/C units in place.
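A short Python sketch can compare a literal single spare with the one-extra-per-four rule of thumb. Rounding up for unit counts that aren’t multiples of four is our own assumption, and the function names are hypothetical.

```python
import math


def n_plus_one(required: int) -> int:
    """Literal N+1: exactly one spare unit, whatever the facility size."""
    return required + 1


def one_spare_per_four(required: int) -> int:
    """Rule of thumb: one extra unit for every four needed, rounded up (assumption)."""
    return required + math.ceil(required / 4)


for n in (4, 16):
    print(n, n_plus_one(n), one_spare_per_four(n))
# 4 5 5
# 16 17 20
```

For the four-unit example the two approaches agree at five units; at 16 required units, the per-four rule yields the 20 units mentioned above.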

A technology stack with only N infrastructure is highly vulnerable, because the failure of any single component leaves it short of the capacity it needs. Many organizations that maintain on-premises data centers fall into this category, especially smaller companies that don’t have a dedicated data center. As more businesses come to depend on their networks, applications, and data to deliver goods and services, colocation facilities and cloud providers with N+1 infrastructure in place are becoming much more attractive solutions.

What Does 2N Redundancy Mean?

While N+1 redundancy delivers acceptable protection for most organizations, there are some businesses that require even more robust infrastructure. That’s why some healthcare and financial clients try to find facilities that offer 2N redundancy. Occasionally referred to as N+N, a 2N system doesn’t supply a backup for each individual component. Instead, it represents a completely independent, mirrored system that is capable of taking over operational needs if the primary system were to go offline.

Using the above example, if an N facility had four cooling units, upgrading to 2N would give it eight; more precisely, it would give it two sets of four. If one or more of the primary system’s units go down, the secondary system takes over. This type of redundancy is often called fault tolerant because it provides completely uninterrupted service. Strictly speaking, though, any system with redundancy built in is fault tolerant, and the goal of resiliency is to avoid any downtime at all. For example, UPS systems maintain battery power to critical infrastructure until the generators can take over, and cooling can be transferred to standby systems before the temperature rises beyond the range specified in the SLA.
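The difference between 2N and N+1 is that the backup is a whole independent system rather than spare units. The Python sketch below models that under simplifying assumptions: each side is a hypothetical `System` with a fixed number of identical units, and a side is either fully online or fully offline.

```python
from dataclasses import dataclass


@dataclass
class System:
    """One independent set of units, e.g. the primary or the mirrored side (hypothetical)."""
    units: int
    online: bool = True


def capacity(systems: list[System]) -> int:
    """Total units available from the systems that are still online."""
    return sum(s.units for s in systems if s.online)


N = 4
primary = System(units=N)
secondary = System(units=N)  # 2N: a second, independent set of four

print(primary.units + secondary.units)  # 8 units installed in total

primary.online = False  # the entire primary side goes down
print(capacity([primary, secondary]) >= N)  # True: the mirrored side carries the full load
```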

However, a completely mirrored backup system is very costly to install and labor-intensive to maintain. It requires specialized hardware and software that can immediately detect faults and keep the two systems running in tandem, all of which needs to be carefully managed. The expense is often worthwhile for applications that simply cannot afford any downtime. That’s why 2N systems are attractive to industries like healthcare and finance, where even a second or two of downtime could cost lives or cause heavy financial losses.

What Is 2N+1 Redundancy?

Typically the highest form of data center redundancy, a 2N+1 system combines the mirrored design of 2N systems with the flexible backup capability of N+1 systems. Although it is more expensive to build and deploy, it offers the most flexibility, because a single component failure doesn’t necessarily force the facility to shift over to the secondary system. A 2N+1 system ensures that a data center can keep its infrastructure up and running under almost any circumstances. If service at a 2N+1 data center is knocked out, a system outage is probably the least of anyone’s problems.
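Where the extra “+1” unit sits varies from design to design. The sketch below assumes one shared spare that can stand in for a failed unit on whichever side is active; the function name and that placement are illustrative assumptions, not a description of any particular facility.

```python
def active_side_ok(side_units: int, failed_on_side: int, spare: int, required: int) -> bool:
    """True if the active side, topped up by the shared spare, still meets N."""
    return side_units - failed_on_side + spare >= required


N = 4
side_a, side_b, spare = N, N, 1  # 2N+1 with N = 4: two sets of four plus one spare
print(side_a + side_b + spare)   # 9 units installed

# One unit fails on side A: the spare covers it, so there is no need to switch to side B.
print(active_side_ok(side_a, failed_on_side=1, spare=spare, required=N))  # True

# Two failures on side A exceed the spare, so the facility would shift over to side B.
print(active_side_ok(side_a, failed_on_side=2, spare=spare, required=N))  # False
```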

Choosing the Right Data Center Redundancy

When evaluating colocation and cloud data center services, organizations need to pay close attention to the level of redundancy in place for power and cooling. Power and cooling may have different redundancy levels within the same data center, so it’s important to understand the risk each poses to your business operations. Most organizations will get all the reliability they need from N+1 redundancy. However, as noted previously, some data-critical industries require environments that come as close as possible to zero risk to ensure continuous access to their data and applications. These companies are best served by 2N or 2N+1 infrastructure when it’s available to them.

Evoque data centers are built on a foundation of strength to deliver a robust, reliable framework for business continuity. Our most advanced facilities feature a combination of N+1 and 2N infrastructure to bolster our industry-leading SLAs and ensure that your mission-critical systems are available when you need them most. To learn more about our commitment to data center resilience, talk to one of our solutions experts today.