AI’s Impact on Data Centers: AI & Performance Monitoring

AI Data Center Performance Monitoring

March 18

According to Gartner, “30 percent of data centers that fail to sufficiently prepare for AI will no longer be operationally or economically viable by 2020.” With that milestone upon us, artificial intelligence (AI) is now — or should be — part of a comprehensive and modern approach to data center maintenance and operations.

When selling power and cooling, every operational efficiency that can be gained presents an opportunity for cost savings and infrastructure optimization. For data centers, that means: 

  • Having the right equipment 
  • Maintaining that equipment in the most effective and efficient way possible 
  • Balancing resources exactly as needed, minimizing waste 

Balancing real-time need across an entire portfolio is challenging, particularly when also accounting for environment- and location-based variables. AI advances have reduced those challenges, allowing data center operators to more quickly and completely understand performance-based operations and adjust supply to true demand. The machine learning aspect of AI has proven particularly beneficial for data centers. With machine learning, systems are not limited to reactive responses but can use past trends to predict need and adjust preemptively. These tools allow data center teams to confidently employ predictive maintenance and dynamic optimization for greater efficiency.

“Using AI to create smart infrastructure will help deliver a much more efficient data center, optimize configuration and enable better workload execution with dynamic settings and adaptive capabilities.”

AFCOM

AI & Performance Monitoring  

Performance monitoring and optimization have been goals for data centers since their inception. Understanding the expected and actual performance of equipment allows data centers to:

  • Anticipate equipment longevity 
  • Plan for regular maintenance and replacement 
  • Optimize equipment for peak performance 
  • Identify when equipment is not performing as expected 

When monitored effectively, these factors help data centers save money and provide constant optimal performance. 

Before AI 

Without AI, performance monitoring can be a labor- and time-intensive endeavor that requires a fair amount of educated guesswork. For example, to monitor the performance of a cooling unit, engineers determine the normal motor amperage draw by observing units over many years. Once the average is identified, that metric is used to judge the performance of other units. If the amperage is too low or too high, it’s an indication that something is not right.
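
The manual approach described above effectively reduces to comparing each reading against a single hand-derived average. A minimal sketch of that style of check (the baseline and tolerance values here are hypothetical):

```python
# Manual-style check: compare one reading against a hand-derived baseline.
# BASELINE_AMPS and TOLERANCE are hypothetical illustrative values.
BASELINE_AMPS = 12.4   # average draw observed over years of readings
TOLERANCE = 0.15       # flag anything more than 15% off the baseline

def check_motor(amps: float) -> str:
    """Return a coarse status for a single amperage reading."""
    deviation = abs(amps - BASELINE_AMPS) / BASELINE_AMPS
    return "out of range" if deviation > TOLERANCE else "normal"
```

Note how little the check can say: one global baseline, one pass/fail answer, and no context about the unit, its location, or its history.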

Monitored this way, this metric has little benefit other than telling techs if a particular unit is performing as expected. No additional diagnostics are provided. If equipment models are updated, the process must start over as averages may change. With a range of different units used across a data center’s portfolio, identifying a reliable average is difficult. Location and other environmental factors can also impact the performance average with no way for techs to easily account for these variables. 

Because it is not easy to factor outside variables into performance monitoring (making it difficult to identify commonalities in equipment failure or real impacts on longevity), most data centers rely on manufacturer recommended maintenance and replacement schedules — regardless of whether maintenance or replacement is actually required by each individual unit. If a unit fails early, data centers can find themselves scrambling to replace components before performance or service are negatively impacted. Working on an outside schedule rather than being driven by need costs data centers hundreds of thousands of dollars over time. But without better analytics and insights, data centers have no other choice. 

What AI Can Do for Data Center Performance Monitoring 

With AI, the process of performance monitoring is much faster and can account for equipment, location, and other variables. To continue the cooling unit motor example, AI can easily track and trend the motor amps of 50 cooling units without additional effort from engineers. This allows teams to determine the normal operating condition of the motors in less time.
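
Per-unit tracking and trending of this kind might be sketched as follows, assuming each unit streams amperage readings under a hypothetical unit ID (the window size is an arbitrary choice):

```python
from collections import defaultdict, deque

# Hypothetical sketch: keep a rolling window of amperage readings per
# cooling unit so each unit gets its own learned baseline, rather than
# one portfolio-wide average.
WINDOW = 100  # number of recent readings retained per unit

class AmpTrendTracker:
    def __init__(self) -> None:
        self.history = defaultdict(lambda: deque(maxlen=WINDOW))

    def record(self, unit_id: str, amps: float) -> None:
        """Append one amperage reading for the given unit."""
        self.history[unit_id].append(amps)

    def baseline(self, unit_id: str) -> float:
        """Current learned baseline: the mean of the recent window."""
        readings = self.history[unit_id]
        return sum(readings) / len(readings)
```

Because each unit carries its own window, a new equipment model simply accumulates its own baseline instead of invalidating a shared average.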

Expanding AI-based performance tracking to a data center’s entire portfolio means teams can easily obtain and track performance metrics customized to each data center environment. Inputs from additional sensing devices add valuable contextual information, painting a more accurate picture of a cooling unit’s expected average performance. Additional data inputs easily monitored by AI include:

  • Temperature in and out 
  • Outside air conditions  
  • Humidity  
  • Mass flow rate of water and air  
  • Pressure and pressure differential indicators  
  • Stress or strain cells (force)  
  • Accelerometers (vibration)  
  • Current, voltage, resistance, and impedance in  
  • Water chemistry  
  • Battery performance or load test  
  • Generator exhaust opacity and chemistry  
  • Oil and water chemistry levels  
  • Internal load that needs to be cooled  
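
One simple way such sensor channels could be combined is to score each new reading against that channel’s historical mean and spread. A hedged sketch (channel names and data are hypothetical, and a production system would use far richer models):

```python
import statistics

# Hypothetical multivariate check: z-score each sensor channel against
# its own history; a large |z| on any channel suggests abnormal behavior.
def anomaly_scores(history: dict, reading: dict) -> dict:
    """Return a z-score per channel for one new reading.

    history maps channel name -> list of past readings;
    reading maps channel name -> latest value.
    """
    scores = {}
    for channel, value in reading.items():
        samples = history[channel]
        mean = statistics.mean(samples)
        stdev = statistics.pstdev(samples) or 1.0  # avoid divide-by-zero
        scores[channel] = (value - mean) / stdev
    return scores
```

Adding a channel (humidity, vibration, water chemistry) is just another key in the dictionaries, which is what makes folding in extra context so cheap compared with the manual approach.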

With this level of insight, the motor amperage is known for so many operating conditions that engineers no longer need to look at or touch the machine to know when it is not performing. They can tell almost immediately when the slightest change in performance occurs. With machine learning, teams may even be alerted before a potential issue arises based on AI’s trending capabilities. 
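
The trending-based early warning described above could be sketched as a simple linear extrapolation: fit the recent amperage trend and project when it would cross a failure threshold (all values hypothetical; real predictive-maintenance models are considerably more sophisticated):

```python
# Hypothetical sketch: fit a least-squares line through recent readings
# and estimate how many readings remain before a threshold is crossed.
def readings_until_threshold(samples, threshold):
    """Return projected readings until `threshold`, or None if no upward trend."""
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) / \
            sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return None  # flat or improving trend; no failure projected
    return (threshold - samples[-1]) / slope
```

Even this crude projection shifts the question from “is the unit broken?” to “how long until it will be?”, which is the shift that enables proactive scheduling.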

Benefits of AI on Data Center Performance Monitoring 

The level of insight provided by AI has moved data center performance monitoring beyond the simple “performing as expected/not performing as expected” metric. Constant monitoring coupled with machine learning means teams not only know how each piece of equipment is performing, but can predict when the unit will fail with incredible accuracy. This deep-level, proactive approach to performance monitoring can have a major impact on both data center performance and maintenance costs. 

With the ability to predict equipment maintenance needs with an extremely high level of accuracy, data centers no longer need to perform maintenance “as scheduled.” Teams can now wait until monitoring detects that a piece of equipment is (or is about to be) functioning outside of normal operational range. With AI solutions, this is often done when the equipment itself sends a message via exception reporting.  
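
Exception reporting of this kind might be sketched as a check that stays silent while readings fall inside the learned operating range and emits a message only on an exception (unit IDs and limits here are hypothetical):

```python
# Hypothetical exception-reporting sketch: no output while the reading is
# within the learned range; an alert string only when it falls outside.
def exception_report(unit_id, amps, low, high):
    """Return None for normal readings, or an alert message on an exception."""
    if low <= amps <= high:
        return None  # normal operation: no report generated
    return f"ALERT {unit_id}: {amps:.1f} A outside [{low}, {high}]"
```

The design point is that silence is the default: maintenance attention is spent only where the equipment itself has flagged a deviation.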

When this methodology is applied to every piece of equipment with commonality, teams have a data set that allows them to perform as-needed or just-in-time maintenance. Taking a needs-based approach can save data centers hundreds of thousands of dollars by not performing unnecessary maintenance or risking downtime caused by an unexpected failure. AI insights can also help teams understand what caused the performance degradation so the issue can be addressed or mitigated more quickly. Ultimately, the ability to confidently adopt a “run to failure” mentality allows teams to improve overall reliability and availability to unprecedented levels.

Treating Data Center Equipment Like Wine  

Wine enthusiasts know that many elements impact the quality of wine. A California cabernet sauvignon will not be the same as a French cabernet sauvignon. Even buying the same style of wine from the same vintner each year does not guarantee it will taste the same. Things like terroir, annual environmental conditions, and aging play a big role in how a wine tastes, affecting each vintage differently.

Having minute performance knowledge of various equipment means data center operators can now define each unit like a bottle of wine, designating various vintages of equipment as good or bad years by manufacturer. The level of insight provided by AI enhances a team’s understanding of equipment by accounting for things like environmental factors and real performance trends in addition to straight manufacturing specs.

Assessing data center equipment this way allows teams to not only select equipment with predictability, but to purchase grey market (used) equipment with the same level of predictability as buying new. Teams will know which “vintage” of secondary equipment is still reliable versus which vintages are likely to cause issues or fail sooner. The location the equipment is coming from, where it will be placed, and anticipated load can also be factored in to ensure it’s the right fit. The incredibly detailed insights provided by AI allow data centers to accurately and confidently assess the reliability of equipment and plan accordingly. 

Keeping Up with the Times 

Artificial intelligence and machine learning have improved data center operations by providing an unprecedented level of performance insight and the ability to predict events before they become issues. This level of performance and equipment insight was a pipe dream when data centers first emerged, but it is quickly becoming a highly integrated element of doing business. As client and consumer demand for always-on, low-latency access to data increases, the level of performance monitoring and prediction offered by AI will become table stakes for colocation providers and their clients.