Data centre cooling: taking things to the next level

Published: 30 October 2017


Dr Stu Redshaw, Chief Technology Officer at EkkoSense, explains how AI-enabled thermal optimisation is changing the data centre landscape.
 
Artificial Intelligence (AI) has been a key theme at many data centre industry events this year – and many speakers have referred in particular to how Google has been applying AI thinking from its acquisition of DeepMind to help cut total energy usage across its data centre estate by some 15%. That’s clearly a significant saving – and one that’s sounding increasingly attractive to data centre operators, especially as global data centre traffic is projected to grow at a 27% compound annual growth rate (CAGR) from now through 2020.
However, it’s worth thinking about what people actually mean when they talk about Artificial Intelligence in the data centre. While many of the ‘hyperscale’ data centre providers such as Google, Microsoft, Facebook, Apple and Amazon are already experimenting with AI systems for energy efficiency improvements, most observers still see AI as a kind of miracle data centre infrastructure management plugin – one that can suddenly monitor, manage and optimise all their data centre power, cooling and capacity requirements.

Too much focus on AI – not enough on Machine Learning
While theoretically achievable, this perception overlooks the fact that effective AI solutions rely on massive amounts of data and smart algorithms to be able to solve problems on their own. So when commentators are discussing AI, they often tend to de-emphasise the key process – machine learning – that makes it possible for an AI system to learn and adapt when exposed to new data.

Effective machine learning – an approach that is capable of analysing information from very large data sets, and then detecting and extrapolating patterns from that data in order to apply them to evolving scenarios – is a prerequisite for a successful data centre AI strategy. Today’s IT platforms clearly have access to massive computing power, and are consequently ready to process complex big data sets in order to identify, analyse and act on data. However, the key question remains: what data are they actually going to process in order to get the AI insights they require?

Identifying the right AI data sources
To keen AI watchers, this uncertainty around AI source data shouldn’t really be surprising. Only last month, the success of IBM’s Watson machine learning system was called into question due to a perceived lack of progress in some of its key healthcare projects. At the heart of the problem were delays in getting key machine learning source data ‘trained’ in readiness to support AI programmes. IBM is clearly aware of this challenge, and has been investing significantly in data sources to fuel its AI engine for the past few years.

So what’s the answer when it comes to AI in the data centre? And, more specifically, what should data centre cooling vendors and experts be doing to support the transition towards AI in the data centre?

Data centre thermal performance – ripe for optimisation
Earlier this year, when EkkoSense surveyed some 128 UK data centre halls – and more than 16,500 racks – we found a clear requirement for a more thermally optimised approach to data centre management. Our research showed that, collectively, UK data centres were achieving poor levels of cooling utilisation – with an average of 66% of installed cooling equipment not actually delivering any active cooling benefits.
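
To make the utilisation figure concrete, here is a minimal Python sketch of the arithmetic. It assumes utilisation is read as active cooling load divided by installed cooling capacity, which is one plausible interpretation rather than the survey’s stated method. The function name and the kW figures are hypothetical and chosen only to reproduce the 66% average quoted above.

```python
def cooling_utilisation(active_load_kw: float, installed_capacity_kw: float) -> float:
    """Fraction of installed cooling capacity that is doing active cooling work.

    Assumption: utilisation = active cooling load / installed capacity.
    """
    if installed_capacity_kw <= 0:
        raise ValueError("installed capacity must be positive")
    return active_load_kw / installed_capacity_kw


# Hypothetical hall: 1,000 kW of installed cooling, 340 kW of active cooling load.
utilisation = cooling_utilisation(340.0, 1000.0)
unused_share = 1.0 - utilisation

print(f"Cooling utilisation: {utilisation:.0%}")
print(f"Installed cooling not actively cooling: {unused_share:.0%}")
```

On these illustrative numbers, only around a third of the installed cooling is doing useful work, which is the pattern the survey average describes.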

Perhaps of more concern was the fact that almost 8 out of 10 data centres weren’t actually compliant with best-practice ASHRAE thermal guidelines, which offer clear recommendations for effective data centre thermal testing. Even though just 11% of IT racks overall were outside ASHRAE’s recommended rack inlet temperature range of 18–27°C, having just one rack out of range was effectively taking a data centre outside most operators’ thermal compliance.
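
The ‘one rack is enough’ rule is easy to express in code. The Python sketch below is illustrative only: it uses the 18–27°C recommended inlet range quoted above, while the rack IDs, the readings and the function names are hypothetical and do not represent any particular monitoring product.

```python
# ASHRAE recommended rack inlet temperature range, as quoted in this article (°C).
ASHRAE_MIN_C = 18.0
ASHRAE_MAX_C = 27.0


def out_of_range_racks(inlet_temps_c: dict[str, float]) -> list[str]:
    """Return the IDs of racks whose inlet temperature falls outside the range."""
    return sorted(
        rack_id
        for rack_id, temp_c in inlet_temps_c.items()
        if not ASHRAE_MIN_C <= temp_c <= ASHRAE_MAX_C
    )


def hall_is_compliant(inlet_temps_c: dict[str, float]) -> bool:
    """A hall counts as compliant only if every rack is within range."""
    return not out_of_range_racks(inlet_temps_c)


# Hypothetical rack inlet readings for a small hall (°C).
readings = {"A01": 21.5, "A02": 24.0, "A03": 28.3, "A04": 19.2}

offenders = out_of_range_racks(readings)
print("Racks out of range:", offenders)
print("Hall compliant:", hall_is_compliant(readings))
```

A check like this only works with a reading for every rack, which is why sensor coverage matters so much in what follows.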

While ASHRAE suggests that, as a minimum, temperature data should be collected from at least one point every 3m to 9m of rack aisle, it also acknowledges that unless individual data centre racks have their own dedicated thermal sensor there is realistically no way for data centre operators to stay within target limits.

Without access to more precise monitoring technologies – and the software to manage them – data centre operations will always remain at risk from individual racks that lie outside ASHRAE’s recommended range. These are exactly the kind of performance exceptions that AI solutions would need to identify and manage for greater optimisation – but until now there has been no practical way of resolving the issue.

Laying the foundation for AI-enabled thermal optimisation
To really get hold of the 24/7, real-time rack-level data that effective AI solutions demand, you need to be continually polling each rack for regular thermal updates, cooling load details and capacity information. You could collect the data manually, but realistically that’s never going to be a practical or cost-effective solution.

Given that UK data centre operators continue to invest significantly in expensive cooling equipment, I believe the cause of ASHRAE non-compliance is not one of limited cooling capacity but rather the ongoing poor management of airflow and cooling strategies. That’s why at EkkoSense we’ve been working to combine the latest 3D visualisation techniques and real-time inputs from Internet of Things (IoT) sensors to provide data centre operators – for the first time – with an intuitive, 3D real-time snapshot of their data centre environment’s physical and thermal dynamics.

By tracking rack-level temperatures using our thermal monitoring technology and applying an optimisation process, we’ve been able to restore ASHRAE non-compliant data centres to a compliant state. However, once compliant, the key, of course, is to maintain that status through a programme of regular ASHRAE audits. That’s where we see AI fitting in.

With our EkkoAir thermal sensors in place on every cooling asset, data centre operators can now track data centre cooling loads in real time – effectively generating the thousands, then millions, of granular data points that machine learning engines require to build out the next generation of data centre thermal performance models. Whether you’re an end-user organisation, a data centre operator, or a provider of IT racks or cooling equipment, the ultimate goal of any data centre AI initiative should be the same: a better balancing of thermal profiles so that only those cooling units that need to be working are actually active.

This is the game-changer that today’s data centres need if they’re to withstand the escalating duty loads anticipated over the next five years of data centre growth. While in the past building a cooling strategy based on nominal nameplate ratings might have seemed smart, it’s increasingly looking like an outmoded approach that has systematically led to both under-cooling and over-cooling. However, thanks to innovations in low-cost sensor technology, Internet of Things connectivity, 3D visualisation and cloud deployment, it doesn’t have to be that way any longer.

We’re now working alongside AI specialists at Nottingham University to take things to the next level with an AI-enabled thermal modelling, visualisation and monitoring approach that can really take advantage of our EkkoAir-sourced data centre machine learning data. Through effective data centre thermal optimisation, we’ve already seen overall data centre cooling energy use fall by around 30%. With a truly AI-led approach in place, however, we think there’s an opportunity for data centres to go much further.

www.ekkosense.co.uk