I just walked into a lamp post…why didn’t you warn me?! – Predict, don’t react, to avoid IT infrastructure issues

I have just spent the last few weeks pulling together a survey on the functionality of vendor solutions in the hybrid IT infrastructure management and monitoring space. The results have been incorporated into a White Paper that looks at the importance of knowing how well your IT is supporting the performance of your business-critical applications and services in today’s containerised multi-cloud environment.

It has become clear that the increasing reliance on IT to deliver new digital business models and the complexity of underlying IT infrastructures make automation a must for IT Operations departments. There is just too much data from too many devices, and too little time to react, for manual operations to remain effective. Automation assumes a level of machine learning (ML). Given that there is a, sometimes wilful, misunderstanding of the difference between Machine Learning and Artificial Intelligence (AI), I perhaps shouldn’t be surprised that the term AIOps covers a multitude of sins.

All the vendors who responded to the research request demonstrated a level of ML capability in their solutions. From a practical point of view this means that all of them can reduce the Mean-Time-To-Resolution (MTTR) of IT infrastructure problems and identify root causes far faster and more accurately than in the past. That is a bit like being told that I have just walked into a lamp post because I was tweeting on my phone as I walked along the pavement. It might mean I learn not to do that again, and for me the only other implication is embarrassment, but for an organisation whose business model relies on IT, the implications would be much more severe.

What I really needed was an alert that told me to look up and turn left to avoid hitting the lamp post, followed by automatic remediation of the situation by disabling my social media usage while I walk. This needs more than simple ML. I am not going to try to gauge the level of deep learning and neural network capability that infrastructure monitoring and management solutions need to incorporate to achieve this. Suffice it to say, not all vendors said they could predict problems and remediate them automatically. Even fewer reported having deep learning capabilities.

Whether you need that level of prediction and automated problem remediation is down to your ability to understand and accept the risks to financial performance and reputation from poor IT performance or outages. Be aware, though, that just because a solution is categorised under an AIOps heading, it doesn’t mean it has that predictive capability.