
In this era of rapid technology adoption and digital transformation across nearly every industry, enterprise ITOps is constantly being redefined as new models for technology deployment, scaling, and change acceleration emerge.
AIOps has become established as a vital function within enterprise IT strategies as organizations adopt intelligent automation, leveraging AI and machine learning to collect and analyze massive amounts of data, apply reasoning and problem solving, remove noise, and prescribe the best actions for ITOps. As AIOps itself continues to evolve, there are a couple of exciting areas that we can expect to gather momentum in 2023 and beyond.
From observability to adaptive observability
For an organization to fully harness the power of AIOps to derive tangible and actionable business value from its data assets, there must be complete access to, and control of, all operational data, which starts with end-to-end visibility across its entire technology stack.
Observability, the ability to infer a system's internal state from the outputs it produces, such as logs, metrics, and traces, delivers that cross-stack visibility, and the convergence of AIOps and observability has emerged as a must-have capability in enterprise ITOps.
Prior to observability, analysis was done in pockets: it was effectively siloed and resulted in a piecemeal, loosely coordinated approach to data intelligence. Combining full-stack observability with AIOps has expanded and unified cross-layer and cross-function monitoring, with fine-grained coverage across any number of tools and sources for different infrastructure components. Anomalies with the potential to impact the business can be detected and automatically flagged, and in many instances automatically triaged and resolved; they are escalated for human resolution only when parameters dictate it or no automated fix is readily available, as in the sketch below.
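To make that detect-triage-escalate flow concrete, here is a minimal sketch assuming a rolling window of metric samples, a simple z-score test, and a hypothetical runbooks lookup; none of these names come from a specific AIOps product.

```python
# Illustrative sketch only: a rolling z-score anomaly check with simple
# triage routing. The thresholds, metric source, and runbook lookup are
# hypothetical placeholders, not a real AIOps product API.
from statistics import mean, stdev

def detect_anomaly(window, latest, z_threshold=3.0):
    """Flag the latest sample if it sits far outside the recent window."""
    if len(window) < 2:
        return False
    mu, sigma = mean(window), stdev(window)
    if sigma == 0:
        return False
    return abs(latest - mu) / sigma > z_threshold

def route(metric_name, window, latest, runbooks):
    """Auto-resolve via a known runbook when one exists; otherwise escalate."""
    if not detect_anomaly(window, latest):
        return "healthy"
    if metric_name in runbooks:
        runbooks[metric_name]()          # automated triage/remediation
        return "auto-resolved"
    return "escalated-to-human"          # no automated fix readily available

# Example: a CPU spike with no matching runbook gets escalated.
status = route("cpu_util", window=[42, 40, 44, 41, 43], latest=97, runbooks={})
print(status)  # escalated-to-human
```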
Bearing in mind that IT resources are limited, a significant challenge in observability is determining the appropriate monitoring level for observing the IT estate and collecting data. A very fine monitoring level can produce volumes of data that become unmanageable and risk burying genuine anomalies in the noise, while a very coarse monitoring level collects too little data, leading to incomplete diagnosis and insufficient insight.
The emergence of ‘adaptive observability’ provides an optimal middle path, making it possible, based on intelligent deep data analytics, to increase or decrease monitoring levels in response to the health of specific IT operations. Adaptive observability integrates two very new and active areas of research, adaptive monitoring and adaptive probing, to continuously assess ITOps and intelligently route and re-route monitoring levels, data-gathering depth, and frequency to the areas where issues exist. When an issue has been identified through AIOps and observability, the amount of data collected around that issue increases: data is gathered at much finer granularity and at shorter intervals, so there are enough data attributes to resolve the issue quickly and efficiently. Once the issue is resolved and things are healthy again, it is no longer necessary to collect as much data at such high frequency, and resources can be deployed elsewhere in ITOps. Analyzing the data in AIOps provides an optimized path forward, including elevating to manual human resolution if required. Adaptive observability streamlines decision-making and problem-solving, and optimizes IT resources.
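As a rough illustration of that feedback loop, the sketch below adjusts collection granularity and polling cadence from a health score; health_score and collect are hypothetical stand-ins for whatever monitoring stack is actually in place, and the thresholds and intervals are illustrative.

```python
# Illustrative sketch only: adjusting polling cadence and collection depth
# based on a health score. health_score() and collect() are hypothetical
# stand-ins, not part of any real monitoring API.
import random

def health_score(component: str) -> float:
    """Hypothetical health signal in [0, 1]; 1.0 means fully healthy."""
    return random.uniform(0.7, 1.0)

def collect(component: str, granularity: str) -> None:
    print(f"collecting {granularity} telemetry from {component}")

def adaptive_monitor(component: str, cycles: int = 3) -> None:
    for _ in range(cycles):
        score = health_score(component)
        if score < 0.9:
            # Suspected issue: gather fine-grained data at short intervals.
            collect(component, granularity="fine (per-request traces, debug logs)")
            next_poll = 5    # seconds between polls while unhealthy
        else:
            # Healthy: coarse aggregates at a relaxed cadence free up resources.
            collect(component, granularity="coarse (aggregate metrics)")
            next_poll = 60   # back off once things are healthy again
        print(f"next poll in {next_poll}s (health score {score:.2f})")

adaptive_monitor("checkout-service")
```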
Explainable AI
Another evolving area within AIOps relates to deep learning, the sub-field of machine learning which focuses particularly on the class of algorithms inspired by the structure and function of the human brain. These algorithms use Artificial Neural Networks (ANNs) to learn from large amounts of data and represent the world as a layered hierarchy of concepts.
The biggest strength of deep learning is its ability to learn complex patterns from huge volumes of data. Multiple layers of processing elements, a better ability to exploit large amounts of compute, and improved training procedures collectively make this possible. One challenge with deep learning, however, is that it essentially presents a closed, ‘black box’ solution: it is often not apparent how a deep learning algorithm arrived at a decision, and organizations are asked to simply trust that the intelligence is correct.
As companies look to embed greater levels of deep learning into their data management systems, ‘explainable AI’ has grown in importance, providing a transparent ‘white box’ alternative that is driving greater AIOps adoption.
So, why is explainable AI important?
The open nature of explainable AI enables users to better understand how an AI model chooses between options and to identify potential sources of error. Explainable AI earns the trust of subject matter experts (SMEs) and can truly augment their intelligence and expertise.
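As a simple illustration of surfacing that reasoning, the sketch below uses permutation importance from scikit-learn to show which inputs drive a classifier's decisions; the incident-prediction scenario, feature names, and synthetic data are assumptions made for the example, not taken from a particular AIOps deployment.

```python
# Illustrative sketch only: surfacing which input features drive a model's
# incident predictions. Uses scikit-learn's permutation importance; the
# synthetic dataset and feature names are made up for this example.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
features = ["cpu_util", "error_rate", "queue_depth", "gc_pause_ms"]
X = rng.random((500, len(features)))
y = (X[:, 1] + 0.5 * X[:, 2] > 0.9).astype(int)   # incidents driven mostly by error_rate

model = RandomForestClassifier(random_state=0).fit(X, y)

# Permutation importance: how much does shuffling each feature hurt accuracy?
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, score in sorted(zip(features, result.importances_mean),
                          key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {score:.3f}")
```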
Understanding the inner workings of the algorithms is important, especially when working on critical applications in, for example, healthcare, where there needs to be visibility of decision-making at every juncture because oftentimes patient outcomes are at stake. Take, for example, oncology and cancer diagnoses. A false positive diagnosis could see patients having expensive and unnecessary treatment that often has significant side effects, whereas a false negative could mean patients don’t receive the treatment they need.
Explainable AI makes the reasoning behind algorithmic outputs clear and understandable to users, reducing risk and ensuring that decisions can be checked and verified.