
Artificial intelligence for IT operations (AIOps) is a growing trend. A Gartner survey found that AIOps adoption will see an annual growth rate of 15 percent through 2025. However, software developers must overcome several challenges to make the most of an operational and cultural shift to AIOps.
Sean McDermott, CEO and founder of Windward Consulting, provides insights into the forms that AIOps implementations are taking, potential pitfalls, and drivers moving this strategy forward.
What is the status of AIOps adoption today?
McDermott: AIOps is definitely gaining traction, but it is important to understand what AIOps is and how broad the definition can be. We view AIOps as a “strategy,” with the infusion of machine learning into all aspects of the IT operations experience. However, there are many AIOps “platforms” on the market, and the term AIOps increasingly appears in marketing material from many vendors. All this said, adoption is accelerating rapidly for particular use cases, such as infrastructure and application monitoring, event management, correlation and root-cause analysis, and incident management. These are all massive data processing use cases, so the application of AIOps and machine learning makes sense.
Is there a “typical” AIOps implementation or certain elements that are always present?
McDermott: At this point, the “typical” implementation is using AIOps “on top” of existing infrastructure and application monitoring tools to ingest large amounts of data and process events using machine learning for root cause analysis and correlation. Most of the applications for AIOps are for understanding relationships between events and predicting infrastructure failure points or degradation of service from applications. The key element is access to data sources and the methods for accessing data, whether through APIs or third-party monitoring tools. Data structure is also a key element needed for a successful AIOps deployment.
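To make the idea of event correlation concrete, here is a minimal Python sketch. It is not how any particular AIOps platform works: it swaps the machine learning step for a plain time-window heuristic, and the `Event` fields, the `correlate_by_time` function, the host names, and the 60-second window are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """A hypothetical monitoring event; field names are illustrative."""
    timestamp: float  # seconds since some reference point
    source: str       # component that raised the event
    message: str


def correlate_by_time(events, window_seconds=60.0):
    """Group events that arrive close together in time.

    An event joins the current group when it occurs within
    window_seconds of the previous event; otherwise it starts
    a new group. This is a simple stand-in for ML-based correlation.
    """
    groups = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if groups and event.timestamp - groups[-1][-1].timestamp <= window_seconds:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


# Made-up sample events from three related components and one unrelated one.
events = [
    Event(0.0, "db-01", "disk latency high"),
    Event(20.0, "app-01", "query timeout"),
    Event(45.0, "web-01", "HTTP 500 rate rising"),
    Event(600.0, "mail-01", "queue backlog"),
]

groups = correlate_by_time(events)
for group in groups:
    print([e.source for e in group])
```

In this sketch the database, application, and web events land in one group, which is the kind of relationship an operator would then examine for a shared root cause. A real deployment would likely also weigh topology and historical patterns rather than timing alone.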
What are some of the benefits driving AIOps adoption?
McDermott: The biggest ROI now is around service availability and critical service uptime. IT operations organizations are being overwhelmed with data from very complex IT environments and need assistance in managing uptime and availability. Either IT leaders will have to invest in more staff, or they will have to start deploying more innovative technologies like AIOps (e.g., machine learning) to process inbound events. AIOps is also very important for analyzing massive amounts of data, predicting which services may be degrading in performance, and providing guidance to operators on corrective actions before service is interrupted. Some organizations are using AIOps as an automation platform for corrective actions, but that is limited for now. However, automated corrective action will increase as capabilities increase and organizations trust the AIOps platforms more.
What role does data quality play in a smooth implementation, or in creating barriers to AIOps overall?
McDermott: Data quality plays a critical role in a successful AIOps implementation, as does creating common data structures such that AIOps models can consume the data. Without data, AIOps does not provide much value. So, finding the right data for the use cases at hand and integrating it into other data sources for context (such as CMDB and service architectures) are critical for AIOps to deliver on the promised value.
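As a rough illustration of a common data structure with CMDB context, the Python sketch below normalizes events from two monitoring tools into one schema and then attaches service information. Everything named here is hypothetical: the tool names `tool_a` and `tool_b`, their payload fields, the severity mapping, and the in-memory `cmdb` dictionary are assumptions, not any vendor's actual format.

```python
# Hypothetical mapping from one tool's numeric levels to shared severity names.
SEVERITY_B = {1: "critical", 2: "warning", 3: "info"}


def normalize(raw, tool):
    """Convert a tool-specific payload into one common event schema."""
    if tool == "tool_a":
        return {
            "host": raw["hostname"].lower(),
            "severity": raw["sev"].lower(),
            "message": raw["msg"],
        }
    if tool == "tool_b":
        return {
            "host": raw["node"].lower(),
            "severity": SEVERITY_B[raw["level"]],
            "message": raw["text"],
        }
    raise ValueError(f"unknown tool: {tool}")


def enrich(event, cmdb):
    """Attach service and owner context from a CMDB-style lookup table."""
    ci = cmdb.get(event["host"], {})
    return {
        **event,
        "service": ci.get("service", "unknown"),
        "owner": ci.get("owner", "unknown"),
    }


# Made-up CMDB contents and raw payloads for illustration only.
cmdb = {"db-01": {"service": "billing", "owner": "database-team"}}

event_a = enrich(normalize({"hostname": "DB-01", "sev": "CRITICAL", "msg": "disk full"}, "tool_a"), cmdb)
event_b = enrich(normalize({"node": "web-07", "level": 2, "text": "slow response"}, "tool_b"), cmdb)

print(event_a)
print(event_b)
```

Once both tools emit the same fields, a downstream model can treat their events uniformly. The host with no CMDB record falls back to "unknown", which also shows how gaps in the context data surface immediately.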
What is the best way to integrate AIOps with existing processes?
McDermott: This question comes up repeatedly with our clients, and the best answer is “it depends.” The primary issue is that many current processes may not be optimized and could be highly inefficient. Certainly, it would not be good practice to apply AIOps and associated machine learning to a less than optimal process. The best approach is to start with the use case, look at the current processes used to support it, and evaluate where certain steps can be improved or even removed by use of AIOps. For example, if part of the process is to get everyone involved in a Priority One outage (which is common), an AIOps model may provide enough data to focus on the two areas of the infrastructure most likely to be the culprit and bring those groups together to collaborate, saving the time of other groups that might not play a role in that particular outage.
Although there are tech hurdles to overcome, do businesses also have to consider their teams’ culture and encourage buy-in to make AIOps a success?
McDermott: Ultimately, organizational structure, culture, and change management are going to be key factors in a long-term AIOps strategy. Now, with the use cases limited, this is not as much of an issue. However, there are perceptions now of the impact of AI and machine learning from the general societal perspective that can creep into an IT organization. Fear of AIOps can create passive pushback from the organization. AIOps will certainly change the dynamic of the organization by automating manually intensive and repeatable tasks but will create many opportunities for data scientists, engineers, and developers. Leaders need to create and continually articulate the long-term benefits of AIOps to the organization.