As big data spreads into every aspect of business today, IT teams face a daunting task in handling the sheer volume and complexity of the outputs of IT operations. In response, enterprise demand for AIOps is growing. AIOps uses big data and machine learning to predict, identify, diagnose and resolve IT events at a scale and speed that humans simply cannot replicate. A recent report from private equity and venture capital firm Insight Partners estimates that the AIOps platform market will grow at a CAGR of 32.2% from 2021 to 2028, from approximately $2.83 billion in 2021 to $19.93 billion in 2028. That said, effective AIOps solutions don’t materialize overnight. A fully baked AIOps solution is the result of a recipe perfected over time through robust experimentation with three essential ingredients: data, analytics, and diverse domain expertise.
Successful AIOps simply cannot exist without data. This ingredient is critical, and while it is available in abundance, the challenge is collecting it in a usable, validated form. AIOps relies on hundreds – or even thousands – of data points from a variety of sources (e.g., network performance, business systems, and customer support), all generated every second and in many cases sub-second. How that massive amount of data is handled can make or break an AIOps solution. For speed, cost-effectiveness and maximum efficiency, a split pipeline of on- and off-premises data management delivers the best results.
A traditional single on-premises data processing model is no longer suitable for the complexity and volume of today’s data sets. Instead, consider building or redesigning the data processing funnel in two parts: a lean pipeline of high-speed processing that runs through a real-time, on-premises data bus to handle time-critical analytics, and a more robust channel that transfers the remaining data to the cloud. By minimizing on-premises data processing and allocating the cloud – armed with elastic computing and more advanced storage capabilities – to process the rest of the data, faster and more cost-effective data synthesis is possible.
A split pipeline model that simultaneously manages data on and off the premises can increase an organization’s ability to process millions of data points per hour. ML algorithms can help prioritize incoming data from any pipeline and turn the raw, unstructured data into actionable metrics essential to customer service reps or IT teams. The efficiency and speed of a two-tier system also enables organizations to leverage enhanced monitoring capabilities for real-time visibility and long-term trend information on network performance.
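The routing logic behind such a split pipeline can be sketched in a few lines. This is a minimal illustration, not a description of any specific product: the `Metric` fields, the `time_critical` flag, and the batch threshold are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Metric:
    name: str
    value: float
    time_critical: bool  # e.g., a latency alarm vs. a long-term capacity stat

@dataclass
class SplitPipeline:
    on_prem_bus: Callable[[Metric], None]  # low-latency, on-premises consumer
    batch_size: int = 1000
    cloud_batch: List[Metric] = field(default_factory=list)

    def ingest(self, m: Metric) -> None:
        if m.time_critical:
            # time-critical data goes straight to the real-time, on-premises bus
            self.on_prem_bus(m)
        else:
            # everything else is buffered for bulk transfer to the cloud
            self.cloud_batch.append(m)
            if len(self.cloud_batch) >= self.batch_size:
                self.flush()

    def flush(self) -> None:
        # stand-in for a bulk upload to elastic cloud storage
        self.cloud_batch.clear()
```

The design choice this illustrates is that the expensive, latency-sensitive path stays lean and local, while the bulk of the data rides a cheaper, batched channel to elastic cloud compute and storage.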
The second essential ingredient to the success of AIOps is analytics. Analytics can be integrated into the AIOps mix in two phases: exploratory analysis – searching raw data for trends or anomalies that require additional research – and advanced statistical analysis, which turns those findings into actionable insights. As data flows through the pipelines, engineering teams often eagerly pursue advanced statistical analysis despite the integral role of exploratory research. Bypassing that first stage can lead to overfitting – injecting bias into the AIOps process and flagging spurious issues – rendering AI/ML algorithms useless and producing unintended operational consequences.
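As a toy illustration of the exploratory phase, a first-pass scan might simply flag values that deviate sharply from the rest of a series before any model is trained. The z-score cutoff below is an arbitrary illustrative choice, not a rule from the article.

```python
import statistics

def exploratory_scan(series, z_cutoff=3.0):
    """Flag indices whose values deviate strongly from the mean.

    A first look at raw data, taken before any model training;
    flagged points are candidates for human investigation.
    """
    mean = statistics.fmean(series)
    stdev = statistics.stdev(series)
    return [i for i, v in enumerate(series)
            if abs(v - mean) / stdev > z_cutoff]
```

A scan like this does not decide anything by itself; it tells data scientists where to look before an algorithm's parameters are locked in.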
Exploratory analysis relies on both ML and data scientists to identify and determine the specific metrics essential for customer service reps and technicians. IT teams may prefer ML in this process – it’s exciting technology that seems efficient. But ML alone is not always the most effective method of analysis. ML tries to solve a particular problem based on a set of specific parameters. Engineers program ML algorithms based on the metrics they think they need to arrive at conclusions A, B, or C, disregarding other possible solutions or statistics.
Conversely, statisticians and data scientists examine raw data without a specific result in mind, instead scrutinizing the numbers for patterns or anomalies. Manual data assessment, while tedious, allows experts to identify simple IT solutions that don’t require advanced statistical analysis. For example, in response to complaints about wireless network performance, an analytics team combed through interactive data visualizations on a dashboard and discovered that the problem locations were all on the same wireless carrier. From there, they deduced that those sites all had the same wireless modem hardware model. Finally, they found that the problem occurred only when a specific wireless band was in use. The root cause was a known issue with the wireless carrier, and it was resolved by replacing the modem with a different model.
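That drill-down can be approximated with a simple frequency count over complaint records. The data and field names below are hypothetical stand-ins for the team's dashboard; the point is the attribute-by-attribute narrowing, not any particular tooling.

```python
from collections import Counter

# Hypothetical complaint records; sites, carriers, and models are invented.
complaints = [
    {"site": "A", "carrier": "CarrierX", "modem": "M-100", "band": "B2"},
    {"site": "B", "carrier": "CarrierX", "modem": "M-100", "band": "B2"},
    {"site": "C", "carrier": "CarrierX", "modem": "M-100", "band": "B2"},
]

def dominant(field):
    """Most common value for a field, and its share of all complaints."""
    value, count = Counter(r[field] for r in complaints).most_common(1)[0]
    return value, count / len(complaints)

# Drill down one attribute at a time, as the analysts did:
# carrier first, then modem model, then wireless band.
for attr in ("carrier", "modem", "band"):
    value, share = dominant(attr)
    if share > 0.9:  # nearly all problem sites share this attribute
        print(f"{attr}: {value} appears in {share:.0%} of complaints")
```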
When teams are confident that the trends or anomalies identified in the exploratory phase are correct, they can move on to advanced statistical analysis and training AI/ML algorithms. Even AI/ML requires trial-and-error testing and will not yield immediate results. Behind every AIOps solution is a team of domain experts who extensively and constantly adapt and test AI/ML models to ensure the success of AIOps.
Diverse domain expertise
The third ingredient for a successful AIOps implementation is domain expertise. In the case of AIOps creation, there cannot be too many proverbial chefs in the kitchen. Successful implementation of AI in any enterprise requires the involvement of a diverse set of domain experts. For example, in the field of network operations, network engineers understand the nuances of ML systems and the AI algorithms needed to accurately solve a given problem. Meanwhile, non-technical experts bring sector-specific knowledge such as dataset sources and usability, business strategy and operations. A deep bench of domain experts ensures the AI/ML algorithms mirror real-world operations, provides critical validation of the results, and serves as a key check against misapplications or unintended consequences. For example, a communications system undergoing scheduled maintenance may exhibit behavior (such as extremely low network traffic) that typically indicates a problem state. Adding a business logic layer that checks a maintenance ticketing system against the model predictions eliminates these false alarms.
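A minimal sketch of that business logic layer, assuming a ticketing system that exposes per-site maintenance windows (the record shape, function names, and threshold below are all hypothetical):

```python
from datetime import datetime, timezone

def in_maintenance(site, now, tickets):
    """True if the site has a maintenance ticket covering `now`."""
    return any(
        t["site"] == site and t["start"] <= now <= t["end"]
        for t in tickets
    )

def alert(site, anomaly_score, now, tickets, threshold=0.8):
    # The ML model flags an anomaly, but business logic suppresses it
    # when the behavior falls inside a scheduled maintenance window.
    return anomaly_score >= threshold and not in_maintenance(site, now, tickets)
```

The model itself is untouched; the domain knowledge (maintenance windows produce alarm-like behavior) lives in a layer around it, which is exactly the kind of check the article attributes to domain experts.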
Domain experts play an important role in the hypothetical kitchen, but also in the theoretical dining room, where they can interpret for an audience of executives hungry for AIOps solutions. ML tends to operate as a black box, meaning teams cannot reconstruct the recipe by which the model arrived at a specific decision. This can leave business leaders skeptical and hesitant to act on an AI-driven insight. Explainable AI, by contrast, drives stronger buy-in and trust from business leaders unfamiliar with AIOps.
AIOps requires three basic ingredients, but as with any recipe, the quality of those ingredients and in whose hands they are placed will make all the difference in the outcome. As with the best chef creations in the world, trial and error is part of the process, especially in the complex art of training the ML. By ensuring proper data processing, using the right type of analysis, and engaging domain experts, companies can provide a successful, scalable AIOps solution to meet the increased need for operational efficiency.
Frank Kelly, vice president at Hughes Network Systems, LLC (HUGHES), is the chief technology officer of the North American division, responsible for identifying innovation and technology to improve service effectiveness and efficiency for consumer and business services. In this capacity, he oversees the strategic direction and implementation of machine learning and artificial intelligence, in addition to applying agile development and service delivery techniques and integrating DevOps technologies into Hughes services. Mr. Kelly received a master’s degree in information technology from Hood College, Maryland, with a focus on network management. He also holds a Bachelor of Science degree in computer science from the University of Maryland.