AIOps 2.4 - Performance Test Report

 

Overview

 

The ITOps 2.4 release focuses on new features like

 

Performance Testing – Workload Parameters

The following parameters define the workload for performance testing of ITOps using the Queue channel.

 

SYSTEM WORKLOAD PARAMETERS

Parameter | Value
Total sample alerts | 11,100
Test duration | 1 day, 8 hours, 47 minutes
Monitoring agent type | SolarWinds
Types of alerts | UP, DOWN, WARNING, CRITICAL
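As a quick cross-check of these parameters, the total alert count and the test duration give the average alert arrival rate over the run. The sketch below only does that arithmetic; the constant and function names are illustrative.

```python
from datetime import timedelta

# Values copied from the SYSTEM WORKLOAD PARAMETERS table.
TOTAL_ALERTS = 11_100
TEST_DURATION = timedelta(days=1, hours=8, minutes=47)


def average_alerts_per_minute(total_alerts: int, duration: timedelta) -> float:
    """Average alert arrival rate over the whole test run."""
    return total_alerts / (duration.total_seconds() / 60)


duration_minutes = TEST_DURATION.total_seconds() / 60
rate = average_alerts_per_minute(TOTAL_ALERTS, TEST_DURATION)
print(f"Test duration: {duration_minutes:.0f} minutes")
print(f"Average arrival rate: {rate:.2f} alerts/minute")
```

This works out to about 5.64 alerts per minute over 1,967 minutes. That is below the 17 alerts per minute quoted for the mock service, which is consistent with the mock service simulating different alert counts at different times rather than a constant rate for the whole run; the report does not give a per-interval breakdown.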

 

Test Environment - Infrastructure setup

ITOps components, IHub, Clones, databases and the message broker are hosted in a Kubernetes environment in the Azure cloud.

Performance testing was conducted on the following Kubernetes cluster configuration hosted in Azure. The ITOps Kubernetes cluster comprises two types of node pools: an Application Node Pool and a Persistent Node Pool.

MySQL is hosted as an Azure managed service.

The following table summarizes the hardware configuration, the number of VMs and the components hosted in each node pool.

Picture 639

 

 

Picture 641

 

Performance test environment setup

To replicate a production infrastructure environment, IHub Lite and the SolarWinds-based mock service are hosted on an Azure VM in a separate virtual network.

IHub Main, along with the ITOps components, is hosted in the Kubernetes environment in Azure in another virtual network.

 

Picture 823

 

 

Performance Testing - Software Used

The following tools were used as part of performance testing:

Tool | Version | Description
Grafana |  | Dashboard to view resource utilization
Microsoft Excel |  | Analysing test results and reports
ITOps | 2.4 | 

 

 

Performance Testing – ITOps components & Replicas

The following table provides details of the Docker containers of the components associated with ITOps, such as Clones, ITOps components, iHub, databases and the message broker.

It also identifies the components selected for autoscaling and the autoscaling criteria defined for them. To ensure availability of non-scalable components and avoid component failures during processing, two instances of each component are available in the Kubernetes cluster by default.

 

Picture 867

The following are the details of the replicas configured for the databases and message broker in the Kubernetes cluster. MySQL is deployed as a managed service in Azure.

 

Picture 869

 

Autoscaling Configurations

The following are the CPU thresholds and the minimum and maximum replicas configured for each ITOps component identified for autoscaling in the Kubernetes cluster.

Picture 871
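The report gives the autoscaling thresholds only in the table image above and does not say how scaling is implemented. Assuming it is the standard Kubernetes Horizontal Pod Autoscaler (HPA), the replica count follows the documented rule desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric), which is skipped when the ratio is within the default 0.1 tolerance and is clamped to the configured minimum and maximum replicas. The function below is a simplified sketch of that rule; the utilization figures in the usage lines are hypothetical, not values from this test. Note that an HPA CPU utilization target is measured as a percentage of the pods' CPU requests, not their limits.

```python
import math


def desired_replicas(current_replicas: int,
                     current_cpu_utilization: float,
                     target_cpu_utilization: float,
                     min_replicas: int,
                     max_replicas: int,
                     tolerance: float = 0.1) -> int:
    """Simplified sketch of the Kubernetes HPA calculation for a CPU target.

    Utilization values are percentages of the pods' CPU requests.
    """
    ratio = current_cpu_utilization / target_cpu_utilization
    # The HPA leaves the replica count unchanged when the ratio is close to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured minimum and maximum.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical: 2 replicas at 30% CPU against a 70% target stay at the minimum.
print(desired_replicas(2, 30.0, 70.0, min_replicas=2, max_replicas=5))
# Hypothetical: 2 replicas at 140% CPU against a 70% target scale out.
print(desired_replicas(2, 140.0, 70.0, min_replicas=2, max_replicas=5))
```

Under this rule, CPU usage that stays below the target, as reported later for the Clones engine, leaves a component at its configured minimum replica count.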

 

Performance Test Approach

Performance testing of ITOps is conducted by sending system alerts and notifications captured by the SolarWinds monitoring agent. A mock service based on the SolarWinds monitoring agent was created to simulate different types of system alerts, such as UP, DOWN, WARNING and CRITICAL.

The IHub Lite API scanner channel polls this service continuously at a fixed interval and pushes the polled data to the ITOps Queue, from which the alerts are processed further by IHub Main and ITOps. The SolarWinds-based mock service simulates alerts archived from a production system.

The test is conducted by having the mock service simulate different counts of system alerts at specific intervals and at different alert times. This creates a system alert workload similar to that of a production environment.

To capture performance metrics on alert correlation, an ITOps project was created with the AIOPS Resolve flavour, and hybrid correlation was applied to the project.

 

Performance Test - Summary

Performance testing of ITOps was conducted by sending SolarWinds-based alert samples, simulated by a SolarWinds mock service, over a total duration of 1 day, 8 hours and 47 minutes.

The test was conducted with the SolarWinds mock service simulating 17 alerts every minute. The mock service simulates alerts archived from a production system. This also helps verify ITOps processing functionality, because the archived alert data replicates all the alert scenarios and patterns that occurred in a real production system.

The IHub Lite API scanner channel polls the mock service every 3 minutes, and the IHub Lite channel publishes the alerts to the ITOps Queue, where further alert processing begins.
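The polling flow described above can be sketched as a small simulation. Everything here is hypothetical: MockAlertSource stands in for the SolarWinds mock service, a Python queue.Queue stands in for the ITOps Queue, and time is simulated in whole minutes rather than driven by the real IHub Lite scheduler.

```python
import queue
from dataclasses import dataclass


@dataclass
class Alert:
    alert_id: int
    minute: int     # simulated minute in which the mock service produced the alert
    severity: str   # UP, DOWN, WARNING or CRITICAL


class MockAlertSource:
    """Hypothetical stand-in for the SolarWinds mock service."""

    SEVERITIES = ("UP", "DOWN", "WARNING", "CRITICAL")

    def __init__(self, alerts_per_minute: int):
        self.alerts_per_minute = alerts_per_minute
        self._pending = []   # alerts produced since the last poll
        self._next_id = 0

    def tick(self, minute: int) -> None:
        """Produce a fixed number of alerts for one simulated minute."""
        for _ in range(self.alerts_per_minute):
            severity = self.SEVERITIES[self._next_id % len(self.SEVERITIES)]
            self._pending.append(Alert(self._next_id, minute, severity))
            self._next_id += 1

    def poll(self) -> list:
        """Return and clear everything produced since the previous poll."""
        batch, self._pending = self._pending, []
        return batch


def run(minutes: int, alerts_per_minute: int = 17, poll_interval: int = 3):
    """Simulate the scanner polling the source and publishing to a queue."""
    source = MockAlertSource(alerts_per_minute)
    alert_queue = queue.Queue()   # stands in for the ITOps Queue
    batch_sizes = []
    for minute in range(1, minutes + 1):
        source.tick(minute)
        if minute % poll_interval == 0:   # scanner fires every poll_interval minutes
            batch = source.poll()
            for alert in batch:
                alert_queue.put(alert)
            batch_sizes.append(len(batch))
    return alert_queue, batch_sizes


q, sizes = run(minutes=9)
print(sizes)        # alerts published by each poll
print(q.qsize())    # total alerts waiting in the queue
```

With 17 alerts per minute and a 3-minute polling interval, each poll publishes 51 alerts; alerts produced after the last poll stay pending until the next one.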

This test report is based on the metrics collected from the most recent test execution, which used alert data archived from a production system.

The following are the counts of alerts received at IHub Lite, the IHub Queue channel and the alert store.

 

Test execution | Total alerts received at iHub Lite from mock service | Total alerts received at iHub Queue channel | Total alerts received in alert store
Test 1 | 11,100 | 11,100 | 11,100

 

 

Performance Metrics Captured

Alert Correlation Time

To capture the time taken to correlate similar and related alerts, the test is conducted by simulating 17 alerts every minute from the mock service. This alert simulation pattern ensures that each time the SolarWinds mock service is polled, the count of alerts received matches a production system workload.

The following are the alert threshold count set for the correlation workflow and the scheduler intervals for the Correlation, Alert Analytics and Auto Close Flap Cluster workflows configured for the test execution.

The alert threshold count per workflow execution is the number of alerts the correlation workflow picks up in each execution of the workflow.

Test 1:

Alert threshold count per correlation workflow execution | Correlation scheduler interval (minutes) | Alert Analytics scheduler interval (minutes) | Auto Close Flap Cluster scheduler interval (minutes)
40 | 3 | - | -
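Assuming each scheduled correlation run picks up at most the threshold count of alerts, runs once per scheduler interval and finishes before the next run, the Test 1 configuration caps correlation throughput at 40 alerts every 3 minutes. The sketch below computes that ceiling and simulates the resulting backlog for a given arrival pattern; the first-in, first-out draining model is an assumption, not a description of the ITOps workflow internals.

```python
# Test 1 correlation workflow configuration.
THRESHOLD = 40          # alerts taken per correlation run
INTERVAL_MINUTES = 3    # correlation scheduler interval


def simulate_backlog(arrivals_per_cycle, threshold):
    """Alerts still waiting after each correlation run.

    Assumes one run per scheduler interval that takes at most `threshold`
    alerts from the backlog in arrival order.
    """
    backlog = 0
    history = []
    for arrived in arrivals_per_cycle:
        backlog += arrived
        backlog -= min(backlog, threshold)
        history.append(backlog)
    return history


ceiling = THRESHOLD / INTERVAL_MINUTES
print(f"Throughput ceiling: {ceiling:.2f} alerts/minute")

# Nominal load of 17 alerts/minute means 51 alerts arrive per 3-minute cycle.
print(simulate_backlog([17 * INTERVAL_MINUTES] * 5, THRESHOLD))
```

The ceiling is about 13.33 alerts per minute. At the nominal 17 alerts per minute (51 per 3-minute cycle) the backlog would grow by 11 alerts per cycle for as long as that rate lasts, whereas the run-wide average of 11,100 alerts over 1,967 minutes (about 5.64 per minute) is well below the ceiling.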

 

The following are the alert correlation times measured during the test execution with the alert threshold and workflow scheduler configuration above.

 

Test 1

The time taken to correlate each alert was 8.21 seconds at the 90th percentile, with a median of 5.05 seconds.

Correlation time per alert (seconds)

Statistic | Time (seconds)
90th percentile | 8.21
Median | 5.05
Average | 5.05
Maximum | 491.5*

*Values this high were observed for only a few alerts; the second-highest value was 485.3 seconds.
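The report does not describe how these statistics were derived from the raw data. A minimal sketch, assuming per-alert received and correlated timestamps are available: it computes the 90th percentile with statistics.quantiles using the inclusive (linear interpolation) method, which corresponds to Excel's PERCENTILE.INC; other percentile methods give slightly different values. The sample durations are hypothetical, not test data.

```python
import statistics
from datetime import datetime, timedelta


def correlation_time_stats(records):
    """Summarise per-alert correlation time in seconds.

    `records` holds (received_at, correlated_at) datetime pairs; how these
    timestamps would be exported from ITOps is an assumption.
    """
    durations = [(done - start).total_seconds() for start, done in records]
    # n=10 returns the nine decile cut points; index 8 is the 90th percentile.
    p90 = statistics.quantiles(durations, n=10, method="inclusive")[8]
    return {
        "p90": p90,
        "median": statistics.median(durations),
        "average": statistics.fmean(durations),
        "maximum": max(durations),
    }


# Hypothetical sample, not data from this test.
start = datetime(2000, 1, 1)
sample_seconds = [4, 5, 5, 6, 30]
records = [(start, start + timedelta(seconds=s)) for s in sample_seconds]
print(correlation_time_stats(records))
```

In this hypothetical sample, one slow alert raises the average to 10 seconds and the 90th percentile to 20.4 seconds, while the median stays at 5 seconds.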

The following graph shows the time taken to correlate each alert in Test 1.

 

Picture 1575

 

 

Performance Test – Infrastructure Resource Usage

The following are the CPU and memory usage of the various ITOps components during the test execution.

Application components

Clones Engine

During test execution, CPU usage in the Clones engine pods never exceeded the autoscaling threshold, so autoscaling of the Clones engine was not triggered.

Maximum CPU usage was only 0.94 cores and maximum memory usage was 5.64 GiB over the entire test execution.

The pod replica usage pattern of the Clones engine during test execution is as follows:

 

Picture 1577

 

The following graph shows the CPU usage of the Clones engine pods during the test execution.

 

Picture 1972

Other application components

 

The following table shows the maximum CPU and memory usage of the other ITOps application components.

Although autoscaling was enabled for the alert correlation and alert mapping components, the replica count of both components remained at 2 throughout the test execution.

Component | POD CPU limit (cores) | Max CPU usage (cores) | POD memory limit | Max memory usage
Sense-queue | 0.5 | 0.005 | 1.0 GiB | 537.57 MiB
IHub component | 0.5 | 0.083 | 2 GiB | 795.09 MiB
IHub Services | 0.5 | 0.009 | 2 GiB | 644.55 MiB
Correlation | 1 | 0.001 | 1.5 GiB | 76.76 MiB
Alert mapping | 1 | 0.187 | 6 GiB | 5.40 GiB

 

 

The following graphs show the CPU and memory usage of the other application components in ITOps.

Clones Sense Queue

CPU Usage

 

Picture 1974

 

Memory Usage

 

Picture 1976

 

iHub Component

CPU Usage

 

Picture 2061

 

Memory Usage

 

Picture 2063

 

iHub Services

 

CPU Usage

 

Picture 2065

 

 

Memory Usage

 

Picture 2096

 

Alert Correlation

 

CPU Usage

 

Picture 2098

Memory Usage

 

Picture 2149

 

 

Alert Mapping

CPU Usage

 

Picture 2151

 

Memory Usage

 

Picture 2153

The following shows the percentage CPU and memory usage of each application component.

Max usage is the maximum CPU or memory usage reached during the test run. POD resource limit is the maximum limit set on each component's Kubernetes pod.
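The percentage figures can be reproduced from the application components table above by dividing each peak value by the corresponding POD resource limit. A minimal sketch, with the values copied from that table and memory converted to MiB (1 GiB = 1024 MiB):

```python
# Max usage and POD resource limits copied from the application components table.
COMPONENTS = {
    # name: (cpu_limit_cores, cpu_max_cores, mem_limit_mib, mem_max_mib)
    "Sense-queue": (0.5, 0.005, 1.0 * 1024, 537.57),
    "IHub component": (0.5, 0.083, 2 * 1024, 795.09),
    "IHub Services": (0.5, 0.009, 2 * 1024, 644.55),
    "Correlation": (1.0, 0.001, 1.5 * 1024, 76.76),
    "Alert mapping": (1.0, 0.187, 6 * 1024, 5.40 * 1024),
}


def utilization(limit: float, used: float) -> float:
    """Peak usage as a percentage of the POD resource limit."""
    return 100.0 * used / limit


for name, (cpu_lim, cpu_max, mem_lim, mem_max) in COMPONENTS.items():
    print(f"{name:<15} CPU {utilization(cpu_lim, cpu_max):5.1f}%  "
          f"Memory {utilization(mem_lim, mem_max):5.1f}%")
```

Computed this way, Alert mapping peaks at about 90% of its memory limit, the highest of these components, while every CPU peak stays below 20% of its limit.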

Databases & Message Broker

The table below summarizes the maximum CPU and memory usage of the database and message broker components used by ITOps.

Component | POD CPU limit (cores) | Max CPU usage (cores) | POD memory limit | Max memory usage
MongoDB | 0.5 | 0.373 | 7.45 GiB | 5.81 GiB
Elasticsearch | 1 | 0.902 | 6 GiB | 10.74 GiB
RabbitMQ | 1 | 0.29 | 2 GiB | 355.67 MiB

 

The following shows the percentage CPU and memory usage of each database and message broker component of ITOps.
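The same calculation can be applied to the database and message broker table above. A minimal sketch, with the values copied from that table and memory expressed in GiB (the RabbitMQ figure converted from MiB); the flag for values above 100% is an illustrative addition:

```python
# Max usage and POD resource limits copied from the databases & message broker table.
DATA_STORES = {
    # name: (cpu_limit_cores, cpu_max_cores, mem_limit_gib, mem_max_gib)
    "MongoDB": (0.5, 0.373, 7.45, 5.81),
    "Elasticsearch": (1.0, 0.902, 6.0, 10.74),
    "RabbitMQ": (1.0, 0.29, 2.0, 355.67 / 1024),  # 355.67 MiB converted to GiB
}


def utilization(limit: float, used: float) -> float:
    """Peak usage as a percentage of the POD resource limit."""
    return 100.0 * used / limit


for name, (cpu_lim, cpu_max, mem_lim, mem_max) in DATA_STORES.items():
    cpu_pct = utilization(cpu_lim, cpu_max)
    mem_pct = utilization(mem_lim, mem_max)
    flag = "  (above limit)" if max(cpu_pct, mem_pct) > 100 else ""
    print(f"{name:<14} CPU {cpu_pct:5.1f}%  Memory {mem_pct:5.1f}%{flag}")
```

Elasticsearch memory comes out at about 179% of its POD limit. Because Kubernetes enforces memory limits by OOM-killing a container that tries to exceed them, the peak usage and the limit were probably not measured on the same basis (for example, usage summed across Elasticsearch replicas compared against a per-POD limit); the report does not say which.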

POD Autoscaling metrics

The following is the pod replica usage of the ITOps components identified for autoscaling. During test execution, none of the components with autoscaling enabled changed their pod replica count from its minimum value.

The table below summarizes the average and maximum pod replica usage of the ITOps components identified for autoscaling.

Picture 3308
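The report does not state how the average replica count in the table was computed. One reasonable approach, sketched below under that assumption, is a time-weighted average over replica-count samples such as those exported from a Grafana panel; the sample format and values are hypothetical.

```python
def replica_summary(samples):
    """Time-weighted average and maximum replica count.

    `samples` is a time-ordered list of (timestamp_seconds, replica_count)
    pairs; each count is assumed to hold until the next sample.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    weighted = 0.0
    for (t0, count), (t1, _) in zip(samples, samples[1:]):
        weighted += count * (t1 - t0)
    total_time = samples[-1][0] - samples[0][0]
    return weighted / total_time, max(count for _, count in samples)


# Hypothetical samples (timestamps in seconds).
steady = [(0, 2), (1800, 2), (3600, 2)]
burst = [(0, 2), (600, 2), (1200, 3), (1800, 2)]
print(replica_summary(steady))   # (2.0, 2)
print(replica_summary(burst))
```

For a component that never scales, as observed in this test, the average and the maximum both equal the configured minimum.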

 

Performance test Observations & Recommendations

Performance test metrics comparison

Time taken for Alert correlation

 

 

ITOps version | Correlation workflow alert threshold count | Correlation workflow scheduler interval (minutes) | Correlation time per alert, 90th percentile (seconds) | Correlation time per alert, median (seconds)
2.4 | 40 | 3 | 8.21 | 5.05
2.2.2 | 20 | 3 | 5.87 | 4.99

 

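In ITOps 2.4, the median correlation time per alert is nearly unchanged from 2.2.2 (5.05 seconds vs 4.99 seconds), while the 90th percentile increased from 5.87 seconds to 8.21 seconds. The two versions were tested with different correlation workflow alert threshold counts (40 in 2.4, 20 in 2.2.2).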
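The relative change between the two versions can be computed directly from the comparison table. A minimal sketch:

```python
# Correlation time per alert (seconds), copied from the comparison table.
RESULTS = {
    "2.2.2": {"p90": 5.87, "median": 4.99},
    "2.4": {"p90": 8.21, "median": 5.05},
}


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


for metric in ("p90", "median"):
    old, new = RESULTS["2.2.2"][metric], RESULTS["2.4"][metric]
    print(f"{metric:<6} {old:.2f} s -> {new:.2f} s ({percent_change(old, new):+.1f}%)")
```

The median rises by about 1.2%, while the 90th percentile rises by about 39.9%.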

Performance test Execution results

 

Raw data collected during the test execution is available at the following SharePoint location: Test Data