Anomaly detection is a crucial task in monitoring the performance of various systems. In this blog post, we will discuss how to use Kusto Query Language (KQL) to detect anomalies in CPU performance data.
Spikes
One of the most common types of anomalies is spikes in the data. Spikes occur when the data deviates significantly from its normal behavior. To detect spikes in CPU usage over time, we can use the following KQL query:
let window = 24h;
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where TimeGenerated > ago(window)
| summarize avg(CounterValue), stdev(CounterValue), max(CounterValue) by bin(TimeGenerated, 2h), Computer
| where (max_CounterValue - avg_CounterValue) > 3 * stdev_CounterValue
This query first keeps only CPU usage data from the last 24 hours. It then groups the data into 2-hour bins per computer and calculates the average, standard deviation, and maximum of each bin. Finally, it keeps only the bins whose highest reading is more than 3 standard deviations above that bin's average, which are the bins that contain a spike.
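To see a three-standard-deviation check behave on known input, here is a minimal sketch that runs without the Perf table. The readings are made up (29 samples between 10 and 12, then one reading of 95), and the column names avg_cpu, stdev_cpu, max_cpu, threshold, and has_spike are chosen only for this illustration.

```kusto
// Hypothetical readings: 29 samples between 10% and 12%, then one spike to 95%.
range i from 1 to 30 step 1
| extend CounterValue = iff(i == 30, 95.0, todouble(10 + i % 3))
// Same statistics a per-bin summarize would produce, computed over the whole sample.
| summarize avg_cpu = avg(CounterValue), stdev_cpu = stdev(CounterValue), max_cpu = max(CounterValue)
// A spike is present if the highest reading lies above avg + 3 * stdev.
| extend threshold = avg_cpu + 3 * stdev_cpu
| extend has_spike = max_cpu > threshold
```

With this input, has_spike should come out true. Keep in mind that stdev() is a sample standard deviation, so a group needs at least 11 samples before any single value can sit more than 3 standard deviations from the group's mean. If counters are collected infrequently, a wider bin or a lower multiplier may be needed.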
Outliers
Another type of anomaly is outliers. Outliers are data points that are significantly different from the rest of the data. To detect outliers in CPU usage across different machines, we can use the following KQL query:
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| summarize percentile(CounterValue, 75) by Computer
| where percentile_CounterValue_75 > 50
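This query keeps only CPU usage data, calculates the 75th percentile of the readings for each computer, and then shows only the computers whose 75th percentile is higher than 50. In other words, it lists the machines that spend roughly a quarter of their samples above 50% CPU. Because the query has no TimeGenerated filter, it covers whatever time range it is run against.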
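The cutoff of 50 is a fixed value that has to be chosen by hand. As a hedged alternative, the sketch below derives the cutoff from the machines themselves using the common 1.5 * IQR fence. This is a swapped-in convention rather than part of the query above, and the names cpu_p75, q1, q3, and upper_fence are introduced only for this illustration.

```kusto
// Assumption: the same Perf schema as the queries in this post.
// Per-computer 75th percentile of CPU usage.
let cpu_p75 =
    Perf
    | where ObjectName == "Processor" and CounterName == "% Processor Time"
    | summarize p75 = percentile(CounterValue, 75) by Computer;
// Fence = Q3 + 1.5 * IQR, computed across all computers' p75 values.
let upper_fence = toscalar(
    cpu_p75
    | summarize q1 = percentile(p75, 25), q3 = percentile(p75, 75)
    | project fence = q3 + 1.5 * (q3 - q1));
// Keep only the computers whose p75 lies above the fence.
cpu_p75
| where p75 > upper_fence
```

The trade-off is that this version flags machines that are unusual relative to their peers, so a fleet in which every machine is busy may return no rows at all, while the fixed cutoff would return all of them.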
Changes over time
Finally, another type of anomaly is changes in the data over time. To detect changes in CPU usage over time, we can use the following KQL query:
let window = 7d;
Perf
| where ObjectName == "Processor" and CounterName == "% Processor Time"
| where TimeGenerated > ago(window)
| summarize avg_cpu = avg(CounterValue) by Computer, Day = startofday(TimeGenerated)
| sort by Computer asc, Day asc
| extend prev_computer = prev(Computer), prev_day = prev(Day), prev_avg = prev(avg_cpu)
| where Computer == prev_computer and Day - prev_day == 1d
| extend diff = avg_cpu - prev_avg
| where abs(diff) > 10
| project Computer, Day, avg_cpu, prev_avg, diff
This query keeps only CPU usage data from the last 7 days and calculates the average per day and computer. It then sorts the rows by computer and day so that prev() can read the previous day's average for the same computer, and computes the difference between consecutive days' averages. Finally, it keeps only the days on which the average changed by more than 10 percentage points in either direction.
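One way to compare each day with the day before it is the prev() function applied to sorted rows. The sketch below tries that on a handful of made-up daily averages, so it runs without the Perf table. The computer names and values are hypothetical, as are the column names avg_cpu, prev_computer, prev_avg, and diff.

```kusto
// Hypothetical daily averages for two made-up computers.
datatable(Computer: string, Day: datetime, avg_cpu: real) [
    "web-01", datetime(2024-01-01), 20.0,
    "web-01", datetime(2024-01-02), 22.0,
    "web-01", datetime(2024-01-03), 41.0,
    "db-01",  datetime(2024-01-01), 60.0,
    "db-01",  datetime(2024-01-02), 45.0
]
// Sorting serializes the rows, which prev() requires.
| sort by Computer asc, Day asc
| extend prev_computer = prev(Computer), prev_avg = prev(avg_cpu)
// Drop each computer's first day, where the previous row belongs to another machine or is missing.
| where Computer == prev_computer
| extend diff = avg_cpu - prev_avg
| where abs(diff) > 10
| project Computer, Day, diff
```

Two rows should remain: db-01 on 2024-01-02 with a diff of -15, and web-01 on 2024-01-03 with a diff of 19. The small change on web-01 from 20 to 22 is dropped.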
Summary
In this blog post, we have discussed how to use KQL to detect three types of anomalies in CPU performance data: spikes, outliers, and changes over time. The thresholds used here (3 standard deviations, a 75th percentile above 50, and a day-over-day change of more than 10) are starting points that can be adjusted to fit the specific needs of your system. With that tuning, these queries can be a valuable tool in monitoring and maintaining the performance of your systems.