Monitoring the performance of network devices is key to understanding the network's needs and using the right resources to keep network performance at optimum levels. By setting thresholds for critical performance metrics, network admins can closely monitor the performance statistics of individual devices and of the network as a whole, and determine how network resources should be allocated to ensure peak performance.
However, as with all networking challenges, there is a downside to this as well. Although manually setting thresholds gives the admin complete control over every device's performance metrics, configuring them can be a real task. The admin has to know every device's performance trends as well as its current statistics, which makes it extremely difficult to configure thresholds for individual devices manually. And in an enterprise network with thousands of devices, the situation only gets worse.
Why is configuring thresholds manually counter-productive?
OpManager's Adaptive Thresholding technique harnesses the power of Machine Learning to enable network admins to perform this critical task more easily than ever before. Using advanced predictive algorithms and percentage-based calculations, OpManager's real-time adaptive thresholds adapt to the constantly changing performance metrics of network devices and forecast reliable values for your metrics, which are then used to set thresholds for the configured performance monitors.
During threshold configuration in OpManager, network admins usually determine the nominal value for a particular monitor of a device based on previous trends and usage patterns. The thresholds are then configured with that baseline value for three different levels of network monitoring alerts, namely Attention, Trouble, and Critical. This is done either at the device level or for multiple devices in bulk.
With the Adaptive Threshold method, the need to study previous performance statistics is removed from the equation. OpManager's advanced predictive algorithms take over this tedious task: they read patterns in performance statistics over several time intervals, account for multiple network usage patterns, and calculate a highly usable "Forecast" value for that monitor. These Machine Learning based adaptive thresholds require at least 14 days of performance data before they start providing forecast values. Once the data models have been established and forecast values are being provided, OpManager uses these values as base threshold values to control the frequency and criteria of the alerts being raised.
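To make the idea of a forecast baseline concrete, here is a minimal sketch in Python. It is not OpManager's algorithm, which is not described here; it assumes a deliberately naive model in which the forecast for a given hour is the mean of that hour's readings over the collected history. The function name `hourly_forecast`, the data layout, and the constant `MIN_DAYS` are hypothetical; only the 14-day minimum comes from the text above.

```python
from statistics import mean
from typing import List, Optional

MIN_DAYS = 14  # minimum history mentioned in the text above


def hourly_forecast(daily_samples: List[List[float]], hour: int) -> Optional[float]:
    """Naive stand-in for a forecast: mean of one hour's readings across days.

    `daily_samples` holds one list of 24 hourly readings per day, oldest first.
    Returns None until at least MIN_DAYS days of data are available.
    """
    if len(daily_samples) < MIN_DAYS:
        return None
    return mean(day[hour] for day in daily_samples)


# Synthetic history: CPU sits at 50 off-hours and at 70 during 09:00-17:00.
history = [[70.0 if 9 <= h < 17 else 50.0 for h in range(24)] for _ in range(14)]

busy_hour = hourly_forecast(history, 10)        # forecast for a busy hour
quiet_hour = hourly_forecast(history, 2)        # forecast for a quiet hour
too_early = hourly_forecast(history[:13], 10)   # only 13 days of data
```

The point of the sketch is only that the baseline differs by time of day and is withheld until enough history exists; a real model would also weigh trends and multiple usage patterns.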
Once Adaptive Thresholds have been enabled, the user only needs to provide the deviation value for each alert criticality. When the value of a monitor exceeds its forecast value by more than the deviation configured for a particular criticality, an alert of the corresponding level is raised for that monitor.
For example, let's consider that the forecast value of the CPU utilization monitor is 70 and the Attention/Trouble/Critical deviation values are set to 10, 15, and 20 respectively. The values at which the alerts are raised in this case are mentioned below:
Similarly, if the forecast for the same monitor is currently 70 and the Attention/Trouble/Critical deviation percentages are configured as 10%, 15%, and 20% respectively, the levels at which alerts are raised are:
Note: For more information on how these values are calculated, both in terms of values and percentages, please check this page.
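The arithmetic behind the two examples above can be sketched in Python. This is an illustration under stated assumptions rather than OpManager's actual implementation: it assumes that in value mode an alert level is the forecast plus the deviation, and that in percentage mode it is the forecast plus that percentage of the forecast. The helper names `alert_levels` and `classify` are hypothetical.

```python
from typing import Dict, Optional

# Severity names in ascending order, as used in the text.
SEVERITIES = ("Attention", "Trouble", "Critical")


def alert_levels(forecast: float, deviations: Dict[str, float],
                 as_percentage: bool = False) -> Dict[str, float]:
    """Return the monitor value above which each severity would be raised.

    Assumption: level = forecast + deviation (value mode), or
    level = forecast + forecast * deviation / 100 (percentage mode).
    """
    levels = {}
    for severity in SEVERITIES:
        deviation = deviations[severity]
        offset = forecast * deviation / 100 if as_percentage else deviation
        levels[severity] = forecast + offset
    return levels


def classify(value: float, levels: Dict[str, float]) -> Optional[str]:
    """Return the highest severity whose level `value` exceeds, else None."""
    raised = None
    for severity in SEVERITIES:
        if value > levels[severity]:
            raised = severity
    return raised


deviations = {"Attention": 10, "Trouble": 15, "Critical": 20}
by_value = alert_levels(70, deviations)
by_percentage = alert_levels(70, deviations, as_percentage=True)

print(by_value)
print(by_percentage)
print(classify(82, by_value))
```

Under these assumptions, a forecast of 70 gives levels of 80, 85, and 90 in value mode and 77, 80.5, and 84 in percentage mode, and a reading of 82 in value mode would raise an Attention alert.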
How will adaptive thresholds make the job easier?
OpManager's adaptive thresholds take redundant workload off network admins' shoulders for the following reasons: