"The FCAPS model of ISO lists fault management as one of the five core functional areas of proactive network management and defines its goal: to recognize, isolate, correct, and log faults that occur in the network."

Network fault management is the process of finding, isolating, and troubleshooting network faults as quickly as possible. It is a crucial component of network management: by resolving faults rapidly, it minimizes downtime and helps prevent device failures, thereby ensuring optimal network availability and preventing business losses.

Network fault monitoring is the first step of fault management and thus a requirement for successful network management. The increasing complexity of hybrid network infrastructures would make the fault management process burdensome if not for fault management systems. A fault management tool follows a four-step cycle to resolve issues:

  • Detect: Finding performance anomalies or interruptions in service delivery.
  • Isolate: Locating and isolating the event to present actionable faults.
  • Alert: Notifying network admins through alarms or notifications.
  • Resolve: Fixing faults through automation or manual intervention.

How OpManager fights network faults

Network fault management is all about staying up-to-date with what is happening in your network, be it an unforeseen outage or performance degradation. You can detect, recover, and limit the impact of failures in your network using OpManager, our 24/7 automated network fault management software. The powerful capabilities of OpManager as a network fault management system help you isolate and resolve faults in no time through a four-step workflow.

1. Detect: Be the first to capture events

OpManager's fault detection software constantly monitors networks for faults and instantly detects when there is performance degradation or a service interruption. The fault detection can be done through active and passive monitoring.

Active fault management detects an event by checking the device status through ICMP pings, TCP or UDP port checks, custom scripts, remote queries, and more. This is an active approach to identifying and rectifying potential issues in real time, sometimes even before they become faults.

On the other hand, passive or event-based management watches the network for events that indicate faults or failures, and reports them only after they have occurred. This can be done through SNMP traps, syslog messages, Windows Event Log messages, and more.
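As a rough illustration of the active approach, the sketch below performs a single TCP port check using only the Python standard library. The function name `tcp_port_is_open` and the two-second default timeout are assumptions made for this example; this is not how OpManager implements its checks.

```python
import socket


def tcp_port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration only.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A poller could call this for each monitored service at a fixed interval and raise an event when the result changes from True to False.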

2. Isolate: Focus only on actionable faults

Once the problem is detected, identifying its root cause quickly is key to reducing the mean time to repair (MTTR). The idea of the isolation process is to eliminate redundant events, thereby cutting down on proxy alerts and presenting only actionable faults. OpManager's network fault management system does that with the help of the three methods discussed below.

Deduplication

When an event such as high memory utilization is reported and persists for the next 30 minutes, your tool should not generate a new alert at every three-minute poll during that period. In such cases, OpManager appends recurring events to the alarm history, thereby eliminating duplication and preventing multiple alarms for the same fault.
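The idea can be sketched in a few lines of Python. This is a minimal illustration that assumes an alarm is identified by its device and condition; the names `Alarm` and `AlarmStore` are hypothetical and do not reflect OpManager's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Alarm:
    device: str
    condition: str
    history: list = field(default_factory=list)  # timestamps of every occurrence


class AlarmStore:
    """Hypothetical store that keeps one active alarm per (device, condition)."""

    def __init__(self):
        self.active = {}

    def report(self, device, condition, timestamp):
        """Record an event and return True only if it raised a new alarm."""
        key = (device, condition)
        alarm = self.active.get(key)
        is_new = alarm is None
        if is_new:
            alarm = self.active[key] = Alarm(device, condition)
        # Recurring events are appended to the history instead of re-alerting.
        alarm.history.append(timestamp)
        return is_new

    def clear(self, device, condition):
        """Remove the alarm once the fault is resolved."""
        self.active.pop((device, condition), None)


store = AlarmStore()
# Polls at minutes 0, 3, ..., 27 all report the same condition.
new_alarms = sum(store.report("router-1", "high-memory", m) for m in range(0, 30, 3))
```

Here ten polls produce a single alarm whose history holds all ten occurrences.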

Correlation

Device dependencies:

When a core router goes down, it is evident that its dependent devices will go down as well. If your fault management tool raises alarms for all those devices, the amount of time required to identify the root cause of the issue will be much greater. OpManager's device dependencies option helps you declare parent and dependent devices, thus averting such false alerts by raising a single alarm for the source device only (in this case, a core router). With the network mapping feature, admins can locate and troubleshoot issues quickly.
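The suppression logic can be sketched as follows, assuming each device declares at most one parent. The function `root_cause_alarms` and the device names are hypothetical and only illustrate the principle.

```python
def root_cause_alarms(down, parent_of):
    """Return the down devices that have no down ancestor.

    `down` is a set of unreachable devices; `parent_of` maps a device to its
    declared parent. Both structures are assumptions made for this sketch.
    """
    roots = set()
    for device in down:
        seen = {device}  # guards against accidental dependency cycles
        ancestor = parent_of.get(device)
        suppressed = False
        while ancestor is not None and ancestor not in seen:
            if ancestor in down:
                suppressed = True  # an upstream device already explains this outage
                break
            seen.add(ancestor)
            ancestor = parent_of.get(ancestor)
        if not suppressed:
            roots.add(device)
    return roots


parent_of = {
    "switch-a": "core-router",
    "switch-b": "core-router",
    "server-1": "switch-a",
}
down = {"core-router", "switch-a", "switch-b", "server-1"}
alarms = root_cause_alarms(down, parent_of)
```

With all four devices unreachable, only the core router is left in `alarms`.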

Root cause analysis (RCA):

To narrow down the root cause of an issue, you need to compare the performance of multiple monitors and identify how they correlate. With OpManager's RCA profile, simply drag and drop the monitors whose performance you want to analyze, and a performance graph is created for each. You can compare up to 20 monitors in a single window, which helps you correlate and analyze their performance at once.
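OpManager's RCA profile is a visual comparison of graphs. As a numeric analogue, and purely as an assumption for illustration, the sketch below computes the Pearson correlation coefficient between two monitor series; the function name `pearson` and the sample values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no correlation information
    return cov / sqrt(var_x * var_y)


# Made-up samples: CPU utilization (%) and response time (ms) from the same polls.
cpu = [20, 35, 50, 80, 95]
latency = [12, 15, 21, 40, 55]
r = pearson(cpu, latency)
```

A value of `r` close to 1 suggests the two monitors rise and fall together, which is a hint for further investigation rather than proof of causation.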

Alarm correlation in fault management

You can also use OpManager's alarm correlation rule to easily correlate metrics of essential entities and gain contextual information about your alarm patterns. This way, you can greatly reduce alarm noise and initiate first-level fault remediation measures for violations of set criteria.

Automation

Automation paves the way for faster resolution by dropping unwarranted events (such as negligible, incidental spikes), reverting the alarm status, and suppressing known alarms. OpManager also offers the following automation options:

  • Downtime scheduler: You can schedule downtime for routine maintenance periods so that OpManager stops monitoring the affected devices and does not raise unnecessary alerts.
  • Pause status polling: While you are working on a particular fault, you can use this option to pause polling until the issue is rectified, which prevents false alerts.
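The downtime scheduler boils down to a time-window check before an alert is raised. The sketch below assumes a single daily maintenance window defined by a start and end time; the function names are hypothetical, and real schedulers typically also support weekly or one-off windows.

```python
from datetime import datetime, time


def in_maintenance_window(now: datetime, start: time, end: time) -> bool:
    """Return True if `now` falls inside the daily window [start, end)."""
    current = now.time()
    if start <= end:
        return start <= current < end
    # The window wraps past midnight, e.g. 23:00 to 01:00.
    return current >= start or current < end


def should_alert(now: datetime, start: time, end: time) -> bool:
    """Suppress alerts while scheduled maintenance is in progress."""
    return not in_maintenance_window(now, start, end)
```

For example, with a window from 23:00 to 01:00, an event at 00:30 is suppressed while an event at 12:00 still raises an alert.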

3. Inform: Get notified from wherever you are

Once the actionable event is isolated, OpManager's automated fault management notifies NOC admins about it through visual fault representation and notifies remote admins through trouble ticketing and alerts.

4. Resolve: Put right the faults quickly and easily

Not every detected fault is serious enough to require your immediate attention. In most cases, fault management systems like OpManager run designated scripts or workflows at the earliest sign of trouble to automate service restoration and keep the network running. When automation fails due to errors, OpManager escalates the alarm to the appropriate admins with the event details and the next course of action. So even when you are busy moving between locations and floors to attend to the network's needs, OpManager's fault management tool keeps some faults at bay.
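The "automate first, escalate on failure" pattern can be sketched as below. The `remediate` and `escalate` callables stand in for a remediation script and a notification channel; both, along with the function `resolve_fault`, are hypothetical placeholders rather than OpManager APIs.

```python
from typing import Callable


def resolve_fault(
    fault: dict,
    remediate: Callable[[dict], bool],
    escalate: Callable[[dict, str], None],
) -> str:
    """Try automated remediation; escalate with context if it fails.

    `remediate` returns True when the fault was cleared. `escalate` receives
    the fault details and a short reason. Both are assumptions for this sketch.
    """
    try:
        fixed = remediate(fault)
    except Exception as exc:
        # The automation itself broke, so hand over to a human with details.
        escalate(fault, f"remediation raised {exc!r}")
        return "escalated"
    if fixed:
        return "resolved"
    escalate(fault, "remediation did not clear the fault")
    return "escalated"
```

Keeping remediation and escalation as injected callables makes it easy to plug in a service restart script for one fault type and a ticketing integration for another.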

In some cases, such automated resolutions are not possible, so manual intervention is required. You can perform troubleshooting to assess the damage and work out possible quick solutions using the interactive, built-in, web-based troubleshooting tools.

Why you need OpManager

"According to a survey conducted by Gartner, the average cost of network downtime for enterprises is around $5,600 per minute, which is over $300,000 per hour on average and up to $540,000 per hour on the high end."

With downtime having such great potential to cause huge losses for businesses, it is essential to take the necessary actions to prevent or minimize it. Preventing downtime and maintaining network uptime comes down to monitoring and managing network faults effectively. An advanced, automated fault management solution like ManageEngine OpManager helps admins resolve faults fast, protecting network availability and business revenue.

Keep your network fault-free with OpManager.

Download 30-day free trial

Customer reviews

OpManager
OpManager - 10 Steps Ahead Of The Competition, One Step Away From Being Unequalled.
- Network Services Manager, Government Organization
Review Role: Infrastructure and Operations
Company Size: Gov't/PS/ED 5,000 - 50,000 Employees
"I have a long-standing relationship with ManageEngine. OpManager has always missed one or two features that would make it truly the best tool on the market, but over it is the most comprehensive and easy to use the product on the market."
OpManager
Easy Implementation, Excellent Support & Lower Cost Tool
- Team Lead, IT Service Industry
Review Role: Infrastructure and Operations
Company Size: 500M - 1B USD
"We have been using OpManager since 2011 and our overall experience has been excellent. The tool plays a vital role in providing the value to our organisation and to the customers we are supporting. The support is excellent and staff takes full responsibilities in resolving the issues. Innovation is never stopping and clearly visible with newer versions"
OpManager
Easy Implementation With A Feature Rich Catalogue, Support Has Some Room For Improvement
- NOC Manager in IT Service Industry
Review Role: Program and Portfolio Management
Company Size: 500M - 1B USD
"The vendor has been supporting during the implementation & POC phases providing trial licenses. Feature requests and feedback is usually acted upon swiftly. There was sufficient vendor support during the implementation phase. After deployment, the support is more than adequate, where the vendor could make some improvements."
OpManager
Great Monitoring Tool
- CIO in Finance Industry
Review Role: CIO
Company Size: 1B - 3B USD
"Manage Engine provides a suite of tools that have made improvements to the availability of our internal applications. From monitoring, management and alerting, we have been able to peak performance within our data center."
OpManager
Simple Implementation, Easy To Use. Very Intuitive.
- Principal Engineer in IT Services
Review Role: Enterprise Architecture and Technology Innovation
Company Size: 250M - 500M USD
"Manage Engine support was helpful and responsive to all our queries"
 
 

Case Studies - OpManager

OpManager

Hinduja Global Solutions saves $3 million a year using OpManager

Industry: IT

Hinduja Global Solutions (HGS) is an Indian business process management (BPM) organization headquartered in Bangalore and part of the Hinduja Group. HGS combines technology-powered automation, analytics, and digital services focusing on back office proces

OpManager

USA-Based Healthcare Organization Monitors Network Devices Using OpManager and Network Configuration Manager

Industry: Healthcare

One of the largest radiology groups in the nation, with a team of more than 200 board-certified radiologists, provides more than 50 hospital and specialty clinic partners with on-site radiology coverage and interpretations.

OpManager

Netherlands-based real estate data company avoids system downtime using OpManager and Firewall Analyzer

Industry: Real Estate

Vabi is a Netherlands-based company that provides "real estate data in order, for everyone." Since 1972, the company has focused on making software that calculates the performance of buildings. It has since then widened its scope from making calculations

OpManager

Global news and media company

Industry: Telecommunication and Media

Bonita uses OpManager to monitor their network infrastructure and clear bottlenecks

OpManager

Bonita

Industry: Businesses and Services

Bonita uses OpManager to monitor their network infrastructure and clear bottlenecks

OpManager

Thorp Reed & Armstrong

Industry: Government

Randy S. Hollaway from Thorp Reed & Armstrong relies on OpManager for prompt alerts and reports

 
 
 