In this last post of the series, I’ll walk through a practical example of how a server can be monitored efficiently with events monitoring.
Let us say that three main applications (a mail server, a database management system and a fax server) are installed on one main server. These applications are very common in a production environment, and they obviously must be responsive around the clock.
So the primary goal of an IT disaster prevention plan is to ensure that all main applications are fully available 24/7.
To do this, we need to identify the risks that could harm the availability of these main applications. There are several, such as low free disk space or low available memory.
For example, low available memory may delay the processing of tasks or even cause large operations to fail. In general, capacity restrictions are not logged by default in the event logs, so we should check how extended logging can be enabled within the main application. The logging procedures differ and depend mainly on the application itself. Sometimes the main application has its own health monitoring system which you can take advantage of.
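Where an application does not log capacity problems itself, a small script can write such events on its behalf. The following is a minimal sketch in Python, assuming a 10% free-space threshold and a logger named `capacity-monitor`. Both of these, as well as the function names and the event fields, are illustrative choices and not part of any particular monitoring product.

```python
import logging
import shutil

logger = logging.getLogger("capacity-monitor")  # hypothetical logger name


def capacity_event(resource, free, total, min_free_ratio=0.10):
    """Return a warning event dict if free capacity is below the threshold, else None."""
    if total <= 0:
        raise ValueError("total must be positive")
    free_ratio = free / total
    if free_ratio >= min_free_ratio:
        return None
    return {
        "type": "LOW_CAPACITY",  # assumed event type name
        "resource": resource,
        "free_ratio": round(free_ratio, 4),
    }


def check_disk(path="/", min_free_ratio=0.10):
    """Check free disk space at `path` and log a warning when it runs low."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    event = capacity_event(f"disk:{path}", usage.free, usage.total, min_free_ratio)
    if event is not None:
        logger.warning("low capacity: %s", event)
    return event
```

The same `capacity_event` function could be fed with memory figures, but reading those is platform specific, so that part is left out of the sketch.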
You may decide to create individual rules (triggers) within the main application so that a specific type of event is written to a log file whenever certain conditions occur during normal operation. Events monitoring can later use these entries for analysis.
Since log files are generally limited in size, they are overwritten periodically. Therefore we need to ensure that the events monitoring software collects all relevant log entries in time and stores them safely in a centralised database.
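To illustrate the idea of collecting log entries before they are overwritten, here is a rough sketch that copies new lines from a log file into a SQLite database and remembers how far it has read. The table names, the function names and the use of SQLite as the "centralised database" are assumptions made for this example only.

```python
import os
import sqlite3


def init_store(conn):
    """Create the tables used by this sketch (hypothetical schema)."""
    conn.execute("CREATE TABLE IF NOT EXISTS events (source TEXT NOT NULL, line TEXT NOT NULL)")
    conn.execute("CREATE TABLE IF NOT EXISTS offsets (source TEXT PRIMARY KEY, position INTEGER NOT NULL)")


def collect_new_lines(conn, log_path):
    """Store lines added to `log_path` since the last call; return how many were stored."""
    row = conn.execute("SELECT position FROM offsets WHERE source = ?", (log_path,)).fetchone()
    position = row[0] if row else 0
    if os.path.getsize(log_path) < position:
        # The file shrank, so assume it was overwritten and start from the top.
        # Limitation: an overwrite that leaves the file at least as large goes unnoticed.
        position = 0
    with open(log_path, "rb") as handle:
        handle.seek(position)
        data = handle.read()
    end = data.rfind(b"\n") + 1  # only consume complete lines
    lines = [raw.decode("utf-8", errors="replace") for raw in data[:end].splitlines()]
    conn.executemany("INSERT INTO events (source, line) VALUES (?, ?)",
                     [(log_path, line) for line in lines])
    conn.execute("INSERT OR REPLACE INTO offsets (source, position) VALUES (?, ?)",
                 (log_path, position + end))
    conn.commit()
    return len(lines)
```

Calling `collect_new_lines` more often than the log is overwritten is what keeps entries from being lost, so the collection interval should be chosen with the log's size limit in mind.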
Good events monitoring software allows you to create individual pre-processing rules depending on the event source and event type. Pre-processing rules ensure that only certain events are collected and stored, and that certain actions are triggered when certain events occur.
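A pre-processing rule of this kind can be pictured as a small filter that looks at the event source and event type and decides what to do with the event. The sketch below is one possible way to model it. The `Rule` class, the action names (`store`, `notify`, `drop`) and the sample sources are all hypothetical and do not reflect any specific product's rule format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    source: Optional[str]      # None matches any event source
    event_type: Optional[str]  # None matches any event type
    action: str                # "store", "notify" or "drop"

    def matches(self, event):
        return (self.source in (None, event["source"])
                and self.event_type in (None, event["type"]))


def preprocess(event, rules):
    """Return the action of the first matching rule; unmatched events are dropped."""
    for rule in rules:
        if rule.matches(event):
            return rule.action
    return "drop"


# Example rule set: the most specific rules come first.
RULES = [
    Rule("mail-server", "SERVICE_STOPPED", "notify"),
    Rule("mail-server", None, "store"),
    Rule(None, "ERROR", "store"),
]
```

With these rules, a `SERVICE_STOPPED` event from the mail server triggers a notification, other mail server events and all errors are stored, and everything else is discarded.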
For example, if the IMAP service stops unexpectedly, the events monitoring software automatically sends a notification to the system administrator. If possible, it also tries to restart the IMAP service and reports the result of that action in a timely manner.
Automation allows system administrators to simplify routine duties such as restarting a service when it stops, which in most cases is faster than manual intervention by a system administrator. It also saves time and reduces administration costs.
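The notify, restart and report sequence described for the IMAP service could be sketched as follows. How a service is actually restarted or checked depends on the operating system and the monitoring software, so `restart`, `is_running` and `notify` are hypothetical callables supplied by the caller, and the limit of three attempts is an assumption.

```python
def handle_stopped_service(service, restart, is_running, notify, max_attempts=3):
    """Notify the administrator, try to restart the service, and report the result."""
    notify(f"{service} stopped unexpectedly")
    for attempt in range(1, max_attempts + 1):
        restart(service)
        if is_running(service):
            notify(f"{service} restarted after {attempt} attempt(s)")
            return True
    notify(f"{service} could not be restarted after {max_attempts} attempts")
    return False
```

Passing the three operations in as arguments keeps the sequence itself easy to test without touching a real service.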
Not all events correspond to a possible system failure, so sometimes the judgement of the system administrator is required. In these cases, dashboards and reports are very helpful, as they summarize the main data in visual charts that are convenient to read. Reports and dashboards are generally customizable, so it makes sense to build reports and dashboards tailored to their individual viewers. Some events monitoring software allows reports to be scheduled so that a system administrator receives specific reports on a daily, weekly or monthly basis.
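The data behind such a report is often little more than a count of events per source and type. As a rough illustration, assuming events are available as dictionaries with `source` and `type` keys (a made-up format for this example), the counting step might look like this:

```python
from collections import Counter


def summarize(events):
    """Count events per (source, type), most frequent first."""
    counts = Counter((event["source"], event["type"]) for event in events)
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def format_report(events):
    """Render the summary as plain-text lines, one per (source, type) pair."""
    return [f"{source:<15} {event_type:<18} {count:>5}"
            for (source, event_type), count in summarize(events)]
```

A scheduled job could run `format_report` over the events of the last day, week or month and send the result to the administrator.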
Events monitoring contributes greatly to preventing IT disasters when the right events are monitored at the right time and the actions based on those events are triggered in a timely manner.