Trigger-based event correlation makes it possible to correlate separate problems reported by one trigger.
While generally an OK event can close all problem events created by one trigger, there are cases when a more detailed approach is needed. For example, when monitoring log files you may want to discover certain problems in a log file and close them individually rather than all together.
This is the case with triggers that have the PROBLEM event generation mode parameter set to Multiple. Such triggers are normally used for log monitoring, trap processing, etc.
In Zabbix, problem events can be related to each other by tagging. Tags are used to extract values and create an identification for problem events. Taking advantage of that, problems can also be closed individually based on a matching tag.
In other words, the same trigger can create separate events identified by the event tag. Therefore problem events can be identified one-by-one and closed separately based on the identification by the event tag.
In log monitoring you may encounter lines similar to these:
Line1: Service 1 stopped
Line2: Service 2 stopped
Line3: Service 1 was restarted
Line4: Service 2 was restarted
The idea of event correlation is to be able to match the problem event from Line1 to the resolution from Line3 and the problem event from Line2 to the resolution from Line4, and close these problems one by one:
Line1: Service 1 stopped
Line3: Service 1 was restarted #problem from Line 1 closed
Line2: Service 2 stopped
Line4: Service 2 was restarted #problem from Line 2 closed
To do this you need to tag these related events as, for example, "Service 1" and "Service 2". That can be done by applying a regular expression to the log line to extract the tag value. Then, when events are created, they are tagged "Service 1" and "Service 2" respectively, and each problem can be matched to its resolution.
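The tag extraction described above can be illustrated outside Zabbix with a minimal Python sketch. The pattern and the `extract_tag` helper are assumptions made for this illustration, written against the sample lines shown earlier. They are not part of Zabbix, which performs the extraction through its own tag configuration.

```python
import re

# Hypothetical pattern for the sample lines above; adjust it to the real log format.
SERVICE_PATTERN = re.compile(r"(Service \d+) (stopped|was restarted)")


def extract_tag(line):
    """Return (tag_value, kind) for a log line, or None if the line does not match."""
    match = SERVICE_PATTERN.search(line)
    if match is None:
        return None
    # "stopped" marks a problem; "was restarted" marks its resolution.
    kind = "problem" if match.group(2) == "stopped" else "resolution"
    return match.group(1), kind


lines = [
    "Service 1 stopped",
    "Service 2 stopped",
    "Service 1 was restarted",
    "Service 2 was restarted",
]
tagged = [extract_tag(line) for line in lines]
```

Each entry in `tagged` pairs the extracted tag value with the kind of event, which is the information needed to match a resolution to its problem.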
To begin with, you may want to set up an item that monitors a log file, for example:
With the item set up, wait a minute for the configuration changes to be picked up and then go to Latest data to make sure that the item has started collecting data.
With the item working you need to configure the trigger. It's important to decide what entries in the log file are worth paying attention to. For example, the following trigger expression will search for a string like 'Stopping' to signal potential problems:
To make sure that each line containing the string "Stopping" is considered a problem, also set the Problem event generation mode in the trigger configuration to 'Multiple'.
Then define a recovery expression. The following recovery expression will resolve all problems if a log line is found containing the string "Starting":
Since closing all problems at once is not what we want, it is important to make sure that only the corresponding root problems are closed. That's where tagging can help.
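The difference between closing every problem and closing only the tagged ones can be sketched in Python. This is a simplified model, not Zabbix's implementation. The log format ("Stopping <name>" and "Starting <name>"), the `correlate` function and its `match_tags` flag are all hypothetical names chosen for the illustration.

```python
import re

# Hypothetical log format: "Stopping <name>" opens a problem, "Starting <name>" resolves it.
LINE_PATTERN = re.compile(r"(Stopping|Starting)\s+(.+)")


def correlate(lines, match_tags=True):
    """Return the tags of the problems still open after processing all lines."""
    open_problems = []  # one entry per problem event, as in Multiple mode
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match is None:
            continue
        action, tag = match.group(1), match.group(2).strip()
        if action == "Stopping":
            open_problems.append(tag)
        elif match_tags:
            # Close only the problems whose tag value equals the resolution's tag.
            open_problems = [t for t in open_problems if t != tag]
        else:
            # Without tag matching, one OK event closes every open problem.
            open_problems = []
    return open_problems


log = ["Stopping Service 1", "Stopping Service 2", "Starting Service 1"]
print(correlate(log, match_tags=True))   # ['Service 2']
print(correlate(log, match_tags=False))  # []
```

With tag matching, the problem for "Service 2" stays open until its own resolution line arrives. Without it, the first "Starting" line closes both problems.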
Problems and resolutions can be matched by specifying a tag in the trigger configuration. The following settings have to be made:
If configured successfully, you will be able to see problem events tagged by application and matched to their resolution in Monitoring → Problems.
Misconfiguration is possible when similar event tags are created for unrelated problems, so please review the cases outlined below.