Table of Contents

Triggers and problems
- Configuration
- Use trigger snippets

Triggers and problems

Configuration

Naming

Trigger names must be prefixed with the LLD object they belong to.

To keep trigger names short, do not use the {HOST.NAME} macro in them. The host name is already available in the host column.

Avoid using {ITEM.LASTVALUE} in trigger names

Do not use the {ITEM.LASTVALUE1-9} macros directly in trigger names. These macros are expanded to values at the moment the problem name is generated.

Use them in the operational data field (available since Zabbix 4.4) instead.

Explain the threshold in event name

Consider explaining why the trigger fired (the threshold) in parentheses.

Use the event name field for it (supported since Zabbix 5.2), to keep the trigger name short. The event name, if defined, will be used for generating the problem name.

For example:

Trigger name: CPU load is too high
Event name: CPU load is too high (over 1.5)

Other examples for event names:

Good	Bad
Temperature is too high (over 35 C for 5m)	Temperature is too high (now: 40)
MySQL: Refused connections (max_connections limit reached)	MySQL: Refused connections

Trigger description

Use this field to:

Describe the problem in more detail, without simply copying the text from the trigger name.
Explain why it is important to check this.
Describe the probable root cause of the problem, if possible, and which actions should be taken.
Provide a reference to the documentation, if any.

Expressions

Trigger expressions should be reasonably flap-resistant, that is, they should not rely on the last value only but check the last 5 or 10 minutes instead. On the other hand, do not make the expressions overly complex. For example, do not use trigger hysteresis unless it adds significant value.

Prefer user macros in trigger expressions so that thresholds can be tuned.

Good	Bad
`last(/TEMPLATE_NAME/temperature)>{$TEMP.MAX.WARN}`	`last(/TEMPLATE_NAME/temperature)>30`
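The benefit of a macro-based threshold can be sketched in Python. This is a simplified illustration, not Zabbix code: `resolve_macros` and the two dictionaries are hypothetical names, and the only behavior assumed is that a host-level macro value takes precedence over the template default.

```python
import re

def resolve_macros(expression, template_macros, host_macros):
    """Substitute {$NAME} user macros, preferring host-level values."""
    def lookup(match):
        name = match.group(0)
        if name in host_macros:
            return str(host_macros[name])
        if name in template_macros:
            return str(template_macros[name])
        return name  # leave unknown macros untouched

    return re.sub(r"\{\$[A-Z0-9_.]+\}", lookup, expression)

expression = "last(/TEMPLATE_NAME/temperature)>{$TEMP.MAX.WARN}"
template_defaults = {"{$TEMP.MAX.WARN}": 30}  # hypothetical template default

# Same expression, different threshold per host, no template edit needed.
default_expr = resolve_macros(expression, template_defaults, {})
tuned_expr = resolve_macros(expression, template_defaults, {"{$TEMP.MAX.WARN}": 45})
print(default_expr)
print(tuned_expr)
```

With a hardcoded `>30`, the same change would require editing or cloning the trigger itself.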

Use newlines and spaces to make long trigger expressions more human-readable.

Using time and data suffixes in triggers

Always use time suffixes (1m, 5m, 1d...) and size suffixes (1K, 1M, 1G) in trigger expressions, problem names, trigger descriptions, and operational data to improve readability. Remember that you can use them in user macros, too.

Good	Bad
`avg(/TEMPLATE_NAME/temperature,10m)>{$TEMP.MAX.WARN}`	`avg(/TEMPLATE_NAME/temperature,600)>{$TEMP.MAX.WARN}`
`avg(/TEMPLATE_NAME/memory.free,10m)<{$MEM_FREE.WARN}` where {$MEM_FREE.WARN} = 100M	`avg(/TEMPLATE_NAME/memory.free,600)<{$MEM_FREE.WARN}` where {$MEM_FREE.WARN} = 104857600
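To show why the suffixed and raw forms in the table are equivalent, here is a small Python sketch that expands a suffix into a plain number. The function name `expand_suffix` is hypothetical, and the sketch assumes integer values only, with time suffixes in seconds and size suffixes as powers of 1024.

```python
TIME_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
SIZE_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

def expand_suffix(value):
    """Convert a value such as '10m' or '100M' to a plain integer."""
    unit = value[-1]
    if unit in TIME_UNITS:  # lowercase m means minutes
        return int(value[:-1]) * TIME_UNITS[unit]
    if unit in SIZE_UNITS:  # uppercase M means mebibytes
        return int(value[:-1]) * SIZE_UNITS[unit]
    return int(value)  # no suffix

print(expand_suffix("10m"))   # 600
print(expand_suffix("100M"))  # 104857600
```

The raw numbers are correct but hard to check by eye, which is the reason to prefer the suffixed form.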

Severity

Triggers created in the templates are mapped to the standard Zabbix severity scale. Consider choosing the severity assigned to the trigger with the following in mind:

Severity	Description	Examples	Expected reaction type and time (not always true!), given as example only
Not classified	Not used under normal circumstances
Info	An event that is not an alarm. It may be helpful later for retrospective analysis or auditing.	Serial number changed, user logged in, etc.	None
Warning	A minor alarm that could lead to a more serious problem if left unattended.	Disk space is low but there is still some room.	React during working hours; no notification is expected.
Average	Performance alarms: a serious performance problem or key service degradation. Fault alarms: a partial resource failure, or a warning that could lead to a complete device fault if left unattended.	CPU utilization is high, Low memory, High device temperature, Disk health failure in the disk array, Website is slow.	React during working hours; create an issue ticket if the problem persists for hours.
High	Performance alarms: a key service is not available. Fault alarms: the device is not functioning or not available.	No ICMP ping, Website is down.	Outside working hours, react by paging if services are affected. Otherwise, react with a ticket during working hours.
Disaster	Reserved for alarms indicating blackouts, disasters, global business service faults. There should be no triggers with disaster level severity in resource templates.	Riga DC is down, Level core network is down, >50% of users cannot purchase anything from our website.	Always react by paging the responsible person.

Trigger tags

Use tags to logically group triggers using the recommended tagging model.

Tag	Value	Description
scope	performance; availability - a monitoring target or its part may become unavailable; capacity - a monitored resource may be exhausted; notice; security; compliance - reserved for user-defined templates	Specifies the type of a problem. Including at least one tag is mandatory; multiple tags are allowed.

For example, the trigger High memory utilization might contain the following tags:

scope: capacity; scope: performance
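The tagging rule can be checked mechanically. The following Python sketch is only an illustration: `scope_tag_problems` is a hypothetical helper, and it assumes a trigger's tags are available as a list of (tag, value) pairs.

```python
ALLOWED_SCOPES = {
    "performance", "availability", "capacity",
    "notice", "security", "compliance",
}

def scope_tag_problems(tags):
    """Return a list of problems found in a trigger's (tag, value) pairs."""
    scopes = [value for tag, value in tags if tag == "scope"]
    problems = []
    if not scopes:
        problems.append("missing mandatory scope tag")
    for value in scopes:
        if value not in ALLOWED_SCOPES:
            problems.append(f"unknown scope value: {value}")
    return problems

# The "High memory utilization" example: two scope tags are allowed.
print(scope_tag_problems([("scope", "capacity"), ("scope", "performance")]))
```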

Trigger macros

For macros used in trigger expressions (thresholds) use this form:

{$[<NAMESPACE>.]<METRIC_NAME>[.MAX|.MIN][.OK|.WARN|.CRIT]}

Use MAX|MIN when you need to highlight whether it is the high or low threshold.

Good	Bad
{$MYSQL.REPLICATION_LAG.MAX.WARN}, {$TEMP.MAX.WARN:"{#SENSOR}"}, {$SERVICE.STATUS.CRIT}, {$IF.ERRORS.MAX.WARN}, {$DISK.STATUS.OK}, {$DISK.STATUS.WARN}, {$DISK.STATUS.CRIT}, {$MEM_UTIL.MAX.WARN}, {$MEM_UTIL.MAX.CRIT}	{$DISK_OK_STATUS}, {$MEMORY_UTIL_MAX}
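One way to spot macros that break this form is a small lint check. The Python sketch below is a heuristic and not an official validator: `follows_threshold_form` is a hypothetical name, and it only verifies that MAX, MIN, OK, WARN, and CRIT appear as separate dot-delimited parts rather than being joined with underscores.

```python
import re

THRESHOLD_WORDS = {"MAX", "MIN", "OK", "WARN", "CRIT"}
# Macro name, optionally followed by a context such as :"{#SENSOR}"
MACRO_RE = re.compile(r"^\{\$([A-Z0-9_.]+)(?::(.*))?\}$")

def follows_threshold_form(macro):
    """True if threshold words only appear as dot-separated parts."""
    match = MACRO_RE.match(macro)
    if match is None:
        return False
    for part in match.group(1).split("."):
        tokens = part.split("_")
        # A threshold word glued on with "_" (e.g. UTIL_MAX) breaks the form.
        if len(tokens) > 1 and THRESHOLD_WORDS & set(tokens):
            return False
    return True

print(follows_threshold_form('{$TEMP.MAX.WARN:"{#SENSOR}"}'))
print(follows_threshold_form("{$MEMORY_UTIL_MAX}"))
```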

Use trigger snippets

Check the following trigger snippets library and consider reusing configuration to avoid reinventing the wheel.

Case: Something has just been restarted

Trigger: <resource> has just been restarted (uptime < 10m)

Applicable for	Uptime counters for a device, host, or running software/service
Name	<resource> has been restarted
Event name	<resource> has been restarted (uptime < 10m)
Description	<resource> uptime is less than 10 minutes
Expression	last(/TEMPLATE_NAME/METRIC)<10m
Recovery expression	-
Recovery mode	-
Manual close	Yes
Severity	Warning for the host. Info for all others.
Depends on	-

Case: Any master item + preprocessing in dependent items

Trigger: Master item is not responding

<resource>: Failed to get items (no data for 30m)

Applicable for	Any type of items used for bulk data collection
Expression	nodata(/TEMPLATE_NAME/temperature,30m)=1
Recovery expression	-
Recovery mode	-
Manual close	Yes
Severity	Warning
Depends on	If present: <Proc> is not running
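The condition behind this trigger can be sketched in Python as a check on sample timestamps. This is a simplified model for illustration only: the `nodata` function below is a stand-in written for this example, and it assumes timestamps and the period are given in seconds.

```python
def nodata(sample_times, now, period):
    """Return 1 if no sample arrived within `period` seconds before `now`, else 0."""
    received = any(now - period <= t <= now for t in sample_times)
    return 0 if received else 1

PERIOD = 30 * 60  # 30m expressed in seconds
now = 10_000      # hypothetical current time

# A sample 25 minutes ago keeps the trigger quiet.
print(nodata([7_000, 8_500], now, PERIOD))
# The newest sample is 50 minutes old, so the trigger fires.
print(nodata([7_000], now, PERIOD))
```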

Case: HTTP item + regex preprocessing in dependent items

Trigger: HTTP item is not responding

Applicable for	HTTP items that provide output for future regex preprocessing. Use ‘Headers and Body’ mode in the item.

Case: <VALUE> is too high (over X) / is too low (under X) for slow-to-change values

For slow-changing values (e.g. temperature), use max() for high and min() for low thresholds to get an immediate response with a delayed (confirmed) recovery.

Trigger: <VALUE> is too high (over X)

Applicable for	High temperature (slow to change)
Expression	max(/TEMPLATE_NAME/METRIC,5m) > X

Trigger: <VALUE> is too low (under X)

Applicable for	Low temperature (slow to change)
Expression	min(/TEMPLATE_NAME/METRIC,5m) < X
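The "immediate response, delayed recovery" behavior of max() can be illustrated with a short Python simulation. The readings, the threshold, and the one-sample-per-minute interval are made-up assumptions; `fires_high` is a hypothetical helper that mirrors `max(...,5m) > X`.

```python
def fires_high(window, threshold):
    """max(window) > X: true if any sample in the window exceeds X."""
    return max(window) > threshold

readings = [30, 31, 36, 33, 32, 31, 30, 30]  # hypothetical, one sample per minute
threshold = 35
window_size = 5  # 5m at one sample per minute

states = []
for i in range(len(readings)):
    window = readings[max(0, i - window_size + 1): i + 1]
    states.append(fires_high(window, threshold))

print(states)
# [False, False, True, True, True, True, True, False]
```

The trigger fires on the first reading above the threshold and recovers only after that reading has left the 5-minute window.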

Case: <VALUE> is too high (over X for 5m) / is too low (under X for 5m) for quick-to-change and jumpy values

For jumpy values, use min() (for high) and max() (for low) so that the trigger fires only when every value in the period crosses the threshold. This makes triggers more tolerant of spikes and noise.

Trigger: <VALUE> is too high (over X for 5m)

Applicable for	CPU utilization (jumpy), signal strength (jumpy), network utilization
Expression	min(/TEMPLATE_NAME/METRIC,5m) > X

Trigger: <VALUE> is too low (under X for 5m)

Applicable for	CPU utilization (jumpy), signal strength (jumpy), network utilization
Expression	max(/TEMPLATE_NAME/METRIC,5m) < X
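The spike tolerance of min() for a high threshold can be illustrated with a short Python simulation. The CPU values, the threshold, and the one-sample-per-minute interval are made-up assumptions; `sustained_high` is a hypothetical helper that mirrors `min(...,5m) > X`.

```python
def sustained_high(window, threshold):
    """min(window) > X: true only if every sample in the window exceeds X."""
    return min(window) > threshold

cpu = [20, 95, 30, 92, 93, 96, 94, 97, 40]  # hypothetical, one sample per minute
threshold = 90
window_size = 5  # 5m at one sample per minute

states = []
for i in range(len(cpu)):
    window = cpu[max(0, i - window_size + 1): i + 1]
    states.append(sustained_high(window, threshold))

print(states)
# [False, False, False, False, False, False, False, True, False]
```

The isolated spike to 95 is ignored, the trigger fires only after five consecutive high samples, and it recovers as soon as one sample drops below the threshold.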

Case: Serial number has changed on the device

Trigger: Serial numbers controls

Applicable for	Serial number items
Name	<resource> has been replaced
Event name	<resource> has been replaced (new serial number received)
Description	<resource> serial number has changed. Acknowledge to close.
Expression	last(/TEMPLATE_NAME/METRIC)<>last(/TEMPLATE_NAME/METRIC,#2) and length(last(/TEMPLATE_NAME/METRIC))>0
Recovery expression	-
Recovery mode	None
Manual close	Yes
Severity	Info
Depends on	-
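The change-detection logic of this trigger can be sketched in Python: compare the two most recent values and ignore an empty latest value. `value_changed` is a hypothetical helper, and the history list (oldest value first) is an assumption made for the example.

```python
def value_changed(history):
    """True if the latest value differs from the previous one and is not empty.

    `history` holds string values, oldest first.
    """
    if len(history) < 2:
        return False  # nothing to compare against yet
    latest, previous = history[-1], history[-2]
    return latest != previous and len(latest) > 0

print(value_changed(["SN123", "SN123"]))  # unchanged
print(value_changed(["SN123", "SN456"]))  # replaced
print(value_changed(["SN123", ""]))       # empty reading, ignored
```

The length check prevents a failed or empty reading from being reported as a replacement.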

Case: Software version has changed on the device

Trigger: Version controls

Applicable for	Software version items
Name	<resource> version has changed
Event name	<resource> version has changed (new version: {ITEM.VALUE})
Description	<resource> version has changed. Acknowledge to close.
Expression	last(/TEMPLATE_NAME/METRIC)<>last(/TEMPLATE_NAME/METRIC,#2) and length(last(/TEMPLATE_NAME/METRIC))>0
Recovery expression	-
Recovery mode	None
Manual close	Yes
Severity	Info
Depends on	-

Case: Control how much disk space is left

Trigger: Filesystem space is critically low with timeleft with context macro

{$VFS.FS.PUSED.MAX.CRIT:"__RESOURCE__"} = 90

Applicable for	Filesystems
Name	Disk space is critically low
Event name	Disk space is critically low (used > {$VFS.FS.PUSED.MAX.CRIT:"__RESOURCE__"})
Description	Space used: {ITEM.VALUE3} of {ITEM.VALUE2} ({ITEM.VALUE1}), time left till full: < 24h. Two conditions must match. First, space utilization must be above {$VFS.FS.PUSED.MAX.CRIT:"__RESOURCE__"}. Second, at least one of the following must hold: the free disk space is less than 5G, or the disk will be full in less than 24 hours.
Expression	last(/TEMPLATE_NAME/vfs.fs.size[{#FSNAME},pused])>{$VFS.FS.PUSED.MAX.CRIT:"{#FSNAME}"} and ((last(/TEMPLATE_NAME/vfs.fs.size[{#FSNAME},total])-last(/TEMPLATE_NAME/vfs.fs.size[{#FSNAME},used]))<{$VFS.FS.FREE.MIN.CRIT:"{#FSNAME}"} or timeleft(/TEMPLATE_NAME/vfs.fs.size[{#FSNAME},pused],1h,100)<1d)
Recovery expression	-
Recovery mode	None
Manual close	Yes
Severity	Average
Depends on	-

Trigger: Filesystem space is low with timeleft with context macro

{$VFS.FS.PUSED.MAX.WARN:"__RESOURCE__"} = 80

Applicable for	Filesystems
Name	Disk space is low
Event name	Disk space is low (used > {$VFS.FS.PUSED.MAX.WARN:"__RESOURCE__"})
Description	Space used: {ITEM.VALUE3} of {ITEM.VALUE2} ({ITEM.VALUE1}), time left till full: < 24h. Two conditions must match. First, space utilization must be above {$VFS.FS.PUSED.MAX.WARN:"__RESOURCE__"}. Second, at least one of the following must hold: the free disk space is less than 10G, or the disk will be full in less than 24 hours.
Expression	last(/TEMPLATE_NAME/vfs.fs.size[{#FSNAME},pused])>{$VFS.FS.PUSED.MAX.WARN:"{#FSNAME}"} and ((last(/TEMPLATE_NAME/vfs.fs.size[{#FSNAME},total])-last(/TEMPLATE_NAME/vfs.fs.size[{#FSNAME},used]))<{$VFS.FS.FREE.MIN.WARN:"{#FSNAME}"} or timeleft(/TEMPLATE_NAME/vfs.fs.size[{#FSNAME},pused],1h,100)<1d)
Recovery expression	-
Recovery mode	None
Manual close	Yes
Severity	Warning
Depends on	Disk space is critically low.
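The logic stated in the descriptions of the two disk space triggers is "utilization above the threshold AND (free space below the minimum OR full within a day)". The Python sketch below models only that boolean structure. The function name, its parameters, and the default values are illustrative assumptions, and the time-until-full value is taken as an input rather than forecast.

```python
GIB = 1024**3  # the G suffix, assuming powers of 1024
DAY = 86400    # 1d in seconds

def disk_space_problem(pused, free_bytes, seconds_until_full,
                       pused_max=90, free_min=5 * GIB):
    """pused above threshold AND (free space below minimum OR full within 1d)."""
    return pused > pused_max and (free_bytes < free_min or seconds_until_full < DAY)

# 95% used and only 2G free: problem.
print(disk_space_problem(95, 2 * GIB, 10 * DAY))
# 95% used, but 50G free and not filling up: no problem.
print(disk_space_problem(95, 50 * GIB, 10 * DAY))
# 50% used and filling fast: still no problem, because the first condition fails.
print(disk_space_problem(50, 1 * GIB, 3600))
```

The last case shows why the grouping matters: evaluated as "(A and B) or C", a half-empty disk with a short forecast would raise the problem.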
