This template is for Zabbix version: 7.2

Also available for: 7.0 6.4 6.2 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/elasticsearch_http?at=release/7.2

Elasticsearch Cluster by HTTP

Overview

This template monitors Elasticsearch with Zabbix and works without any external scripts. It supports both standalone and cluster instances. The metrics are collected remotely in one pass by an HTTP agent, which queries the REST API endpoints _cluster/health, _cluster/stats, and _nodes/stats.
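To show what "one pass" over the three endpoints amounts to, here is a minimal Python sketch that issues one GET request per endpoint and keeps the decoded JSON bodies. It is only an illustration: in the template, the HTTP agent items (Get cluster health, Get cluster stats, Get nodes stats) do the fetching and dependent items parse the responses. The function names `build_request` and `collect_once` are hypothetical, and the sketch assumes the API accepts HTTP basic authentication.

```python
import base64
import json
import urllib.request

# The three REST API endpoints listed in the overview.
ENDPOINTS = ("_cluster/health", "_cluster/stats", "_nodes/stats")


def build_request(base_url, path, username="", password=""):
    """Build a GET request for one endpoint, adding HTTP basic auth if a username is given."""
    request = urllib.request.Request(f"{base_url.rstrip('/')}/{path}", method="GET")
    if username:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


def collect_once(base_url, username="", password="", timeout=10.0):
    """Fetch each endpoint once and return the decoded JSON bodies keyed by endpoint path."""
    bodies = {}
    for path in ENDPOINTS:
        request = build_request(base_url, path, username, password)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            bodies[path] = json.load(response)
    return bodies
```

Calling `collect_once("http://es.example.com:9200", "elastic", "secret")` (placeholder host and credentials) would return one dictionary per endpoint; every further metric is then extracted from those three bodies rather than requested separately.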

Requirements

Zabbix version: 7.2 and higher.

Tested versions

This template has been tested on:

Elasticsearch 6.5, 7.6

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Set the hostname or IP address of the Elasticsearch host in the {$ELASTICSEARCH.HOST} macro.
Set the login and password in the {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros.
If the Elasticsearch API is not exposed at its default location, change the {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT} macros accordingly.
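The following Python sketch shows how the setup macros combine into a request URL, with host-level values overriding the template defaults from the "Macros used" table. The URL pattern `{$ELASTICSEARCH.SCHEME}://{$ELASTICSEARCH.HOST}:{$ELASTICSEARCH.PORT}/_cluster/health` and the helper `expand_macros` are assumptions for illustration, not the template's exact item configuration.

```python
import re
from typing import Dict, Optional

# Defaults from the "Macros used" table.
DEFAULT_MACROS = {
    "{$ELASTICSEARCH.HOST}": "<SET ELASTICSEARCH HOST>",
    "{$ELASTICSEARCH.PORT}": "9200",
    "{$ELASTICSEARCH.SCHEME}": "http",
}

# Matches simple user macros such as {$ELASTICSEARCH.PORT} (no macro context support).
USER_MACRO = re.compile(r"\{\$[A-Z0-9_.]+\}")


def expand_macros(template: str, host_macros: Optional[Dict[str, str]] = None) -> str:
    """Substitute user macros, letting host-level values override the template defaults."""
    macros = {**DEFAULT_MACROS, **(host_macros or {})}
    if macros["{$ELASTICSEARCH.HOST}"] == DEFAULT_MACROS["{$ELASTICSEARCH.HOST}"]:
        raise ValueError("{$ELASTICSEARCH.HOST} still has its placeholder value")
    # Unknown macros are left untouched rather than replaced with an empty string.
    return USER_MACRO.sub(lambda m: macros.get(m.group(0), m.group(0)), template)
```

With only `{$ELASTICSEARCH.HOST}` set to `es.example.com`, the assumed pattern expands to `http://es.example.com:9200/_cluster/health`; overriding the scheme and port as well changes only those two parts of the URL.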

Macros used

Name	Description	Default
{$ELASTICSEARCH.USERNAME}	The username for Elasticsearch.
{$ELASTICSEARCH.PASSWORD}	The password for Elasticsearch.
{$ELASTICSEARCH.HOST}	The hostname or IP address of the Elasticsearch host.	`<SET ELASTICSEARCH HOST>`
{$ELASTICSEARCH.PORT}	The port of the Elasticsearch host.	`9200`
{$ELASTICSEARCH.SCHEME}	The scheme of the Elasticsearch API (http/https).	`http`
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}	The maximum ES cluster response time, in seconds, for the trigger expression.	`10s`
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}	The maximum query latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}	The maximum fetch latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}	The maximum indexing latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}	The maximum flush latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.HEAP_USED.MAX.WARN}	The maximum JVM heap usage, in percent, for the warning trigger expression.	`85`
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}	The maximum JVM heap usage, in percent, for the critical trigger expression.	`95`

Items

Name	Description	Type	Key and additional info
Service status	Checks if the service is running and accepting TCP connections.	Simple check	net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] Preprocessing Discard unchanged with heartbeat: `10m`
Service response time	Checks performance of the TCP service.	Simple check	net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"]
Get cluster health	Returns the health status of a cluster.	HTTP agent	es.cluster.get_health
Cluster health status	The health status of the cluster, based on the state of its primary and replica shards. Statuses are: green (all shards are assigned); yellow (all primary shards are assigned, but one or more replica shards are unassigned; if a node in the cluster fails, some data could be unavailable until that node is repaired); red (one or more primary shards are unassigned, so some data is unavailable; this can occur briefly during cluster startup as primary shards are assigned).	Dependent item	es.cluster.status Preprocessing JSON Path: `$.status` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
Number of nodes	The number of nodes within the cluster.	Dependent item	es.cluster.number_of_nodes Preprocessing JSON Path: `$.number_of_nodes` Discard unchanged with heartbeat: `1h`
Number of data nodes	The number of nodes that are dedicated data nodes.	Dependent item	es.cluster.number_of_data_nodes Preprocessing JSON Path: `$.number_of_data_nodes` Discard unchanged with heartbeat: `1h`
Number of relocating shards	The number of shards that are under relocation.	Dependent item	es.cluster.relocating_shards Preprocessing JSON Path: `$.relocating_shards`
Number of initializing shards	The number of shards that are under initialization.	Dependent item	es.cluster.initializing_shards Preprocessing JSON Path: `$.initializing_shards`
Number of unassigned shards	The number of shards that are not allocated.	Dependent item	es.cluster.unassigned_shards Preprocessing JSON Path: `$.unassigned_shards`
Delayed unassigned shards	The number of shards whose allocation has been delayed by the timeout settings.	Dependent item	es.cluster.delayed_unassigned_shards Preprocessing JSON Path: `$.delayed_unassigned_shards`
Number of pending tasks	The number of cluster-level changes that have not yet been executed.	Dependent item	es.cluster.number_of_pending_tasks Preprocessing JSON Path: `$.number_of_pending_tasks`
Task max waiting in queue	The time, in seconds, that the earliest initiated task has been waiting to be performed.	Dependent item	es.cluster.task_max_waiting_in_queue Preprocessing JSON Path: `$.task_max_waiting_in_queue_millis` Custom multiplier: `0.001`
Inactive shards percentage	The ratio of inactive shards in the cluster expressed as a percentage.	Dependent item	es.cluster.inactive_shards_percent_as_number Preprocessing JSON Path: `$.active_shards_percent_as_number` JavaScript: `The text is too long. Please see the template.`
Get cluster stats	Returns cluster statistics.	HTTP agent	es.cluster.get_stats
Cluster uptime	Uptime, in seconds, since the JVM was last started.	Dependent item	es.nodes.jvm.max_uptime Preprocessing JSON Path: `$.nodes.jvm.max_uptime_in_millis` Custom multiplier: `0.001`
Number of non-deleted documents	The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields.	Dependent item	es.indices.docs.count Preprocessing JSON Path: `$.indices.docs.count` Discard unchanged with heartbeat: `1h`
Indices with shards assigned to nodes	The total number of indices with shards assigned to the selected nodes.	Dependent item	es.indices.count Preprocessing JSON Path: `$.indices.count` Discard unchanged with heartbeat: `1h`
Total size of all file stores	The total size in bytes of all file stores across all selected nodes.	Dependent item	es.nodes.fs.total_in_bytes Preprocessing JSON Path: `$.nodes.fs.total_in_bytes` Discard unchanged with heartbeat: `1h`
Total available size to JVM in all file stores	The total number of bytes available to the JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes. This is the actual amount of free disk space the selected Elasticsearch nodes can use.	Dependent item	es.nodes.fs.available_in_bytes Preprocessing JSON Path: `$.nodes.fs.available_in_bytes` Discard unchanged with heartbeat: `1h`
Nodes with the data role	The number of selected nodes with the data role.	Dependent item	es.nodes.count.data Preprocessing JSON Path: `$.nodes.count.data` Discard unchanged with heartbeat: `1h`
Nodes with the ingest role	The number of selected nodes with the ingest role.	Dependent item	es.nodes.count.ingest Preprocessing JSON Path: `$.nodes.count.ingest` Discard unchanged with heartbeat: `1h`
Nodes with the master role	The number of selected nodes with the master role.	Dependent item	es.nodes.count.master Preprocessing JSON Path: `$.nodes.count.master` Discard unchanged with heartbeat: `1h`
Get nodes stats	Returns cluster nodes statistics.	HTTP agent	es.nodes.get_stats
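Several dependent items above turn a raw field from _cluster/health into the stored value. The JavaScript steps are not reproduced on this page, so the following Python sketch is an approximation: the codes 1 (yellow), 2 (red), and 255 (unknown) are taken from the trigger expressions in the next table, while green = 0 and "inactive = 100 minus active percentage" are assumptions. The function names are hypothetical.

```python
# Codes inferred from the trigger expressions (yellow=1, red=2, unknown=255);
# green=0 is an assumption.
STATUS_CODES = {"green": 0, "yellow": 1, "red": 2}
UNKNOWN_STATUS = 255


def cluster_status_code(health: dict) -> int:
    """Map the textual `status` from _cluster/health to a number; anything else is unknown."""
    return STATUS_CODES.get(str(health.get("status", "")).lower(), UNKNOWN_STATUS)


def inactive_shards_percent(health: dict) -> float:
    """Assumed complement of `active_shards_percent_as_number`."""
    return 100.0 - float(health["active_shards_percent_as_number"])


def task_max_waiting_seconds(health: dict) -> float:
    """`Custom multiplier: 0.001` converts milliseconds to seconds."""
    return health["task_max_waiting_in_queue_millis"] * 0.001
```

For a response with `"status": "yellow"`, `"active_shards_percent_as_number": 87.5`, and `"task_max_waiting_in_queue_millis": 1500`, the sketch yields 1, 12.5, and 1.5 seconds respectively.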

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Elasticsearch: Service is down	The service is unavailable or does not accept TCP connections.	`last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"])=0`	Average	Manual close: Yes
Elasticsearch: Service response time is too high	The performance of the TCP service is very low.	`min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}`	Warning	Manual close: Yes Depends on: Elasticsearch: Service is down
Elasticsearch: Health is YELLOW	All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1`	Average
Elasticsearch: Health is RED	One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2`	High
Elasticsearch: Health is UNKNOWN	The health status of the cluster is unknown or cannot be obtained.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255`	High
Elasticsearch: The number of nodes within the cluster has decreased		`change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0`	Info	Manual close: Yes
Elasticsearch: The number of nodes within the cluster has increased		`change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0`	Info	Manual close: Yes
Elasticsearch: Cluster has the initializing shards	The cluster has had initializing shards for longer than 10 minutes.	`min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0`	Average
Elasticsearch: Cluster has the unassigned shards	The cluster has had unassigned shards for longer than 10 minutes.	`min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0`	Average
Elasticsearch: Cluster has been restarted	Uptime is less than 10 minutes.	`last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m`	Info	Manual close: Yes
Elasticsearch: Cluster does not have enough space for resharding	There is not enough disk space for index resharding.	`(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes)`	High
Elasticsearch: Cluster has only two master nodes	The cluster has only two nodes with a master role and will be unavailable if one of them breaks.	`last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2`	Disaster
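The resharding trigger is the least obvious expression in the table. The following Python sketch evaluates the same arithmetic on plain numbers instead of `last()` values: the used space (total minus available) divided by the number of data nodes minus one is compared with the available space. The function name is hypothetical, and returning `None` for a single data node is a choice of the sketch, since the expression would divide by zero there.

```python
from typing import Optional


def not_enough_space_for_resharding(total_bytes: int, available_bytes: int, data_nodes: int) -> Optional[bool]:
    """Evaluate the trigger expression with plain numbers instead of last() values."""
    if data_nodes <= 1:
        # The expression divides by (data_nodes - 1), so it has no meaningful value here.
        return None
    used_bytes = total_bytes - available_bytes
    return used_bytes / (data_nodes - 1) > available_bytes
```

Worked example: with 3 data nodes, 1000 GB in total, and 300 GB available, the used space is 700 GB and 700 / (3 - 1) = 350 GB, which exceeds 300 GB, so the trigger fires. With 400 GB available, 600 / 2 = 300 GB does not exceed 400 GB, so it does not.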

LLD rule Cluster nodes discovery

Name	Description	Type	Key and additional info
Cluster nodes discovery	Discovers ES cluster nodes.	HTTP agent	es.nodes.discovery Preprocessing JSON Path: `$.nodes.[*]` Discard unchanged with heartbeat: `1d`
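The discovery rule turns each object under `nodes` into one discovered entity, and each item prototype later picks that node's object out of the nodes statistics with `$..[?(@.name=='{#ES.NODE}')].first()`. The following Python sketch mimics both steps. That `{#ES.NODE}` holds the node's `name` is an assumption based on that JSON Path filter, and `select_node` only checks the node objects directly under `nodes` instead of searching recursively like `$..`. Both function names are hypothetical.

```python
from typing import Optional


def discover_nodes(nodes_response: dict) -> list:
    """One LLD row per node object, as with the JSON Path `$.nodes.[*]`."""
    # Assumption: {#ES.NODE} is filled from each node's "name" field.
    return [{"{#ES.NODE}": node["name"]} for node in nodes_response.get("nodes", {}).values()]


def select_node(nodes_stats: dict, node_name: str) -> Optional[dict]:
    """Simplified equivalent of $..[?(@.name=='{#ES.NODE}')].first()."""
    for node in nodes_stats.get("nodes", {}).values():
        if node.get("name") == node_name:
            return node
    return None
```

Because node objects are keyed by node ID, filtering on `name` lets each prototype find "its" node in every new nodes stats response without knowing the ID.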

Item prototypes for Cluster nodes discovery

Name	Description	Type	Key and additional info
ES {#ES.NODE}: Get data	Returns cluster nodes statistics.	Dependent item	es.node.get.data[{#ES.NODE}] Preprocessing JSON Path: `$..[?(@.name=='{#ES.NODE}')].first()`
ES {#ES.NODE}: Total size	Total size (in bytes) of all file stores.	Dependent item	es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..fs.total.total_in_bytes.first()` Discard unchanged with heartbeat: `1d`
ES {#ES.NODE}: Total available size	The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process-level restrictions, this might be less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can use.	Dependent item	es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..fs.total.available_in_bytes.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Node uptime	JVM uptime in seconds.	Dependent item	es.node.jvm.uptime[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.uptime_in_millis.first()` Custom multiplier: `0.001`
ES {#ES.NODE}: Maximum JVM memory available for use	The maximum amount of memory, in bytes, available for use by the heap.	Dependent item	es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_max_in_bytes.first()` Discard unchanged with heartbeat: `1d`
ES {#ES.NODE}: Amount of JVM heap currently in use	The memory, in bytes, currently in use by the heap.	Dependent item	es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_used_in_bytes.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Percent of JVM heap currently in use	The percentage of memory currently in use by the heap.	Dependent item	es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_used_percent.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Amount of JVM heap committed	The amount of memory, in bytes, available for use by the heap.	Dependent item	es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_committed_in_bytes.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Number of open HTTP connections	The number of currently open HTTP connections for the node.	Dependent item	es.node.http.current_open[{#ES.NODE}] Preprocessing JSON Path: `$..http.current_open.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Rate of HTTP connections opened	The number of HTTP connections opened for the node per second.	Dependent item	es.node.http.opened.rate[{#ES.NODE}] Preprocessing JSON Path: `$..http.total_opened.first()` Change per second
ES {#ES.NODE}: Time spent throttling operations	Time in seconds spent throttling operations for the last measuring span.	Dependent item	es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.throttle_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Time spent throttling recovery operations	Time in seconds spent throttling recovery operations for the last measuring span.	Dependent item	es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.recovery.throttle_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Time spent throttling merge operations	Time in seconds spent throttling merge operations for the last measuring span.	Dependent item	es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.merges.total_throttled_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Rate of queries	The number of query operations per second.	Dependent item	es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_total.first()` Change per second
ES {#ES.NODE}: Total number of query	The total number of query operations.	Dependent item	es.node.indices.search.query_total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Time spent performing query	Time in seconds spent performing query operations for the last measuring span.	Dependent item	es.node.indices.search.query_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Total time spent performing query	Time in milliseconds spent performing query operations.	Dependent item	es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Query latency	The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals.	Calculated	es.node.indices.search.query_latency[{#ES.NODE}]
ES {#ES.NODE}: Current query operations	The number of query operations currently running.	Dependent item	es.node.indices.search.query_current[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_current.first()`
ES {#ES.NODE}: Rate of fetch	The number of fetch operations per second.	Dependent item	es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_total.first()` Change per second
ES {#ES.NODE}: Total number of fetch	The total number of fetch operations.	Dependent item	es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Time spent performing fetch	Time in seconds spent performing fetch operations for the last measuring span.	Dependent item	es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Total time spent performing fetch	Time in milliseconds spent performing fetch operations.	Dependent item	es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Fetch latency	The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals.	Calculated	es.node.indices.search.fetch_latency[{#ES.NODE}]
ES {#ES.NODE}: Current fetch operations	The number of fetch operations currently running.	Dependent item	es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_current.first()`
ES {#ES.NODE}: Write thread pool executor tasks completed	The number of tasks per second completed by the write thread pool executor.	Dependent item	es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.completed.first()` Change per second
ES {#ES.NODE}: Write thread pool active threads	The number of active threads in the write thread pool.	Dependent item	es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.active.first()`
ES {#ES.NODE}: Write thread pool tasks in queue	The number of tasks in queue for the write thread pool.	Dependent item	es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.queue.first()`
ES {#ES.NODE}: Write thread pool executor tasks rejected	The number of tasks per second rejected by the write thread pool executor.	Dependent item	es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.rejected.first()` Change per second
ES {#ES.NODE}: Search thread pool executor tasks completed	The number of tasks per second completed by the search thread pool executor.	Dependent item	es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.completed.first()` Change per second
ES {#ES.NODE}: Search thread pool active threads	The number of active threads in the search thread pool.	Dependent item	es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.active.first()`
ES {#ES.NODE}: Search thread pool tasks in queue	The number of tasks in queue for the search thread pool.	Dependent item	es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.queue.first()`
ES {#ES.NODE}: Search thread pool executor tasks rejected	The number of tasks per second rejected by the search thread pool executor.	Dependent item	es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.rejected.first()` Change per second
ES {#ES.NODE}: Refresh thread pool executor tasks completed	The number of tasks per second completed by the refresh thread pool executor.	Dependent item	es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.completed.first()` Change per second
ES {#ES.NODE}: Refresh thread pool active threads	The number of active threads in the refresh thread pool.	Dependent item	es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.active.first()`
ES {#ES.NODE}: Refresh thread pool tasks in queue	The number of tasks in queue for the refresh thread pool.	Dependent item	es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.queue.first()`
ES {#ES.NODE}: Refresh thread pool executor tasks rejected	The number of tasks per second rejected by the refresh thread pool executor.	Dependent item	es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.rejected.first()` Change per second
ES {#ES.NODE}: Total number of indexing	The total number of indexing operations.	Dependent item	es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.index_total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Total time spent performing indexing	Total time in milliseconds spent performing indexing operations.	Dependent item	es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.index_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Indexing latency	The average indexing latency calculated from the available index_total and index_time_in_millis metrics.	Calculated	es.node.indices.indexing.index_latency[{#ES.NODE}]
ES {#ES.NODE}: Current indexing operations	The number of indexing operations currently running.	Dependent item	es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.index_current.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Total number of index flushes to disk	The total number of flush operations.	Dependent item	es.node.indices.flush.total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.flush.total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Total time spent on flushing indices to disk	Total time in milliseconds spent performing flush operations.	Dependent item	es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.flush.total_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Flush latency	The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics.	Calculated	es.node.indices.flush.latency[{#ES.NODE}]
ES {#ES.NODE}: Rate of index refreshes	The number of refresh operations per second.	Dependent item	es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing JSON Path: `$..indices.refresh.total.first()` Change per second
ES {#ES.NODE}: Time spent performing refresh	Time in seconds spent performing refresh operations for the last measuring span.	Dependent item	es.node.indices.refresh.time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.refresh.total_time_in_millis.first()` Custom multiplier: `0.001` Simple change
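The rate items ("Change per second") and the calculated latency items both work from two consecutive samples of cumulative counters. The exact calculated-item formulas are not shown on this page, so the following Python sketch only illustrates the sampling idea: latency is the growth of `*_time_in_millis` divided by the growth of `*_total` over the same interval. Discarding a sample when a counter goes backwards (for example, after a node restart) follows the behavior Zabbix documents for its change preprocessing steps; returning 0 when no operations happened is a choice of the sketch. All function names are hypothetical.

```python
from typing import Optional


def delta(previous: float, current: float) -> Optional[float]:
    """Difference between two counter samples; None if the counter went backwards."""
    return None if current < previous else current - previous


def change_per_second(prev_value: float, prev_ts: float, cur_value: float, cur_ts: float) -> Optional[float]:
    """Per-second rate of a cumulative counter between two timestamped samples."""
    diff = delta(prev_value, cur_value)
    if diff is None or cur_ts <= prev_ts:
        return None
    return diff / (cur_ts - prev_ts)


def average_latency_ms(prev_total: float, prev_time_ms: float, cur_total: float, cur_time_ms: float) -> Optional[float]:
    """Milliseconds per operation between two samples of *_total and *_time_in_millis."""
    ops = delta(prev_total, cur_total)
    elapsed = delta(prev_time_ms, cur_time_ms)
    if ops is None or elapsed is None:
        return None
    # Report 0 rather than dividing by zero when no operations happened in the interval.
    return elapsed / ops if ops > 0 else 0.0
```

For example, if `query_total` grows from 1000 to 1100 while `query_time_in_millis` grows from 20000 to 25000, the average query latency over that interval is 5000 / 100 = 50 ms, which would exceed the default `{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}` of 100 ms only if it stayed above it for the trigger's 5-minute window.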

Trigger prototypes for Cluster nodes discovery

Name	Description	Expression	Severity	Dependencies and additional info
Elasticsearch: ES {#ES.NODE}: has been restarted	Uptime is less than 10 minutes.	`last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m`	Info	Manual close: Yes
Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is high	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's recommended heap-size limits) or scale out the cluster by adding more nodes.	`min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN}`	Warning	Depends on: Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is critical
Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is critical	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's recommended heap-size limits) or scale out the cluster by adding more nodes.	`min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}`	High
Elasticsearch: ES {#ES.NODE}: Query latency is too high	If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}`	Warning
Elasticsearch: ES {#ES.NODE}: Fetch latency is too high	The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, this could indicate a problem with slow disks, enriching of documents (highlighting the relevant text in search results, etc.), or requesting too many results.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}`	Warning
Elasticsearch: ES {#ES.NODE}: Write thread pool executor has the rejected tasks	The number of tasks rejected by the write thread pool executor is over 0 for 5m.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0`	Warning
Elasticsearch: ES {#ES.NODE}: Search thread pool executor has the rejected tasks	The number of tasks rejected by the search thread pool executor is over 0 for 5m.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0`	Warning
Elasticsearch: ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks	The number of tasks rejected by the refresh thread pool executor is over 0 for 5m.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0`	Warning
Elasticsearch: ES {#ES.NODE}: Indexing latency is too high	If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there).	`min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}`	Warning
Elasticsearch: ES {#ES.NODE}: Flush latency is too high	If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}`	Warning
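The two JVM heap triggers use `min(...,1h)`, so each fires only if every sample in the last hour exceeded its threshold, and the warning trigger depends on the critical one, so only one of them is reported at a time. The following Python sketch models that combination on a list of samples, using the default macro values; the function name and the returned labels are hypothetical.

```python
from typing import Optional, Sequence

HEAP_USED_MAX_WARN = 85  # default of {$ELASTICSEARCH.HEAP_USED.MAX.WARN}
HEAP_USED_MAX_CRIT = 95  # default of {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}


def heap_problem(last_hour: Sequence[float], warn: float = HEAP_USED_MAX_WARN,
                 crit: float = HEAP_USED_MAX_CRIT) -> Optional[str]:
    """Return "critical", "warning", or "ok" for one hour of heap_used_percent samples."""
    if not last_hour:
        return None  # no data in the evaluation window
    lowest = min(last_hour)  # min(...,1h) > X means every sample exceeded X
    if lowest > crit:
        # The warning trigger depends on the critical one, so it is not reported.
        return "critical"
    if lowest > warn:
        return "warning"
    return "ok"
```

For samples of 90, 97, and 96 percent, the minimum is 90, so only the warning fires; a single sample at 50 percent within the hour keeps both triggers quiet even if heap usage peaked at 99 percent.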

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the ZABBIX forums.

This template is for Zabbix version: 7.0

Also available for: 7.2 6.4 6.2 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/elasticsearch_http?at=release/7.0

Elasticsearch Cluster by HTTP

Overview

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

Elasticsearch 6.5, 7.6

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Set the hostname or IP address of the Elasticsearch host in the {$ELASTICSEARCH.HOST} macro.
Set the login and password in the {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros.
If the Elasticsearch API is not exposed at its default location, change the {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT} macros accordingly.

Macros used

Name	Description	Default
{$ELASTICSEARCH.USERNAME}	The username for Elasticsearch.
{$ELASTICSEARCH.PASSWORD}	The password for Elasticsearch.
{$ELASTICSEARCH.HOST}	The hostname or IP address of the Elasticsearch host.	`<SET ELASTICSEARCH HOST>`
{$ELASTICSEARCH.PORT}	The port of the Elasticsearch host.	`9200`
{$ELASTICSEARCH.SCHEME}	The scheme of the Elasticsearch API (http/https).	`http`
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}	The maximum ES cluster response time, in seconds, for the trigger expression.	`10s`
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}	The maximum query latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}	The maximum fetch latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}	The maximum indexing latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}	The maximum flush latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.HEAP_USED.MAX.WARN}	The maximum JVM heap usage, in percent, for the warning trigger expression.	`85`
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}	The maximum JVM heap usage, in percent, for the critical trigger expression.	`95`

Items

Name	Description	Type	Key and additional info
Service status	Checks if the service is running and accepting TCP connections.	Simple check	net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] Preprocessing Discard unchanged with heartbeat: `10m`
Service response time	Checks performance of the TCP service.	Simple check	net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"]
Get cluster health	Returns the health status of a cluster.	HTTP agent	es.cluster.get_health
Cluster health status	The health status of the cluster, based on the state of its primary and replica shards. Statuses are: green (all shards are assigned); yellow (all primary shards are assigned, but one or more replica shards are unassigned; if a node in the cluster fails, some data could be unavailable until that node is repaired); red (one or more primary shards are unassigned, so some data is unavailable; this can occur briefly during cluster startup as primary shards are assigned).	Dependent item	es.cluster.status Preprocessing JSON Path: `$.status` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
Number of nodes	The number of nodes within the cluster.	Dependent item	es.cluster.number_of_nodes Preprocessing JSON Path: `$.number_of_nodes` Discard unchanged with heartbeat: `1h`
Number of data nodes	The number of nodes that are dedicated data nodes.	Dependent item	es.cluster.number_of_data_nodes Preprocessing JSON Path: `$.number_of_data_nodes` Discard unchanged with heartbeat: `1h`
Number of relocating shards	The number of shards that are under relocation.	Dependent item	es.cluster.relocating_shards Preprocessing JSON Path: `$.relocating_shards`
Number of initializing shards	The number of shards that are under initialization.	Dependent item	es.cluster.initializing_shards Preprocessing JSON Path: `$.initializing_shards`
Number of unassigned shards	The number of shards that are not allocated.	Dependent item	es.cluster.unassigned_shards Preprocessing JSON Path: `$.unassigned_shards`
Delayed unassigned shards	The number of shards whose allocation has been delayed by the timeout settings.	Dependent item	es.cluster.delayed_unassigned_shards Preprocessing JSON Path: `$.delayed_unassigned_shards`
Number of pending tasks	The number of cluster-level changes that have not yet been executed.	Dependent item	es.cluster.number_of_pending_tasks Preprocessing JSON Path: `$.number_of_pending_tasks`
Task max waiting in queue	The time, in seconds, that the earliest initiated task has been waiting to be performed.	Dependent item	es.cluster.task_max_waiting_in_queue Preprocessing JSON Path: `$.task_max_waiting_in_queue_millis` Custom multiplier: `0.001`
Inactive shards percentage	The ratio of inactive shards in the cluster expressed as a percentage.	Dependent item	es.cluster.inactive_shards_percent_as_number Preprocessing JSON Path: `$.active_shards_percent_as_number` JavaScript: `The text is too long. Please see the template.`
Get cluster stats	Returns cluster statistics.	HTTP agent	es.cluster.get_stats
Cluster uptime	Uptime duration, in seconds, since the JVM last started.	Dependent item	es.nodes.jvm.max_uptime Preprocessing JSON Path: `$.nodes.jvm.max_uptime_in_millis` Custom multiplier: `0.001`
Number of non-deleted documents	The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields.	Dependent item	es.indices.docs.count Preprocessing JSON Path: `$.indices.docs.count` Discard unchanged with heartbeat: `1h`
Indices with shards assigned to nodes	The total number of indices with shards assigned to the selected nodes.	Dependent item	es.indices.count Preprocessing JSON Path: `$.indices.count` Discard unchanged with heartbeat: `1h`
Total size of all file stores	The total size in bytes of all file stores across all selected nodes.	Dependent item	es.nodes.fs.total_in_bytes Preprocessing JSON Path: `$.nodes.fs.total_in_bytes` Discard unchanged with heartbeat: `1h`
Total available size to JVM in all file stores	The total number of bytes available to the JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes. This is the actual amount of free disk space the selected Elasticsearch nodes can use.	Dependent item	es.nodes.fs.available_in_bytes Preprocessing JSON Path: `$.nodes.fs.available_in_bytes` Discard unchanged with heartbeat: `1h`
Nodes with the data role	The number of selected nodes with the data role.	Dependent item	es.nodes.count.data Preprocessing JSON Path: `$.nodes.count.data` Discard unchanged with heartbeat: `1h`
Nodes with the ingest role	The number of selected nodes with the ingest role.	Dependent item	es.nodes.count.ingest Preprocessing JSON Path: `$.nodes.count.ingest` Discard unchanged with heartbeat: `1h`
Nodes with the master role	The number of selected nodes with the master role.	Dependent item	es.nodes.count.master Preprocessing JSON Path: `$.nodes.count.master` Discard unchanged with heartbeat: `1h`
Get nodes stats	Returns cluster nodes statistics.	HTTP agent	es.nodes.get_stats
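The `Cluster health status` item converts the `status` string from `_cluster/health` into a number through a JavaScript step whose source is not reproduced here. The trigger thresholds use 1 for yellow, 2 for red and 255 for unknown, so the mapping probably works like the Python sketch below. Two parts of it are assumptions: the value 0 for green, and the fallback to 255 for anything unreadable. `health_status_code` is a hypothetical helper name.

```python
import json

# Assumed mapping, inferred from the trigger thresholds (1 = yellow, 2 = red,
# 255 = unknown); green = 0 is an assumption, not taken from the template.
STATUS_CODES = {"green": 0, "yellow": 1, "red": 2}
UNKNOWN = 255


def health_status_code(cluster_health_json: str) -> int:
    """Extract $.status from a _cluster/health response and map it to a number."""
    try:
        status = json.loads(cluster_health_json).get("status")
    except (ValueError, AttributeError):
        # Invalid JSON, or a JSON value that is not an object.
        return UNKNOWN
    if not isinstance(status, str):
        return UNKNOWN
    return STATUS_CODES.get(status.lower(), UNKNOWN)
```

With this design, an unreadable or unexpected value falls through to 255. A broken response would then surface through the `Health is UNKNOWN` trigger instead of being read as green.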

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
Elasticsearch: Service is down	The service is unavailable or does not accept TCP connections.	`last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"])=0`	Average	Manual close: Yes
Elasticsearch: Service response time is too high	The performance of the TCP service is very low.	`min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}`	Warning	Manual close: Yes Depends on: Elasticsearch: Service is down
Elasticsearch: Health is YELLOW	All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1`	Average
Elasticsearch: Health is RED	One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2`	High
Elasticsearch: Health is UNKNOWN	The health status of the cluster is unknown or cannot be obtained.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255`	High
Elasticsearch: The number of nodes within the cluster has decreased		`change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0`	Info	Manual close: Yes
Elasticsearch: The number of nodes within the cluster has increased		`change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0`	Info	Manual close: Yes
Elasticsearch: Cluster has the initializing shards	The cluster has had initializing shards for longer than 10 minutes.	`min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0`	Average
Elasticsearch: Cluster has the unassigned shards	The cluster has had unassigned shards for longer than 10 minutes.	`min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0`	Average
Elasticsearch: Cluster has been restarted	Uptime is less than 10 minutes.	`last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m`	Info	Manual close: Yes
Elasticsearch: Cluster does not have enough space for resharding	There is not enough disk space for index resharding.	`(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes)`	High
Elasticsearch: Cluster has only two master nodes	The cluster has only two nodes with the master role and will become unavailable if one of them fails.	`last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2`	Disaster
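Here is a worked example of the resharding trigger. Take 3 data nodes with 3000 GB in total and 600 GB available. The used space is 2400 GB, and 2400 / (3 - 1) = 1200 GB, which exceeds the 600 GB available, so the trigger fires. The Python function below applies the same arithmetic. It is a hypothetical helper. The guard for a single data node is this sketch's own choice, because the template expression would divide by zero in that case.

```python
def lacks_resharding_space(total_bytes: int, available_bytes: int, data_nodes: int) -> bool:
    """Mirror of the trigger expression:
    (total - available) / (data_nodes - 1) > available
    """
    if data_nodes < 2:
        # The template expression would divide by zero here; this sketch
        # simply reports no problem instead.
        return False
    used_bytes = total_bytes - available_bytes
    return used_bytes / (data_nodes - 1) > available_bytes
```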

LLD rule Cluster nodes discovery

Name	Description	Type	Key and additional info
Cluster nodes discovery	Discovers ES cluster nodes.	HTTP agent	es.nodes.discovery Preprocessing JSON Path: `$.nodes.[*]` Discard unchanged with heartbeat: `1d`
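The `_nodes/stats` response keys each node by its node ID under `nodes`. `$.nodes.[*]` therefore yields one object per node, and the `Get data` item prototype later picks a node out by its `name` field. The Python sketch below approximates both steps. It assumes `{#ES.NODE}` is populated from each node's `name`, since the LLD macro mapping is not shown in this document. The helper names are hypothetical. Unlike the recursive `$..` filter, the lookup only checks top-level node objects.

```python
import json
from typing import Any, Dict, List, Optional


def discover_nodes(nodes_stats_json: str) -> List[Dict[str, Any]]:
    """Approximates JSON Path $.nodes.[*]: one object per node, in response order."""
    return list(json.loads(nodes_stats_json).get("nodes", {}).values())


def node_names(nodes: List[Dict[str, Any]]) -> List[str]:
    """Assumed source of {#ES.NODE}: the "name" field of each discovered node."""
    return [node["name"] for node in nodes if "name" in node]


def node_data(nodes_stats_json: str, node_name: str) -> Optional[Dict[str, Any]]:
    """Approximates $..[?(@.name=='{#ES.NODE}')].first() for top-level node objects only."""
    for node in discover_nodes(nodes_stats_json):
        if node.get("name") == node_name:
            return node
    return None
```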

Item prototypes for Cluster nodes discovery

Name	Description	Type	Key and additional info
ES {#ES.NODE}: Get data	Returns cluster nodes statistics.	Dependent item	es.node.get.data[{#ES.NODE}] Preprocessing JSON Path: `$..[?(@.name=='{#ES.NODE}')].first()`
ES {#ES.NODE}: Total size	Total size (in bytes) of all file stores.	Dependent item	es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..fs.total.total_in_bytes.first()` Discard unchanged with heartbeat: `1d`
ES {#ES.NODE}: Total available size	The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process-level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize.	Dependent item	es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..fs.total.available_in_bytes.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Node uptime	JVM uptime in seconds.	Dependent item	es.node.jvm.uptime[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.uptime_in_millis.first()` Custom multiplier: `0.001`
ES {#ES.NODE}: Maximum JVM memory available for use	The maximum amount of memory, in bytes, available for use by the heap.	Dependent item	es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_max_in_bytes.first()` Discard unchanged with heartbeat: `1d`
ES {#ES.NODE}: Amount of JVM heap currently in use	The memory, in bytes, currently in use by the heap.	Dependent item	es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_used_in_bytes.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Percent of JVM heap currently in use	The percentage of memory currently in use by the heap.	Dependent item	es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_used_percent.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Amount of JVM heap committed	The amount of memory, in bytes, available for use by the heap.	Dependent item	es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_committed_in_bytes.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Number of open HTTP connections	The number of currently open HTTP connections for the node.	Dependent item	es.node.http.current_open[{#ES.NODE}] Preprocessing JSON Path: `$..http.current_open.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Rate of HTTP connections opened	The number of HTTP connections opened for the node per second.	Dependent item	es.node.http.opened.rate[{#ES.NODE}] Preprocessing JSON Path: `$..http.total_opened.first()` Change per second
ES {#ES.NODE}: Time spent throttling operations	Time in seconds spent throttling indexing operations for the last measuring span.	Dependent item	es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.throttle_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Time spent throttling recovery operations	Time in seconds spent throttling recovery operations for the last measuring span.	Dependent item	es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.recovery.throttle_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Time spent throttling merge operations	Time in seconds spent throttling merge operations for the last measuring span.	Dependent item	es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.merges.total_throttled_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Rate of queries	The number of query operations per second.	Dependent item	es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_total.first()` Change per second
ES {#ES.NODE}: Total number of query	The total number of query operations.	Dependent item	es.node.indices.search.query_total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Time spent performing query	Time in seconds spent performing query operations for the last measuring span.	Dependent item	es.node.indices.search.query_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Total time spent performing query	Time in milliseconds spent performing query operations.	Dependent item	es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Query latency	The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals.	Calculated	es.node.indices.search.query_latency[{#ES.NODE}]
ES {#ES.NODE}: Current query operations	The number of query operations currently running.	Dependent item	es.node.indices.search.query_current[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_current.first()`
ES {#ES.NODE}: Rate of fetch	The number of fetch operations per second.	Dependent item	es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_total.first()` Change per second
ES {#ES.NODE}: Total number of fetch	The total number of fetch operations.	Dependent item	es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Time spent performing fetch	Time in seconds spent performing fetch operations for the last measuring span.	Dependent item	es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Total time spent performing fetch	Time in milliseconds spent performing fetch operations.	Dependent item	es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Fetch latency	The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals.	Calculated	es.node.indices.search.fetch_latency[{#ES.NODE}]
ES {#ES.NODE}: Current fetch operations	The number of fetch operations currently running.	Dependent item	es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_current.first()`
ES {#ES.NODE}: Write thread pool executor tasks completed	The number of tasks completed by the write thread pool executor per second.	Dependent item	es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.completed.first()` Change per second
ES {#ES.NODE}: Write thread pool active threads	The number of active threads in the write thread pool.	Dependent item	es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.active.first()`
ES {#ES.NODE}: Write thread pool tasks in queue	The number of tasks in queue for the write thread pool.	Dependent item	es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.queue.first()`
ES {#ES.NODE}: Write thread pool executor tasks rejected	The number of tasks rejected by the write thread pool executor per second.	Dependent item	es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.rejected.first()` Change per second
ES {#ES.NODE}: Search thread pool executor tasks completed	The number of tasks completed by the search thread pool executor per second.	Dependent item	es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.completed.first()` Change per second
ES {#ES.NODE}: Search thread pool active threads	The number of active threads in the search thread pool.	Dependent item	es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.active.first()`
ES {#ES.NODE}: Search thread pool tasks in queue	The number of tasks in queue for the search thread pool.	Dependent item	es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.queue.first()`
ES {#ES.NODE}: Search thread pool executor tasks rejected	The number of tasks rejected by the search thread pool executor per second.	Dependent item	es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.rejected.first()` Change per second
ES {#ES.NODE}: Refresh thread pool executor tasks completed	The number of tasks completed by the refresh thread pool executor per second.	Dependent item	es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.completed.first()` Change per second
ES {#ES.NODE}: Refresh thread pool active threads	The number of active threads in the refresh thread pool.	Dependent item	es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.active.first()`
ES {#ES.NODE}: Refresh thread pool tasks in queue	The number of tasks in queue for the refresh thread pool.	Dependent item	es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.queue.first()`
ES {#ES.NODE}: Refresh thread pool executor tasks rejected	The number of tasks rejected by the refresh thread pool executor per second.	Dependent item	es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.rejected.first()` Change per second
ES {#ES.NODE}: Total number of indexing	The total number of indexing operations.	Dependent item	es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.index_total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Total time spent performing indexing	Total time in milliseconds spent performing indexing operations.	Dependent item	es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.index_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Indexing latency	The average indexing latency calculated from the available index_total and index_time_in_millis metrics.	Calculated	es.node.indices.indexing.index_latency[{#ES.NODE}]
ES {#ES.NODE}: Current indexing operations	The number of indexing operations currently running.	Dependent item	es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.index_current.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Total number of index flushes to disk	The total number of flush operations.	Dependent item	es.node.indices.flush.total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.flush.total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Total time spent on flushing indices to disk	Total time in milliseconds spent performing flush operations.	Dependent item	es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.flush.total_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Flush latency	The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics.	Calculated	es.node.indices.flush.latency[{#ES.NODE}]
ES {#ES.NODE}: Rate of index refreshes	The number of refresh operations per second.	Dependent item	es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing JSON Path: `$..indices.refresh.total.first()` Change per second
ES {#ES.NODE}: Time spent performing refresh	Time in seconds spent performing refresh operations for the last measuring span.	Dependent item	es.node.indices.refresh.time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.refresh.total_time_in_millis.first()` Custom multiplier: `0.001` Simple change
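The calculated latency items (query, fetch, indexing and flush) sample a cumulative operation counter and a cumulative time counter. The Python sketch below illustrates that idea. It assumes latency is the growth of the time counter divided by the growth of the operation counter between two polls. The template's actual calculated-item formulas are not reproduced here. The handling of idle intervals and counter resets is this sketch's own choice, and `Sample` and `average_latency_ms` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One poll of a cumulative counter pair, e.g. query_total and query_time_in_millis."""
    total_ops: int
    total_time_ms: int


def average_latency_ms(previous: Sample, current: Sample) -> float:
    """Average milliseconds per operation between two polls.

    Returns 0.0 when no operations completed in the interval or when the
    counters went backwards (for example after a node restart).
    """
    ops = current.total_ops - previous.total_ops
    elapsed_ms = current.total_time_ms - previous.total_time_ms
    if ops <= 0 or elapsed_ms < 0:
        return 0.0
    return elapsed_ms / ops
```

For example, suppose 200 queries took 6000 ms in total between two polls. That gives an average of 30 ms, well under the default `{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}` of 100.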

Trigger prototypes for Cluster nodes discovery

Name	Description	Expression	Severity	Dependencies and additional info
Elasticsearch: ES {#ES.NODE}: has been restarted	Uptime is less than 10 minutes.	`last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m`	Info	Manual close: Yes
Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is high	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's recommended heap-size limits), or scale out the cluster by adding more nodes.	`min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN}`	Warning	Depends on: Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is critical
Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is critical	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's recommended heap-size limits), or scale out the cluster by adding more nodes.	`min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}`	High
Elasticsearch: ES {#ES.NODE}: Query latency is too high	If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}`	Warning
Elasticsearch: ES {#ES.NODE}: Fetch latency is too high	The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, this could indicate a problem with slow disks, enriching of documents (highlighting the relevant text in search results, etc.), or requesting too many results.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}`	Warning
Elasticsearch: ES {#ES.NODE}: Write thread pool executor has the rejected tasks	The rate of tasks rejected by the write thread pool executor has stayed above 0 for 5 minutes.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0`	Warning
Elasticsearch: ES {#ES.NODE}: Search thread pool executor has the rejected tasks	The rate of tasks rejected by the search thread pool executor has stayed above 0 for 5 minutes.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0`	Warning
Elasticsearch: ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks	The rate of tasks rejected by the refresh thread pool executor has stayed above 0 for 5 minutes.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0`	Warning
Elasticsearch: ES {#ES.NODE}: Indexing latency is too high	If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there).	`min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}`	Warning
Elasticsearch: ES {#ES.NODE}: Flush latency is too high	If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}`	Warning
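The heap triggers use `min(...,1h)`. They fire only when every heap-usage value collected in the last hour is above the threshold. The warning trigger also depends on the critical one, so only one of them is reported at a time. The Python sketch below models that evaluation. It assumes default values of 85 and 95 for `{$ELASTICSEARCH.HEAP_USED.MAX.WARN}` and `{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}`. The function name, and the choice to return a severity label, are illustrative.

```python
from typing import Optional, Sequence

HEAP_WARN = 85  # assumed default of {$ELASTICSEARCH.HEAP_USED.MAX.WARN}
HEAP_CRIT = 95  # assumed default of {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}


def heap_problem(last_hour_percent: Sequence[float],
                 warn: float = HEAP_WARN,
                 crit: float = HEAP_CRIT) -> Optional[str]:
    """Return the severity that would be raised, or None."""
    if not last_hour_percent:
        return None
    # min(...,1h) > threshold means every sample in the window exceeds it.
    lowest = min(last_hour_percent)
    if lowest > crit:
        return "High"
    # The warning trigger depends on the critical one, so it is only
    # reported when the critical condition is not met.
    if lowest > warn:
        return "Warning"
    return None
```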

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums

This template is for Zabbix version: 6.4

Also available for: 7.2 7.0 6.2 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/elasticsearch_http?at=release/6.4

Elasticsearch Cluster by HTTP

Overview

Requirements

Zabbix version: 6.4 and higher.

Tested versions

This template has been tested on:

Elasticsearch 6.5, 7.6

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

Set the hostname or IP address of the Elasticsearch host in the {$ELASTICSEARCH.HOST} macro.
Set the login and password in the {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros.
If the Elasticsearch API is served on a non-default scheme or port, change the {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT} macros accordingly.
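To sanity-check the macro values, you can reproduce the HTTP agent items' request outside Zabbix. The Python sketch below uses only the standard library. It assumes the endpoint is `{$ELASTICSEARCH.SCHEME}://{$ELASTICSEARCH.HOST}:{$ELASTICSEARCH.PORT}/<path>` and that credentials are sent with HTTP Basic authentication. `build_es_request` and the example host are hypothetical.

```python
import base64
import urllib.request


def build_es_request(scheme: str, host: str, port: int, path: str,
                     username: str = "", password: str = "") -> urllib.request.Request:
    """Build (but do not send) a GET request for an Elasticsearch REST path."""
    url = f"{scheme}://{host}:{port}/{path.lstrip('/')}"
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    if username:
        # Assumed HTTP Basic authentication from the USERNAME/PASSWORD macros.
        token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request
```

Passing the result to `urllib.request.urlopen()` sends it. A successful response for `_cluster/health` suggests that the host, port, scheme and credential macros are consistent.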

Macros used

Name	Description	Default
{$ELASTICSEARCH.USERNAME}	The Elasticsearch username.
{$ELASTICSEARCH.PASSWORD}	The Elasticsearch password.
{$ELASTICSEARCH.HOST}	The hostname or IP address of the Elasticsearch host.	`<SET ELASTICSEARCH HOST>`
{$ELASTICSEARCH.PORT}	The port of the Elasticsearch host.	`9200`
{$ELASTICSEARCH.SCHEME}	The scheme of the Elasticsearch API (http/https).	`http`
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}	The maximum ES cluster response time, in seconds, used in the trigger expression.	`10s`
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}	The maximum query latency, in milliseconds, used in the trigger expression.	`100`
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}	The maximum fetch latency, in milliseconds, used in the trigger expression.	`100`
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}	The maximum indexing latency, in milliseconds, used in the trigger expression.	`100`
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}	The maximum flush latency, in milliseconds, used in the trigger expression.	`100`
{$ELASTICSEARCH.HEAP_USED.MAX.WARN}	The maximum percentage of JVM heap in use for the warning trigger expression.	`85`
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}	The maximum percentage of JVM heap in use for the critical trigger expression.	`95`

Items

Name	Description	Type	Key and additional info
ES: Service status	Checks if the service is running and accepting TCP connections.	Simple check	net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] Preprocessing Discard unchanged with heartbeat: `10m`
ES: Service response time	Checks performance of the TCP service.	Simple check	net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"]
ES: Get cluster health	Returns the health status of a cluster.	HTTP agent	es.cluster.get_health
ES: Cluster health status	Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green (all shards are assigned); yellow (all primary shards are assigned, but one or more replica shards are unassigned; if a node in the cluster fails, some data could be unavailable until that node is repaired); red (one or more primary shards are unassigned, so some data is unavailable; this can occur briefly during cluster startup as primary shards are assigned).	Dependent item	es.cluster.status Preprocessing JSON Path: `$.status` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
ES: Number of nodes	The number of nodes within the cluster.	Dependent item	es.cluster.number_of_nodes Preprocessing JSON Path: `$.number_of_nodes` Discard unchanged with heartbeat: `1h`
ES: Number of data nodes	The number of nodes that are dedicated data nodes.	Dependent item	es.cluster.number_of_data_nodes Preprocessing JSON Path: `$.number_of_data_nodes` Discard unchanged with heartbeat: `1h`
ES: Number of relocating shards	The number of shards that are under relocation.	Dependent item	es.cluster.relocating_shards Preprocessing JSON Path: `$.relocating_shards`
ES: Number of initializing shards	The number of shards that are under initialization.	Dependent item	es.cluster.initializing_shards Preprocessing JSON Path: `$.initializing_shards`
ES: Number of unassigned shards	The number of shards that are not allocated.	Dependent item	es.cluster.unassigned_shards Preprocessing JSON Path: `$.unassigned_shards`
ES: Delayed unassigned shards	The number of shards whose allocation has been delayed by the timeout settings.	Dependent item	es.cluster.delayed_unassigned_shards Preprocessing JSON Path: `$.delayed_unassigned_shards`
ES: Number of pending tasks	The number of cluster-level changes that have not yet been executed.	Dependent item	es.cluster.number_of_pending_tasks Preprocessing JSON Path: `$.number_of_pending_tasks`
ES: Task max waiting in queue	The time, in seconds, that the earliest initiated task has been waiting to be performed.	Dependent item	es.cluster.task_max_waiting_in_queue Preprocessing JSON Path: `$.task_max_waiting_in_queue_millis` Custom multiplier: `0.001`
ES: Inactive shards percentage	The ratio of inactive shards in the cluster expressed as a percentage.	Dependent item	es.cluster.inactive_shards_percent_as_number Preprocessing JSON Path: `$.active_shards_percent_as_number` JavaScript: `The text is too long. Please see the template.`
ES: Get cluster stats	Returns cluster statistics.	HTTP agent	es.cluster.get_stats
ES: Cluster uptime	Uptime duration, in seconds, since the JVM last started.	Dependent item	es.nodes.jvm.max_uptime Preprocessing JSON Path: `$.nodes.jvm.max_uptime_in_millis` Custom multiplier: `0.001`
ES: Number of non-deleted documents	The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields.	Dependent item	es.indices.docs.count Preprocessing JSON Path: `$.indices.docs.count` Discard unchanged with heartbeat: `1h`
ES: Indices with shards assigned to nodes	The total number of indices with shards assigned to the selected nodes.	Dependent item	es.indices.count Preprocessing JSON Path: `$.indices.count` Discard unchanged with heartbeat: `1h`
ES: Total size of all file stores	The total size in bytes of all file stores across all selected nodes.	Dependent item	es.nodes.fs.total_in_bytes Preprocessing JSON Path: `$.nodes.fs.total_in_bytes` Discard unchanged with heartbeat: `1h`
ES: Total available size to JVM in all file stores	The total number of bytes available to the JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes. This is the actual amount of free disk space the selected Elasticsearch nodes can use.	Dependent item	es.nodes.fs.available_in_bytes Preprocessing JSON Path: `$.nodes.fs.available_in_bytes` Discard unchanged with heartbeat: `1h`
ES: Nodes with the data role	The number of selected nodes with the data role.	Dependent item	es.nodes.count.data Preprocessing JSON Path: `$.nodes.count.data` Discard unchanged with heartbeat: `1h`
ES: Nodes with the ingest role	The number of selected nodes with the ingest role.	Dependent item	es.nodes.count.ingest Preprocessing JSON Path: `$.nodes.count.ingest` Discard unchanged with heartbeat: `1h`
ES: Nodes with the master role	The number of selected nodes with the master role.	Dependent item	es.nodes.count.master Preprocessing JSON Path: `$.nodes.count.master` Discard unchanged with heartbeat: `1h`
ES: Get nodes stats	Returns cluster nodes statistics.	HTTP agent	es.nodes.get_stats
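Two of the items above reshape the raw `_cluster/health` values during preprocessing. `ES: Task max waiting in queue` multiplies milliseconds by `0.001`. `ES: Inactive shards percentage` reads `active_shards_percent_as_number` and then applies a JavaScript step that is not shown here. That step presumably reports the complement, 100 minus the active percentage. The Python sketch below makes that assumption. The function names are hypothetical.

```python
import json


def task_max_wait_seconds(cluster_health_json: str) -> float:
    """JSON Path $.task_max_waiting_in_queue_millis, then custom multiplier 0.001."""
    millis = json.loads(cluster_health_json)["task_max_waiting_in_queue_millis"]
    return millis * 0.001


def inactive_shards_percent(cluster_health_json: str) -> float:
    """JSON Path $.active_shards_percent_as_number, then (assumed) 100 - value."""
    active = float(json.loads(cluster_health_json)["active_shards_percent_as_number"])
    return 100.0 - active
```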

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
ES: Service is down	The service is unavailable or does not accept TCP connections.	`last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"])=0`	Average	Manual close: Yes
ES: Service response time is too high	The performance of the TCP service is very low.	`min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}`	Warning	Manual close: Yes Depends on: ES: Service is down
ES: Health is YELLOW	All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1`	Average
ES: Health is RED	One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2`	High
ES: Health is UNKNOWN	The health status of the cluster is unknown or cannot be obtained.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255`	High
ES: The number of nodes within the cluster has decreased		`change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0`	Info	Manual close: Yes
ES: The number of nodes within the cluster has increased		`change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0`	Info	Manual close: Yes
ES: Cluster has the initializing shards	The cluster has had initializing shards for longer than 10 minutes.	`min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0`	Average
ES: Cluster has the unassigned shards	The cluster has had unassigned shards for longer than 10 minutes.	`min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0`	Average
ES: Cluster has been restarted	Uptime is less than 10 minutes.	`last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m`	Info	Manual close: Yes
ES: Cluster does not have enough space for resharding	There is not enough disk space for index resharding.	`(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes)`	High
ES: Cluster has only two master nodes	The cluster has only two nodes with the master role and will become unavailable if one of them fails.	`last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2`	Disaster
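The two node-count triggers compare consecutive values with `change()`, which returns the latest value minus the previous one. The Python sketch below shows how a pair of values maps to the two Info triggers. The function name and return labels are illustrative.

```python
from typing import Optional


def node_count_event(previous: Optional[int], latest: int) -> Optional[str]:
    """change() is latest minus previous; <0 and >0 map to the two Info triggers."""
    if previous is None:
        # No earlier value yet, so there is nothing to compare against.
        return None
    delta = latest - previous
    if delta < 0:
        return "decreased"
    if delta > 0:
        return "increased"
    return None
```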

LLD rule Cluster nodes discovery

Name	Description	Type	Key and additional info
Cluster nodes discovery	Discovers ES cluster nodes.	HTTP agent	es.nodes.discovery Preprocessing JSON Path: `$.nodes.[*]` Discard unchanged with heartbeat: `1d`

Item prototypes for Cluster nodes discovery

Name	Description	Type	Key and additional info
ES {#ES.NODE}: Get data	Returns cluster nodes statistics.	Dependent item	es.node.get.data[{#ES.NODE}] Preprocessing JSON Path: `$..[?(@.name=='{#ES.NODE}')].first()`
ES {#ES.NODE}: Total size	Total size (in bytes) of all file stores.	Dependent item	es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..fs.total.total_in_bytes.first()` Discard unchanged with heartbeat: `1d`
ES {#ES.NODE}: Total available size	The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize.	Dependent item	es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..fs.total.available_in_bytes.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Node uptime	JVM uptime in seconds.	Dependent item	es.node.jvm.uptime[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.uptime_in_millis.first()` Custom multiplier: `0.001`
ES {#ES.NODE}: Maximum JVM memory available for use	The maximum amount of memory, in bytes, available for use by the heap.	Dependent item	es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_max_in_bytes.first()` Discard unchanged with heartbeat: `1d`
ES {#ES.NODE}: Amount of JVM heap currently in use	The memory, in bytes, currently in use by the heap.	Dependent item	es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_used_in_bytes.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Percent of JVM heap currently in use	The percentage of memory currently in use by the heap.	Dependent item	es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_used_percent.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Amount of JVM heap committed	The amount of memory, in bytes, available for use by the heap.	Dependent item	es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_committed_in_bytes.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Number of open HTTP connections	The number of currently open HTTP connections for the node.	Dependent item	es.node.http.current_open[{#ES.NODE}] Preprocessing JSON Path: `$..http.current_open.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Rate of HTTP connections opened	The number of HTTP connections opened for the node per second.	Dependent item	es.node.http.opened.rate[{#ES.NODE}] Preprocessing JSON Path: `$..http.total_opened.first()` Change per second
ES {#ES.NODE}: Time spent throttling operations	Time in seconds spent throttling operations for the last measuring span.	Dependent item	es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.throttle_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Time spent throttling recovery operations	Time in seconds spent throttling recovery operations for the last measuring span.	Dependent item	es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.recovery.throttle_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Time spent throttling merge operations	Time in seconds spent throttling merge operations for the last measuring span.	Dependent item	es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.merges.total_throttled_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Rate of queries	The number of query operations per second.	Dependent item	es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_total.first()` Change per second
ES {#ES.NODE}: Total number of query	The total number of query operations.	Dependent item	es.node.indices.search.query_total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Time spent performing query	Time in seconds spent performing query operations for the last measuring span.	Dependent item	es.node.indices.search.query_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Total time spent performing query	Time in milliseconds spent performing query operations.	Dependent item	es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Query latency	The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals.	Calculated	es.node.indices.search.query_latency[{#ES.NODE}]
ES {#ES.NODE}: Current query operations	The number of query operations currently running.	Dependent item	es.node.indices.search.query_current[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_current.first()`
ES {#ES.NODE}: Rate of fetch	The number of fetch operations per second.	Dependent item	es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_total.first()` Change per second
ES {#ES.NODE}: Total number of fetch	The total number of fetch operations.	Dependent item	es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Time spent performing fetch	Time in seconds spent performing fetch operations for the last measuring span.	Dependent item	es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Total time spent performing fetch	Time in milliseconds spent performing fetch operations.	Dependent item	es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Fetch latency	The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals.	Calculated	es.node.indices.search.fetch_latency[{#ES.NODE}]
ES {#ES.NODE}: Current fetch operations	The number of fetch operations currently running.	Dependent item	es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_current.first()`
ES {#ES.NODE}: Write thread pool executor tasks completed	The number of tasks completed by the write thread pool executor.	Dependent item	es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.completed.first()` Change per second
ES {#ES.NODE}: Write thread pool active threads	The number of active threads in the write thread pool.	Dependent item	es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.active.first()`
ES {#ES.NODE}: Write thread pool tasks in queue	The number of tasks in queue for the write thread pool.	Dependent item	es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.queue.first()`
ES {#ES.NODE}: Write thread pool executor tasks rejected	The number of tasks rejected by the write thread pool executor.	Dependent item	es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.rejected.first()` Change per second
ES {#ES.NODE}: Search thread pool executor tasks completed	The number of tasks completed by the search thread pool executor.	Dependent item	es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.completed.first()` Change per second
ES {#ES.NODE}: Search thread pool active threads	The number of active threads in the search thread pool.	Dependent item	es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.active.first()`
ES {#ES.NODE}: Search thread pool tasks in queue	The number of tasks in queue for the search thread pool.	Dependent item	es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.queue.first()`
ES {#ES.NODE}: Search thread pool executor tasks rejected	The number of tasks rejected by the search thread pool executor.	Dependent item	es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.rejected.first()` Change per second
ES {#ES.NODE}: Refresh thread pool executor tasks completed	The number of tasks completed by the refresh thread pool executor.	Dependent item	es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.completed.first()` Change per second
ES {#ES.NODE}: Refresh thread pool active threads	The number of active threads in the refresh thread pool.	Dependent item	es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.active.first()`
ES {#ES.NODE}: Refresh thread pool tasks in queue	The number of tasks in queue for the refresh thread pool.	Dependent item	es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.queue.first()`
ES {#ES.NODE}: Refresh thread pool executor tasks rejected	The number of tasks rejected by the refresh thread pool executor.	Dependent item	es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.rejected.first()` Change per second
ES {#ES.NODE}: Total number of indexing	The total number of indexing operations.	Dependent item	es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.index_total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Total time spent performing indexing	Total time in milliseconds spent performing indexing operations.	Dependent item	es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.index_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Indexing latency	The average indexing latency calculated from the available index_total and index_time_in_millis metrics.	Calculated	es.node.indices.indexing.index_latency[{#ES.NODE}]
ES {#ES.NODE}: Current indexing operations	The number of indexing operations currently running.	Dependent item	es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.index_current.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Total number of index flushes to disk	The total number of flush operations.	Dependent item	es.node.indices.flush.total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.flush.total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Total time spent on flushing indices to disk	Total time in milliseconds spent performing flush operations.	Dependent item	es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.flush.total_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Flush latency	The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics.	Calculated	es.node.indices.flush.latency[{#ES.NODE}]
ES {#ES.NODE}: Rate of index refreshes	The number of refresh operations per second.	Dependent item	es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing JSON Path: `$..indices.refresh.total.first()` Change per second
ES {#ES.NODE}: Time spent performing refresh	Time in seconds spent performing refresh operations for the last measuring span.	Dependent item	es.node.indices.refresh.time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.refresh.total_time_in_millis.first()` Custom multiplier: `0.001` Simple change
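
The calculated latency items divide the growth of a time counter by the growth of the matching operation counter. The 6.2 version of the template documents the query latency expression as `change(//es.node.indices.search.query_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.search.query_total[{#ES.NODE}]) + (change(//es.node.indices.search.query_total[{#ES.NODE}]) = 0) )`; the `(... = 0)` term adds 1 to the denominator only when no operations completed, which avoids a division by zero. The Python sketch below reproduces that arithmetic, plus a simplified version of the "Change per second" step used by the rate items. The helper names and sample values are hypothetical, and counter resets are not handled.

```python
def latency_ms(prev_time_ms: float, cur_time_ms: float,
               prev_total: int, cur_total: int) -> float:
    """Average latency per operation between two polls, in milliseconds.

    Mirrors change(time) / (change(total) + (change(total) = 0)):
    the boolean term contributes 1 only when no operations completed.
    """
    d_time = cur_time_ms - prev_time_ms
    d_total = cur_total - prev_total
    return d_time / (d_total + (1 if d_total == 0 else 0))


def change_per_second(prev_value: float, cur_value: float,
                      prev_ts: float, cur_ts: float) -> float:
    """Simplified 'Change per second': counter growth divided by elapsed seconds."""
    return (cur_value - prev_value) / (cur_ts - prev_ts)


# Hypothetical samples for one node, taken 60 seconds apart.
print(latency_ms(50_000, 50_900, 10_000, 10_300))  # 900 / 300 = 3.0
print(latency_ms(50_900, 50_900, 10_300, 10_300))  # no new queries -> 0.0
print(change_per_second(10_000, 10_300, 0, 60))    # 300 / 60 = 5.0
```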

Trigger prototypes for Cluster nodes discovery

Name	Description	Expression	Severity	Dependencies and additional info
ES {#ES.NODE}: has been restarted	Uptime is less than 10 minutes.	`last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m`	Info	Manual close: Yes
ES {#ES.NODE}: Percent of JVM heap in use is high	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's heap-sizing recommendations) or scale out the cluster by adding more nodes.	`min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN}`	Warning	Depends on: ES {#ES.NODE}: Percent of JVM heap in use is critical
ES {#ES.NODE}: Percent of JVM heap in use is critical	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's heap-sizing recommendations) or scale out the cluster by adding more nodes.	`min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}`	High
ES {#ES.NODE}: Query latency is too high	If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}`	Warning
ES {#ES.NODE}: Fetch latency is too high	The fetch phase should typically take much less time than the query phase. If this metric increases consistently, it could indicate slow disks, document enrichment (for example, highlighting relevant text in search results), or requests for too many results.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}`	Warning
ES {#ES.NODE}: Write thread pool executor has the rejected tasks	The number of tasks rejected by the write thread pool executor has been above 0 for 5 minutes.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0`	Warning
ES {#ES.NODE}: Search thread pool executor has the rejected tasks	The number of tasks rejected by the search thread pool executor has been above 0 for 5 minutes.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0`	Warning
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks	The number of tasks rejected by the refresh thread pool executor has been above 0 for 5 minutes.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0`	Warning
ES {#ES.NODE}: Indexing latency is too high	If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there).	`min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}`	Warning
ES {#ES.NODE}: Flush latency is too high	If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}`	Warning
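
The latency and heap trigger prototypes all use `min(item, window) > threshold`, which fires only when every value in the window exceeds the threshold. The Python sketch below mirrors that check and the dependency between the two heap triggers, using the default thresholds 85 and 95 from the `{$ELASTICSEARCH.HEAP_USED.MAX.WARN}` and `{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}` macros. The helper names are hypothetical, and an empty window is simply treated as not firing, which simplifies how Zabbix handles missing data.

```python
from collections.abc import Sequence


def min_over_window_exceeds(samples: Sequence[float], threshold: float) -> bool:
    """Equivalent of min(item, window) > threshold for the samples in the window.

    Every sample must exceed the threshold, so a single dip below it keeps
    the trigger from firing. An empty window is treated as not firing.
    """
    return len(samples) > 0 and min(samples) > threshold


def heap_trigger_state(samples: Sequence[float], warn: float = 85,
                       crit: float = 95) -> str:
    """Hypothetical helper combining the two heap trigger prototypes.

    The warning trigger depends on the critical one, so only the critical
    problem is reported while both conditions hold.
    """
    if min_over_window_exceeds(samples, crit):
        return "High"
    if min_over_window_exceeds(samples, warn):
        return "Warning"
    return "OK"


# Hypothetical heap_used_percent samples collected over the last hour.
print(heap_trigger_state([90, 88, 97]))  # min 88 > 85 -> Warning
print(heap_trigger_state([96, 97, 98]))  # min 96 > 95 -> High
print(heap_trigger_state([90, 80, 99]))  # min 80 -> OK
```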

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

This template is for Zabbix version: 6.2

Also available for: 7.2 7.0 6.4 6.0 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/elasticsearch_http?at=release/6.2

Elasticsearch Cluster by HTTP

Overview

For Zabbix version: 6.2 and higher
This template monitors Elasticsearch with Zabbix without any external scripts and works with both standalone and cluster instances. The metrics are collected remotely in one pass using an HTTP agent, which retrieves values from the REST API endpoints _cluster/health, _cluster/stats, and _nodes/stats.

This template was tested on:

Elasticsearch, versions 6.5 to 7.6

Setup

See Zabbix template operation for basic instructions.

You can set the {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros in the template for use at the host level. If the ES API is at a non-default location, don't forget to change the {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT} macros.
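
To see how these macros combine, the Python sketch below builds the kind of request the HTTP agent items send. It assumes the item URL follows the pattern `{$ELASTICSEARCH.SCHEME}://{HOST.CONN}:{$ELASTICSEARCH.PORT}/<endpoint>` and that the username and password are sent as HTTP Basic authentication; check the item configuration in the template for the exact URL and authentication settings. The function name, host name, and credentials are hypothetical.

```python
import base64
import urllib.request


def build_es_request(host: str, endpoint: str, scheme: str = "http",
                     port: int = 9200, username: str = "",
                     password: str = "") -> urllib.request.Request:
    """Build a GET request as scheme://host:port/endpoint.

    Defaults follow the macro table (http, 9200). Basic authentication is
    added only when a username is set (an assumption of this sketch).
    """
    url = f"{scheme}://{host}:{port}/{endpoint.lstrip('/')}"
    request = urllib.request.Request(url, method="GET")
    if username:
        token = base64.b64encode(f"{username}:{password}".encode()).decode("ascii")
        request.add_header("Authorization", f"Basic {token}")
    return request


req = build_es_request("es.example.com", "_cluster/health", scheme="https",
                       username="zabbix", password="secret")
print(req.full_url)                     # https://es.example.com:9200/_cluster/health
print(req.get_header("Authorization"))  # Basic emFiYml4OnNlY3JldA==
```

The request object is only built here, not sent; passing it to `urllib.request.urlopen` would perform the actual call.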

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name	Description	Default
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}	The maximum fetch latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}	The maximum flush latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}	The maximum percentage of JVM heap in use for the critical trigger expression.	`95`
{$ELASTICSEARCH.HEAP_USED.MAX.WARN}	The maximum percentage of JVM heap in use for the warning trigger expression.	`85`
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}	The maximum indexing latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.PASSWORD}	The password for Elasticsearch.	``
{$ELASTICSEARCH.PORT}	The port of the Elasticsearch host.	`9200`
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}	The maximum query latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}	The maximum ES cluster response time, in seconds, for the trigger expression.	`10s`
{$ELASTICSEARCH.SCHEME}	The scheme of the Elasticsearch API (http/https).	`http`
{$ELASTICSEARCH.USERNAME}	The username for Elasticsearch.	``

Template links

There are no template links in this template.

Discovery rules

Name	Description	Type	Key and additional info
Cluster nodes discovery	Discovers ES cluster nodes.	HTTP_AGENT	es.nodes.discovery Preprocessing: - JSONPATH: `$.nodes.[*]` - DISCARD_UNCHANGED_HEARTBEAT: `1d`

Items collected

Group	Name	Description	Type	Key and additional info
ES cluster	ES: Service status	Checks if the service is running and accepting TCP connections.	SIMPLE	net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `10m`
ES cluster	ES: Service response time	Checks performance of the TCP service.	SIMPLE	net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"]
ES cluster	ES: Cluster health status	Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green: all shards are assigned; yellow: all primary shards are assigned, but one or more replica shards are unassigned (if a node in the cluster fails, some data could be unavailable until that node is repaired); red: one or more primary shards are unassigned, so some data is unavailable (this can occur briefly during cluster startup as primary shards are assigned).	DEPENDENT	es.cluster.status Preprocessing: - JSONPATH: `$.status` - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Number of nodes	The number of nodes within the cluster.	DEPENDENT	es.cluster.number_of_nodes Preprocessing: - JSONPATH: `$.number_of_nodes` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Number of data nodes	The number of nodes that are dedicated data nodes.	DEPENDENT	es.cluster.number_of_data_nodes Preprocessing: - JSONPATH: `$.number_of_data_nodes` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Number of relocating shards	The number of shards that are under relocation.	DEPENDENT	es.cluster.relocating_shards Preprocessing: - JSONPATH: `$.relocating_shards`
ES cluster	ES: Number of initializing shards	The number of shards that are under initialization.	DEPENDENT	es.cluster.initializing_shards Preprocessing: - JSONPATH: `$.initializing_shards`
ES cluster	ES: Number of unassigned shards	The number of shards that are not allocated.	DEPENDENT	es.cluster.unassigned_shards Preprocessing: - JSONPATH: `$.unassigned_shards`
ES cluster	ES: Delayed unassigned shards	The number of shards whose allocation has been delayed by the timeout settings.	DEPENDENT	es.cluster.delayed_unassigned_shards Preprocessing: - JSONPATH: `$.delayed_unassigned_shards`
ES cluster	ES: Number of pending tasks	The number of cluster-level changes that have not yet been executed.	DEPENDENT	es.cluster.number_of_pending_tasks Preprocessing: - JSONPATH: `$.number_of_pending_tasks`
ES cluster	ES: Task max waiting in queue	The time, in seconds, that the earliest initiated task has been waiting to be performed.	DEPENDENT	es.cluster.task_max_waiting_in_queue Preprocessing: - JSONPATH: `$.task_max_waiting_in_queue_millis` - MULTIPLIER: `0.001`
ES cluster	ES: Inactive shards percentage	The ratio of inactive shards in the cluster expressed as a percentage.	DEPENDENT	es.cluster.inactive_shards_percent_as_number Preprocessing: - JSONPATH: `$.active_shards_percent_as_number` - JAVASCRIPT: `return (100 - value)`
ES cluster	ES: Cluster uptime	Uptime duration in seconds since the JVM last started.	DEPENDENT	es.nodes.jvm.max_uptime Preprocessing: - JSONPATH: `$.nodes.jvm.max_uptime_in_millis` - MULTIPLIER: `0.001`
ES cluster	ES: Number of non-deleted documents	The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields.	DEPENDENT	es.indices.docs.count Preprocessing: - JSONPATH: `$.indices.docs.count` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Indices with shards assigned to nodes	The total number of indices with shards assigned to the selected nodes.	DEPENDENT	es.indices.count Preprocessing: - JSONPATH: `$.indices.count` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Total size of all file stores	The total size in bytes of all file stores across all selected nodes.	DEPENDENT	es.nodes.fs.total_in_bytes Preprocessing: - JSONPATH: `$.nodes.fs.total_in_bytes` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Total available size to JVM in all file stores	The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes. This is the actual amount of free disk space the selected Elasticsearch nodes can use.	DEPENDENT	es.nodes.fs.available_in_bytes Preprocessing: - JSONPATH: `$.nodes.fs.available_in_bytes` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Nodes with the data role	The number of selected nodes with the data role.	DEPENDENT	es.nodes.count.data Preprocessing: - JSONPATH: `$.nodes.count.data` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Nodes with the ingest role	The number of selected nodes with the ingest role.	DEPENDENT	es.nodes.count.ingest Preprocessing: - JSONPATH: `$.nodes.count.ingest` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Nodes with the master role	The number of selected nodes with the master role.	DEPENDENT	es.nodes.count.master Preprocessing: - JSONPATH: `$.nodes.count.master` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Total size	Total size (in bytes) of all file stores.	DEPENDENT	es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].fs.total.total_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
ES cluster	ES {#ES.NODE}: Total available size	The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize.	DEPENDENT	es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].fs.total.available_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Node uptime	JVM uptime in seconds.	DEPENDENT	es.node.jvm.uptime[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.uptime_in_millis.first()` - MULTIPLIER: `0.001`
ES cluster	ES {#ES.NODE}: Maximum JVM memory available for use	The maximum amount of memory, in bytes, available for use by the heap.	DEPENDENT	es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_max_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
ES cluster	ES {#ES.NODE}: Amount of JVM heap currently in use	The memory, in bytes, currently in use by the heap.	DEPENDENT	es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_used_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Percent of JVM heap currently in use	The percentage of memory currently in use by the heap.	DEPENDENT	es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_used_percent.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Amount of JVM heap committed	The amount of memory, in bytes, available for use by the heap.	DEPENDENT	es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_committed_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Number of open HTTP connections	The number of currently open HTTP connections for the node.	DEPENDENT	es.node.http.current_open[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].http.current_open.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Rate of HTTP connections opened	The number of HTTP connections opened for the node per second.	DEPENDENT	es.node.http.opened.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].http.total_opened.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Time spent throttling operations	Time in seconds spent throttling operations for the last measuring span.	DEPENDENT	es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.indexing.throttle_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES cluster	ES {#ES.NODE}: Time spent throttling recovery operations	Time in seconds spent throttling recovery operations for the last measuring span.	DEPENDENT	es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.recovery.throttle_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES cluster	ES {#ES.NODE}: Time spent throttling merge operations	Time in seconds spent throttling merge operations for the last measuring span.	DEPENDENT	es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.merges.total_throttled_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES cluster	ES {#ES.NODE}: Rate of queries	The number of query operations per second.	DEPENDENT	es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_total.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Time spent performing query	Time in seconds spent performing query operations for the last measuring span.	DEPENDENT	es.node.indices.search.query_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES cluster	ES {#ES.NODE}: Query latency	The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals.	CALCULATED	es.node.indices.search.query_latency[{#ES.NODE}] Expression: `change(//es.node.indices.search.query_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.search.query_total[{#ES.NODE}]) + (change(//es.node.indices.search.query_total[{#ES.NODE}]) = 0) )`
ES cluster	ES {#ES.NODE}: Current query operations	The number of query operations currently running.	DEPENDENT	es.node.indices.search.query_current[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_current.first()`
ES cluster	ES {#ES.NODE}: Rate of fetch	The number of fetch operations per second.	DEPENDENT	es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_total.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Time spent performing fetch	Time in seconds spent performing fetch operations for the last measuring span.	DEPENDENT	es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES cluster	ES {#ES.NODE}: Fetch latency	The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals.	CALCULATED	es.node.indices.search.fetch_latency[{#ES.NODE}] Expression: `change(//es.node.indices.search.fetch_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.search.fetch_total[{#ES.NODE}]) + (change(//es.node.indices.search.fetch_total[{#ES.NODE}]) = 0) )`
ES cluster	ES {#ES.NODE}: Current fetch operations	The number of fetch operations currently running.	DEPENDENT	es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_current.first()`
ES cluster	ES {#ES.NODE}: Write thread pool executor tasks completed	The number of tasks completed by the write thread pool executor.	DEPENDENT	es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.write.completed.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Write thread pool active threads	The number of active threads in the write thread pool.	DEPENDENT	es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.write.active.first()`
ES cluster	ES {#ES.NODE}: Write thread pool tasks in queue	The number of tasks in queue for the write thread pool.	DEPENDENT	es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.write.queue.first()`
ES cluster	ES {#ES.NODE}: Write thread pool executor tasks rejected	The number of tasks rejected by the write thread pool executor.	DEPENDENT	es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.write.rejected.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Search thread pool executor tasks completed	The number of tasks completed by the search thread pool executor.	DEPENDENT	es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.search.completed.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Search thread pool active threads	The number of active threads in the search thread pool.	DEPENDENT	es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.search.active.first()`
ES cluster	ES {#ES.NODE}: Search thread pool tasks in queue	The number of tasks in queue for the search thread pool.	DEPENDENT	es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.search.queue.first()`
ES cluster	ES {#ES.NODE}: Search thread pool executor tasks rejected	The number of tasks rejected by the search thread pool executor.	DEPENDENT	es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.search.rejected.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Refresh thread pool executor tasks completed	The number of tasks completed by the refresh thread pool executor.	DEPENDENT	es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.completed.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Refresh thread pool active threads	The number of active threads in the refresh thread pool.	DEPENDENT	es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.active.first()`
ES cluster	ES {#ES.NODE}: Refresh thread pool tasks in queue	The number of tasks in queue for the refresh thread pool.	DEPENDENT	es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.queue.first()`
ES cluster	ES {#ES.NODE}: Refresh thread pool executor tasks rejected	The number of tasks rejected by the refresh thread pool executor.	DEPENDENT	es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.rejected.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Indexing latency	The average indexing latency calculated from the available index_total and index_time_in_millis metrics.	CALCULATED	es.node.indices.indexing.index_latency[{#ES.NODE}] Expression: `change(//es.node.indices.indexing.index_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.indexing.index_total[{#ES.NODE}]) + (change(//es.node.indices.indexing.index_total[{#ES.NODE}]) = 0) )`
ES cluster	ES {#ES.NODE}: Current indexing operations	The number of indexing operations currently running.	DEPENDENT	es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.indexing.index_current.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Flush latency	The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics.	CALCULATED	es.node.indices.flush.latency[{#ES.NODE}] Expression: `change(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.flush.total[{#ES.NODE}]) + (change(//es.node.indices.flush.total[{#ES.NODE}]) = 0) )`
ES cluster	ES {#ES.NODE}: Rate of index refreshes	The number of refresh operations per second.	DEPENDENT	es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.refresh.total.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Time spent performing refresh	Time in seconds spent performing refresh operations for the last measuring span.	DEPENDENT	es.node.indices.refresh.time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.refresh.total_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
Zabbix raw items	ES: Get cluster health	Returns the health status of a cluster.	HTTP_AGENT	es.cluster.get_health
Zabbix raw items	ES: Get cluster stats	Returns cluster statistics.	HTTP_AGENT	es.cluster.get_stats
Zabbix raw items	ES: Get nodes stats	Returns cluster nodes statistics.	HTTP_AGENT	es.nodes.get_stats
Zabbix raw items	ES {#ES.NODE}: Total number of query	The total number of query operations.	DEPENDENT	es.node.indices.search.query_total[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_total.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total time spent performing query	Time in milliseconds spent performing query operations.	DEPENDENT	es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_time_in_millis.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total number of fetch	The total number of fetch operations.	DEPENDENT	es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_total.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total time spent performing fetch	Time in milliseconds spent performing fetch operations.	DEPENDENT	es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_time_in_millis.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total number of indexing	The total number of indexing operations.	DEPENDENT	es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.indexing.index_total.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total time spent performing indexing	Total time in milliseconds spent performing indexing operations.	DEPENDENT	es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.indexing.index_time_in_millis.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total number of index flushes to disk	The total number of flush operations.	DEPENDENT	es.node.indices.flush.total[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.flush.total.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total time spent on flushing indices to disk	Total time in milliseconds spent performing flush operations.	DEPENDENT	es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.flush.total_time_in_millis.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
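
The Indexing latency and Flush latency calculated items divide the change in accumulated time by the change in the operation count. The `(change(...)=0)` term evaluates to 1 when no new operations were recorded, so the divisor never becomes zero. A minimal Python sketch of that arithmetic, assuming two consecutive samples of the raw counters (the function name `average_latency_ms` is hypothetical and not part of the template):

```python
def average_latency_ms(prev_time_ms: float, curr_time_ms: float,
                       prev_total: int, curr_total: int) -> float:
    """Mirror change(time) / (change(total) + (change(total) = 0))."""
    delta_time = curr_time_ms - prev_time_ms   # change(//..._time_in_millis)
    delta_total = curr_total - prev_total      # change(//..._total)
    # A Zabbix comparison such as (x = 0) yields 1 or 0; int() reproduces that,
    # so the divisor is 1 instead of 0 when no new operations were recorded.
    guard = int(delta_total == 0)
    return delta_time / (delta_total + guard)


# 500 ms spent on 50 new indexing operations -> 10 ms per operation.
print(average_latency_ms(12_000, 12_500, 1_000, 1_050))  # 10.0
# No new operations: the guard keeps the divisor at 1, so the result is 0.0.
print(average_latency_ms(12_500, 12_500, 1_050, 1_050))  # 0.0
```

The same pattern applies to the flush latency item, with flush.total_time_in_millis and flush.total as inputs.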

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
ES: Service is down	The service is unavailable or does not accept TCP connections.	`last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"])=0`	AVERAGE	Manual close: YES
ES: Service response time is too high	The performance of the TCP service is very low.	`min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}`	WARNING	Manual close: YES Depends on: - ES: Service is down
ES: Health is YELLOW	All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1`	AVERAGE
ES: Health is RED	One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2`	HIGH
ES: Health is UNKNOWN	The health status of the cluster is unknown or cannot be obtained.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255`	HIGH
ES: The number of nodes within the cluster has decreased	-	`change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0`	INFO	Manual close: YES
ES: The number of nodes within the cluster has increased	-	`change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0`	INFO	Manual close: YES
ES: Cluster has the initializing shards	The cluster has had initializing shards for longer than 10 minutes.	`min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0`	AVERAGE
ES: Cluster has the unassigned shards	The cluster has had unassigned shards for longer than 10 minutes.	`min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0`	AVERAGE
ES: Cluster has been restarted	Uptime is less than 10 minutes.	`last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m`	INFO	Manual close: YES
ES: Cluster does not have enough space for resharding	There is not enough disk space for index resharding.	`(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes)`	HIGH
ES: Cluster has only two master nodes	The cluster has only two nodes with a master role and will be unavailable if one of them breaks.	`last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2`	DISASTER
ES {#ES.NODE}: has been restarted	Uptime is less than 10 minutes.	`last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m`	INFO	Manual close: YES
ES {#ES.NODE}: Percent of JVM heap in use is high	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's recommended heap-size guidelines) or scale out the cluster by adding more nodes.	`min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN}`	WARNING	Depends on: - ES {#ES.NODE}: Percent of JVM heap in use is critical
ES {#ES.NODE}: Percent of JVM heap in use is critical	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's recommended heap-size guidelines) or scale out the cluster by adding more nodes.	`min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}`	HIGH
ES {#ES.NODE}: Query latency is too high	If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}`	WARNING
ES {#ES.NODE}: Fetch latency is too high	The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, this could indicate a problem with slow disks, enriching of documents (highlighting the relevant text in search results, etc.), or requesting too many results.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}`	WARNING
ES {#ES.NODE}: Write thread pool executor has the rejected tasks	The number of tasks rejected by the write thread pool executor is over 0 for 5m.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0`	WARNING
ES {#ES.NODE}: Search thread pool executor has the rejected tasks	The number of tasks rejected by the search thread pool executor is over 0 for 5m.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0`	WARNING
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks	The number of tasks rejected by the refresh thread pool executor is over 0 for 5m.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0`	WARNING
ES {#ES.NODE}: Indexing latency is too high	If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there).	`min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}`	WARNING
ES {#ES.NODE}: Flush latency is too high	If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}`	WARNING
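
Most of the latency and rejected-task triggers use `min(item,5m)>threshold`: the trigger fires only when every value collected in the last five minutes exceeds the threshold, so a single spike does not raise a problem. A Python sketch of that evaluation, assuming samples are `(unix_timestamp, value)` pairs; the helper name is hypothetical, and Zabbix's exact window-boundary and missing-data handling may differ:

```python
from typing import Iterable, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, value)


def min_over_window_exceeds(samples: Iterable[Sample], now: float,
                            window_s: float, threshold: float) -> bool:
    """Approximate a trigger like min(/host/key,5m)>threshold."""
    in_window = [value for ts, value in samples if now - window_s <= ts <= now]
    if not in_window:
        # Zabbix cannot evaluate min() without data; this sketch simply does not fire.
        return False
    # min() exceeds the threshold only if every value in the window does.
    return min(in_window) > threshold


rejected_rate = [(0, 2.0), (60, 0.5), (120, 1.0), (180, 3.0), (240, 0.2)]
print(min_over_window_exceeds(rejected_rate, now=280, window_s=300, threshold=0))  # True
rejected_rate[1] = (60, 0.0)
print(min_over_window_exceeds(rejected_rate, now=280, window_s=300, threshold=0))  # False
```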

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it on the Zabbix forums.

References

https://www.elastic.co/guide/en/elasticsearch/reference/index.html

This template is for Zabbix version: 6.0

Also available for: 7.2 7.0 6.4 6.2 5.4 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/elasticsearch_http?at=release/6.0

Elasticsearch Cluster by HTTP

Overview

This template monitors Elasticsearch with Zabbix and works without any external scripts. It works with both standalone and cluster instances. The metrics are collected remotely in one pass using an HTTP agent, which retrieves values from the REST API requests _cluster/health, _cluster/stats, and _nodes/stats.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

Elasticsearch 6.5, 7.6

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

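Setup

Set the login and password in the {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros.
If you use an atypical location of the ES API, change the {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT} macros.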

Macros used

Name	Description	Default
{$ELASTICSEARCH.USERNAME}	The Elasticsearch username.
{$ELASTICSEARCH.PASSWORD}	The Elasticsearch password.
{$ELASTICSEARCH.PORT}	The port of the Elasticsearch host.	`9200`
{$ELASTICSEARCH.SCHEME}	The Elasticsearch scheme (http/https).	`http`
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}	The maximum ES cluster response time, in seconds, for the trigger expression.	`10s`
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}	The maximum query latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}	The maximum fetch latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}	The maximum indexing latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}	The maximum flush latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.HEAP_USED.MAX.WARN}	The maximum JVM heap usage, in percent, for the warning trigger expression.	`85`
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}	The maximum JVM heap usage, in percent, for the critical trigger expression.	`95`
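
The scheme, port, and credential macros feed the HTTP agent requests. The exact URLs configured on the HTTP agent items are not listed in this section, so the Python sketch below assumes `scheme://host:port/<endpoint>` with the endpoint names from the overview, and assumes the credentials are sent with HTTP Basic authentication. The function `build_requests` and the host `es.example.local` are hypothetical; the requests are only built, not sent:

```python
import base64
import urllib.request

# REST API requests named in the template overview.
ENDPOINTS = ("_cluster/health", "_cluster/stats", "_nodes/stats")


def build_requests(host: str, scheme: str = "http", port: int = 9200,
                   username: str = "", password: str = "") -> list:
    """Build (but do not send) one GET request per endpoint.

    The scheme/port defaults mirror {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT};
    Basic authentication is an assumption about how the credentials are sent.
    """
    reqs = []
    for path in ENDPOINTS:
        req = urllib.request.Request(f"{scheme}://{host}:{port}/{path}")
        if username:
            token = base64.b64encode(f"{username}:{password}".encode()).decode()
            req.add_header("Authorization", f"Basic {token}")
        reqs.append(req)
    return reqs


for req in build_requests("es.example.local", username="zabbix", password="secret"):
    print(req.full_url)
```

Passing one of these objects to `urllib.request.urlopen(req, timeout=10)` is a quick way to check, from the Zabbix server, that the macro values reach the API before linking the template.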

Items

Name	Description	Type	Key and additional info
ES: Service status	Checks if the service is running and accepting TCP connections.	Simple check	net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] Preprocessing Discard unchanged with heartbeat: `10m`
ES: Service response time	Checks performance of the TCP service.	Simple check	net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"]
ES: Get cluster health	Returns the health status of a cluster.	HTTP agent	es.cluster.get_health
ES: Cluster health status	Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green (all shards are assigned); yellow (all primary shards are assigned, but one or more replica shards are unassigned; if a node in the cluster fails, some data could be unavailable until that node is repaired); red (one or more primary shards are unassigned, so some data is unavailable; this can occur briefly during cluster startup as primary shards are assigned).	Dependent item	es.cluster.status Preprocessing JSON Path: `$.status` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
ES: Number of nodes	The number of nodes within the cluster.	Dependent item	es.cluster.number_of_nodes Preprocessing JSON Path: `$.number_of_nodes` Discard unchanged with heartbeat: `1h`
ES: Number of data nodes	The number of nodes that are dedicated data nodes.	Dependent item	es.cluster.number_of_data_nodes Preprocessing JSON Path: `$.number_of_data_nodes` Discard unchanged with heartbeat: `1h`
ES: Number of relocating shards	The number of shards that are under relocation.	Dependent item	es.cluster.relocating_shards Preprocessing JSON Path: `$.relocating_shards`
ES: Number of initializing shards	The number of shards that are under initialization.	Dependent item	es.cluster.initializing_shards Preprocessing JSON Path: `$.initializing_shards`
ES: Number of unassigned shards	The number of shards that are not allocated.	Dependent item	es.cluster.unassigned_shards Preprocessing JSON Path: `$.unassigned_shards`
ES: Delayed unassigned shards	The number of shards whose allocation has been delayed by the timeout settings.	Dependent item	es.cluster.delayed_unassigned_shards Preprocessing JSON Path: `$.delayed_unassigned_shards`
ES: Number of pending tasks	The number of cluster-level changes that have not yet been executed.	Dependent item	es.cluster.number_of_pending_tasks Preprocessing JSON Path: `$.number_of_pending_tasks`
ES: Task max waiting in queue	The time, in seconds, that the earliest initiated task has been waiting to be performed.	Dependent item	es.cluster.task_max_waiting_in_queue Preprocessing JSON Path: `$.task_max_waiting_in_queue_millis` Custom multiplier: `0.001`
ES: Inactive shards percentage	The ratio of inactive shards in the cluster expressed as a percentage.	Dependent item	es.cluster.inactive_shards_percent_as_number Preprocessing JSON Path: `$.active_shards_percent_as_number` JavaScript: `The text is too long. Please see the template.`
ES: Get cluster stats	Returns cluster statistics.	HTTP agent	es.cluster.get_stats
ES: Cluster uptime	Uptime duration, in seconds, since the JVM was last started.	Dependent item	es.nodes.jvm.max_uptime Preprocessing JSON Path: `$.nodes.jvm.max_uptime_in_millis` Custom multiplier: `0.001`
ES: Number of non-deleted documents	The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields.	Dependent item	es.indices.docs.count Preprocessing JSON Path: `$.indices.docs.count` Discard unchanged with heartbeat: `1h`
ES: Indices with shards assigned to nodes	The total number of indices with shards assigned to the selected nodes.	Dependent item	es.indices.count Preprocessing JSON Path: `$.indices.count` Discard unchanged with heartbeat: `1h`
ES: Total size of all file stores	The total size in bytes of all file stores across all selected nodes.	Dependent item	es.nodes.fs.total_in_bytes Preprocessing JSON Path: `$.nodes.fs.total_in_bytes` Discard unchanged with heartbeat: `1h`
ES: Total available size to JVM in all file stores	The total number of bytes available to the JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes. This is the actual amount of free disk space the selected Elasticsearch nodes can use.	Dependent item	es.nodes.fs.available_in_bytes Preprocessing JSON Path: `$.nodes.fs.available_in_bytes` Discard unchanged with heartbeat: `1h`
ES: Nodes with the data role	The number of selected nodes with the data role.	Dependent item	es.nodes.count.data Preprocessing JSON Path: `$.nodes.count.data` Discard unchanged with heartbeat: `1h`
ES: Nodes with the ingest role	The number of selected nodes with the ingest role.	Dependent item	es.nodes.count.ingest Preprocessing JSON Path: `$.nodes.count.ingest` Discard unchanged with heartbeat: `1h`
ES: Nodes with the master role	The number of selected nodes with the master role.	Dependent item	es.nodes.count.master Preprocessing JSON Path: `$.nodes.count.master` Discard unchanged with heartbeat: `1h`
ES: Get nodes stats	Returns cluster nodes statistics.	HTTP agent	es.nodes.get_stats
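
The cluster health status and inactive shards items use JavaScript preprocessing whose source is not reproduced here. The triggers imply the status codes 1 = YELLOW, 2 = RED, and 255 = UNKNOWN. The Python sketch below additionally assumes that green maps to 0 and that the inactive percentage is 100 minus active_shards_percent_as_number; both are assumptions, so check the template for the actual scripts. The function name `preprocess_health` is hypothetical:

```python
import json

# 1, 2 and 255 match the trigger expressions; 0 for green is an assumption.
STATUS_CODES = {"green": 0, "yellow": 1, "red": 2}
UNKNOWN = 255


def preprocess_health(raw: str) -> dict:
    """Derive a few cluster-health item values from a _cluster/health body."""
    health = json.loads(raw)
    return {
        "es.cluster.status": STATUS_CODES.get(health.get("status"), UNKNOWN),
        # Assumed: the inactive share is 100 minus the active share.
        "es.cluster.inactive_shards_percent_as_number":
            100 - health["active_shards_percent_as_number"],
        # Custom multiplier 0.001 converts milliseconds to seconds.
        "es.cluster.task_max_waiting_in_queue":
            health["task_max_waiting_in_queue_millis"] * 0.001,
    }


sample = ('{"status": "yellow", "active_shards_percent_as_number": 87.5, '
          '"task_max_waiting_in_queue_millis": 1500}')
print(preprocess_health(sample))
```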

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
ES: Service is down	The service is unavailable or does not accept TCP connections.	`last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"])=0`	Average	Manual close: Yes
ES: Service response time is too high	The performance of the TCP service is very low.	`min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}`	Warning	Manual close: Yes Depends on: ES: Service is down
ES: Health is YELLOW	All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1`	Average
ES: Health is RED	One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2`	High
ES: Health is UNKNOWN	The health status of the cluster is unknown or cannot be obtained.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255`	High
ES: The number of nodes within the cluster has decreased		`change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0`	Info	Manual close: Yes
ES: The number of nodes within the cluster has increased		`change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0`	Info	Manual close: Yes
ES: Cluster has the initializing shards	The cluster has had initializing shards for longer than 10 minutes.	`min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0`	Average
ES: Cluster has the unassigned shards	The cluster has had unassigned shards for longer than 10 minutes.	`min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0`	Average
ES: Cluster has been restarted	Uptime is less than 10 minutes.	`last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m`	Info	Manual close: Yes
ES: Cluster does not have enough space for resharding	There is not enough disk space for index resharding.	`(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes)`	High
ES: Cluster has only two master nodes	The cluster has only two nodes with a master role and will be unavailable if one of them breaks.	`last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2`	Disaster
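
The resharding trigger divides the used space (total minus available) by one less than the number of data nodes and compares the result with the cluster-wide available space. A worked Python sketch of that formula with made-up sizes (the function name is hypothetical):

```python
GIB = 1024 ** 3


def lacks_space_for_resharding(total_bytes: float, available_bytes: float,
                               data_nodes: int) -> bool:
    """Mirror (total - available) / (data_nodes - 1) > available."""
    if data_nodes < 2:
        # The divisor would be zero or negative; this sketch refuses to evaluate.
        raise ValueError("the formula needs at least two data nodes")
    used_bytes = total_bytes - available_bytes
    return used_bytes / (data_nodes - 1) > available_bytes


# 3 data nodes, 3000 GiB total, 500 GiB available: 2500 / 2 = 1250 GiB > 500 GiB.
print(lacks_space_for_resharding(3000 * GIB, 500 * GIB, 3))   # True
# Same cluster with 2000 GiB available: 1000 / 2 = 500 GiB, not > 2000 GiB.
print(lacks_space_for_resharding(3000 * GIB, 2000 * GIB, 3))  # False
```

With a single data node the template expression divides by zero, so the trigger cannot be evaluated on such a cluster.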

LLD rule Cluster nodes discovery

Name	Description	Type	Key and additional info
Cluster nodes discovery	Discovers ES cluster nodes.	HTTP agent	es.nodes.discovery Preprocessing JSON Path: `$.nodes.[*]` Discard unchanged with heartbeat: `1d`
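
The discovery rule applies `$.nodes.[*]` to the nodes statistics response, turning the `nodes` object (keyed by node ID) into a list of node objects. The per-node items filter on `@.name=='{#ES.NODE}'`, so {#ES.NODE} carries the node name. A Python sketch of that flattening, assuming the LLD macro is filled from each node's `name` field (the LLD macro path is not shown in this section, and `discover_nodes` is a hypothetical name):

```python
import json


def discover_nodes(nodes_stats_body: str) -> list:
    """Approximate the LLD rule: $.nodes.[*], then {#ES.NODE} from each node's name."""
    nodes = json.loads(nodes_stats_body).get("nodes", {})
    # $.nodes.[*] returns the values of the "nodes" object; the node-ID keys are dropped.
    # Filling {#ES.NODE} from "name" is an assumption based on the item filters.
    return [{"{#ES.NODE}": node["name"]} for node in nodes.values()]


body = '{"nodes": {"aB1": {"name": "es-node-1"}, "cD2": {"name": "es-node-2"}}}'
print(discover_nodes(body))
# [{'{#ES.NODE}': 'es-node-1'}, {'{#ES.NODE}': 'es-node-2'}]
```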

Item prototypes for Cluster nodes discovery

Name	Description	Type	Key and additional info
ES {#ES.NODE}: Get data	Returns cluster nodes statistics.	Dependent item	es.node.get.data[{#ES.NODE}] Preprocessing JSON Path: `$..[?(@.name=='{#ES.NODE}')].first()`
ES {#ES.NODE}: Total size	Total size (in bytes) of all file stores.	Dependent item	es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..fs.total.total_in_bytes.first()` Discard unchanged with heartbeat: `1d`
ES {#ES.NODE}: Total available size	The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize.	Dependent item	es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..fs.total.available_in_bytes.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Node uptime	JVM uptime in seconds.	Dependent item	es.node.jvm.uptime[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.uptime_in_millis.first()` Custom multiplier: `0.001`
ES {#ES.NODE}: Maximum JVM memory available for use	The maximum amount of memory, in bytes, available for use by the heap.	Dependent item	es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_max_in_bytes.first()` Discard unchanged with heartbeat: `1d`
ES {#ES.NODE}: Amount of JVM heap currently in use	The memory, in bytes, currently in use by the heap.	Dependent item	es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_used_in_bytes.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Percent of JVM heap currently in use	The percentage of memory currently in use by the heap.	Dependent item	es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_used_percent.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Amount of JVM heap committed	The amount of memory, in bytes, available for use by the heap.	Dependent item	es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing JSON Path: `$..jvm.mem.heap_committed_in_bytes.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Number of open HTTP connections	The number of currently open HTTP connections for the node.	Dependent item	es.node.http.current_open[{#ES.NODE}] Preprocessing JSON Path: `$..http.current_open.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Rate of HTTP connections opened	The number of HTTP connections opened for the node per second.	Dependent item	es.node.http.opened.rate[{#ES.NODE}] Preprocessing JSON Path: `$..http.total_opened.first()` Change per second
ES {#ES.NODE}: Time spent throttling operations	Time in seconds spent throttling operations for the last measuring span.	Dependent item	es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.throttle_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Time spent throttling recovery operations	Time in seconds spent throttling recovery operations for the last measuring span.	Dependent item	es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.recovery.throttle_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Time spent throttling merge operations	Time in seconds spent throttling merge operations for the last measuring span.	Dependent item	es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.merges.total_throttled_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Rate of queries	The number of query operations per second.	Dependent item	es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_total.first()` Change per second
ES {#ES.NODE}: Total number of query	The total number of query operations.	Dependent item	es.node.indices.search.query_total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Time spent performing query	Time in seconds spent performing query operations for the last measuring span.	Dependent item	es.node.indices.search.query_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Total time spent performing query	Time in milliseconds spent performing query operations.	Dependent item	es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Query latency	The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals.	Calculated	es.node.indices.search.query_latency[{#ES.NODE}]
ES {#ES.NODE}: Current query operations	The number of query operations currently running.	Dependent item	es.node.indices.search.query_current[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.query_current.first()`
ES {#ES.NODE}: Rate of fetch	The number of fetch operations per second.	Dependent item	es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_total.first()` Change per second
ES {#ES.NODE}: Total number of fetch	The total number of fetch operations.	Dependent item	es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Time spent performing fetch	Time in seconds spent performing fetch operations for the last measuring span.	Dependent item	es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_time_in_millis.first()` Custom multiplier: `0.001` Simple change
ES {#ES.NODE}: Total time spent performing fetch	Time in milliseconds spent performing fetch operations.	Dependent item	es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Fetch latency	The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals.	Calculated	es.node.indices.search.fetch_latency[{#ES.NODE}]
ES {#ES.NODE}: Current fetch operations	The number of fetch operations currently running.	Dependent item	es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing JSON Path: `$..indices.search.fetch_current.first()`
ES {#ES.NODE}: Write thread pool executor tasks completed	The number of tasks completed by the write thread pool executor.	Dependent item	es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.completed.first()` Change per second
ES {#ES.NODE}: Write thread pool active threads	The number of active threads in the write thread pool.	Dependent item	es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.active.first()`
ES {#ES.NODE}: Write thread pool tasks in queue	The number of tasks in queue for the write thread pool.	Dependent item	es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.queue.first()`
ES {#ES.NODE}: Write thread pool executor tasks rejected	The number of tasks rejected by the write thread pool executor.	Dependent item	es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.write.rejected.first()` Change per second
ES {#ES.NODE}: Search thread pool executor tasks completed	The number of tasks completed by the search thread pool executor.	Dependent item	es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.completed.first()` Change per second
ES {#ES.NODE}: Search thread pool active threads	The number of active threads in the search thread pool.	Dependent item	es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.active.first()`
ES {#ES.NODE}: Search thread pool tasks in queue	The number of tasks in queue for the search thread pool.	Dependent item	es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.queue.first()`
ES {#ES.NODE}: Search thread pool executor tasks rejected	The number of tasks rejected by the search thread pool executor.	Dependent item	es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.search.rejected.first()` Change per second
ES {#ES.NODE}: Refresh thread pool executor tasks completed	The number of tasks completed by the refresh thread pool executor.	Dependent item	es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.completed.first()` Change per second
ES {#ES.NODE}: Refresh thread pool active threads	The number of active threads in the refresh thread pool.	Dependent item	es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.active.first()`
ES {#ES.NODE}: Refresh thread pool tasks in queue	The number of tasks in queue for the refresh thread pool.	Dependent item	es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.queue.first()`
ES {#ES.NODE}: Refresh thread pool executor tasks rejected	The number of tasks rejected by the refresh thread pool executor.	Dependent item	es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing JSON Path: `$..thread_pool.refresh.rejected.first()` Change per second
ES {#ES.NODE}: Total number of indexing	The total number of indexing operations.	Dependent item	es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.index_total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Total time spent performing indexing	Total time in milliseconds spent performing indexing operations.	Dependent item	es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.index_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Indexing latency	The average indexing latency calculated from the available index_total and index_time_in_millis metrics.	Calculated	es.node.indices.indexing.index_latency[{#ES.NODE}]
ES {#ES.NODE}: Current indexing operations	The number of indexing operations currently running.	Dependent item	es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing JSON Path: `$..indices.indexing.index_current.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Total number of index flushes to disk	The total number of flush operations.	Dependent item	es.node.indices.flush.total[{#ES.NODE}] Preprocessing JSON Path: `$..indices.flush.total.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Total time spent on flushing indices to disk	Total time in milliseconds spent performing flush operations.	Dependent item	es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing JSON Path: `$..indices.flush.total_time_in_millis.first()` Discard unchanged with heartbeat: `1h`
ES {#ES.NODE}: Flush latency	The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics.	Calculated	es.node.indices.flush.latency[{#ES.NODE}]
ES {#ES.NODE}: Rate of index refreshes	The number of refresh operations per second.	Dependent item	es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing JSON Path: `$..indices.refresh.total.first()` Change per second
ES {#ES.NODE}: Time spent performing refresh	Time in seconds spent performing refresh operations for the last measuring span.	Dependent item	es.node.indices.refresh.time[{#ES.NODE}] Preprocessing JSON Path: `$..indices.refresh.total_time_in_millis.first()` Custom multiplier: `0.001` Simple change
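The "Change per second", "Simple change" and "Custom multiplier" preprocessing steps above turn Elasticsearch's cumulative counters into rates and per-interval durations. The Python sketch below roughly illustrates that arithmetic for two consecutive samples. The function names and sample numbers are hypothetical. Returning `None` when a counter goes down (for example, after a node restart resets it) is an assumption modeled on Zabbix discarding negative differences for these steps.

```python
from typing import Optional


def simple_change(prev: float, curr: float, multiplier: float = 1.0) -> Optional[float]:
    """Custom multiplier followed by Simple change: (curr - prev) * multiplier.

    Returns None when the counter decreased (e.g. the node restarted and its
    counters were reset), i.e. no value would be stored for this poll.
    """
    delta = (curr - prev) * multiplier
    return None if delta < 0 else delta


def change_per_second(prev: float, prev_ts: float, curr: float, curr_ts: float) -> Optional[float]:
    """Change per second: (curr - prev) / (curr_ts - prev_ts), timestamps in seconds."""
    elapsed = curr_ts - prev_ts
    if elapsed <= 0 or curr < prev:
        return None
    return (curr - prev) / elapsed


if __name__ == "__main__":
    # Hypothetical indices.refresh.total_time_in_millis samples one poll apart:
    # 1500 ms more refresh time, reported as roughly 1.5 s for the span.
    print(simple_change(120_000, 121_500, multiplier=0.001))
    # Hypothetical thread_pool.search.completed samples 60 s apart: 10 tasks/s.
    print(change_per_second(5_000, 0, 5_600, 60))
```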

Trigger prototypes for Cluster nodes discovery

Name	Description	Expression	Severity	Dependencies and additional info
ES {#ES.NODE}: has been restarted	Uptime is less than 10 minutes.	`last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m`	Info	Manual close: Yes
ES {#ES.NODE}: Percent of JVM heap in use is high	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's recommended heap-size limits), or scale out the cluster by adding more nodes.	`min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN}`	Warning	Depends on: ES {#ES.NODE}: Percent of JVM heap in use is critical
ES {#ES.NODE}: Percent of JVM heap in use is critical	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's recommended heap-size limits), or scale out the cluster by adding more nodes.	`min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}`	High
ES {#ES.NODE}: Query latency is too high	If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}`	Warning
ES {#ES.NODE}: Fetch latency is too high	The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, this could indicate a problem with slow disks, document enrichment (highlighting the relevant text in search results, etc.), or requesting too many results.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}`	Warning
ES {#ES.NODE}: Write thread pool executor has the rejected tasks	The number of tasks rejected by the write thread pool executor is over 0 for 5m.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0`	Warning
ES {#ES.NODE}: Search thread pool executor has the rejected tasks	The number of tasks rejected by the search thread pool executor is over 0 for 5m.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0`	Warning
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks	The number of tasks rejected by the refresh thread pool executor is over 0 for 5m.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0`	Warning
ES {#ES.NODE}: Indexing latency is too high	If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there).	`min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}`	Warning
ES {#ES.NODE}: Flush latency is too high	If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}`	Warning
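Most of the trigger prototypes above use `min(<item>,<period>)>threshold`. Such a trigger fires only when every value collected in the period is above the threshold, which filters out short spikes. Below is a minimal Python approximation of that check. It assumes a list of `(timestamp, value)` samples and treats an empty period as an unknown result. The period-boundary handling is a simplification, not Zabbix's exact implementation.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (unix_timestamp_seconds, value)


def window_min(samples: Sequence[Sample], now: float, period_s: float) -> Optional[float]:
    """Lowest value collected in the last period_s seconds, or None if there is none."""
    in_period = [value for ts, value in samples if now - period_s < ts <= now]
    return min(in_period) if in_period else None


def min_above(samples: Sequence[Sample], now: float, period_s: float,
              threshold: float) -> Optional[bool]:
    """Approximates a trigger expression like min(/host/key,5m)>threshold.

    True only if every sample in the period exceeds the threshold;
    None when the period holds no data (treated here as an unknown result).
    """
    lowest = window_min(samples, now, period_s)
    return None if lowest is None else lowest > threshold


if __name__ == "__main__":
    # Hypothetical query latency (ms) polled once a minute, threshold 100 ms.
    latency = [(0, 120.0), (60, 150.0), (120, 130.0), (180, 110.0), (240, 140.0)]
    print(min_above(latency, now=240, period_s=300, threshold=100))  # every value > 100
    latency.append((300, 90.0))
    print(min_above(latency, now=300, period_s=300, threshold=100))  # one dip below 100
```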

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

This template is for Zabbix version: 5.4

Also available for: 7.2 7.0 6.4 6.2 6.0 5.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/elasticsearch_http?at=release/5.4

Elasticsearch Cluster by HTTP

Overview

For Zabbix version: 5.4 and higher
This template monitors Elasticsearch with Zabbix and works without any external scripts. It works with both standalone and cluster instances. The metrics are collected remotely in one pass by HTTP agent items, which retrieve values from the REST API endpoints _cluster/health, _cluster/stats and _nodes/stats.
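Before you assign the template, you may want to confirm by hand that the host, port, scheme and credentials work. One way is to request the same three endpoints yourself. The sketch below uses Python's standard `urllib.request` and only builds the requests. The host name and credentials are placeholders. Using HTTP basic authentication for the username and password macros is an assumption about how your cluster is secured.

```python
import base64
import urllib.request
from typing import List

# REST API endpoints named in the template overview.
ENDPOINTS = ("_cluster/health", "_cluster/stats", "_nodes/stats")


def build_requests(scheme: str, host: str, port: int,
                   username: str = "", password: str = "") -> List[urllib.request.Request]:
    """Build (without sending) one GET request per endpoint.

    scheme/host/port play the role of {$ELASTICSEARCH.SCHEME}, the host address and
    {$ELASTICSEARCH.PORT}; basic auth is added only when a username is given.
    """
    headers = {"Accept": "application/json"}
    if username:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        headers["Authorization"] = f"Basic {token}"
    return [
        urllib.request.Request(f"{scheme}://{host}:{port}/{path}", headers=headers, method="GET")
        for path in ENDPOINTS
    ]


if __name__ == "__main__":
    # Placeholder host and credentials.
    for request in build_requests("http", "es.example.local", 9200, "zabbix", "secret"):
        print(request.get_method(), request.full_url)
```

To send one of these requests, pass it to `urllib.request.urlopen(request, timeout=10)` and parse the body with `json.load`.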

This template was tested on:

Elasticsearch, version 6.5..7.6

Setup

See Zabbix template operation for basic instructions.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name	Description	Default
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}	Maximum fetch latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}	Maximum flush latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}	The maximum percentage of JVM heap in use for the critical trigger expression.	`95`
{$ELASTICSEARCH.HEAP_USED.MAX.WARN}	The maximum percentage of JVM heap in use for the warning trigger expression.	`85`
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}	Maximum indexing latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.PASSWORD}	The password for Elasticsearch.	``
{$ELASTICSEARCH.PORT}	The port of the Elasticsearch host.	`9200`
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}	Maximum query latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}	The maximum ES cluster response time, in seconds, for the trigger expression.	`10s`
{$ELASTICSEARCH.SCHEME}	The scheme of the Elasticsearch API (http/https).	`http`
{$ELASTICSEARCH.USERNAME}	The username for Elasticsearch.	``

Template links

There are no template links in this template.

Discovery rules

Name	Description	Type	Key and additional info
Cluster nodes discovery	Discovers ES cluster nodes.	HTTP_AGENT	es.nodes.discovery Preprocessing: - JSONPATH: `$.nodes.[*]` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
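The discovery rule splits the node list with `$.nodes.[*]`. Each per-node item then picks its node out of the `_nodes/stats` response with a filter such as `$..[?(@.name=='{#ES.NODE}')].jvm.uptime_in_millis.first()`. The Python sketch below imitates both steps on a trimmed, made-up response. It assumes `{#ES.NODE}` is populated from each node's `name` field, because the LLD macro paths are not listed on this page. It also searches only under `nodes`, whereas the `$..` recursive descent in the real JSONPath scans the whole document.

```python
import json
from typing import Any, Dict, List, Optional

# Trimmed, made-up _nodes/stats response: node IDs map to per-node objects.
SAMPLE_NODES_STATS = json.loads("""
{
  "cluster_name": "demo",
  "nodes": {
    "aB1": {"name": "es-node-1",
            "jvm": {"uptime_in_millis": 420000, "mem": {"heap_used_percent": 61}}},
    "cD2": {"name": "es-node-2",
            "jvm": {"uptime_in_millis": 9000000, "mem": {"heap_used_percent": 47}}}
  }
}
""")


def discover_nodes(nodes_stats: Dict[str, Any]) -> List[Dict[str, str]]:
    """Roughly $.nodes.[*], then one LLD row per node (assumed {#ES.NODE} <- name)."""
    return [{"{#ES.NODE}": node["name"]} for node in nodes_stats.get("nodes", {}).values()]


def node_value(nodes_stats: Dict[str, Any], node_name: str, path: str) -> Optional[Any]:
    """Roughly $..[?(@.name=='<node_name>')].<path>.first(), restricted to "nodes"."""
    for node in nodes_stats.get("nodes", {}).values():
        if node.get("name") != node_name:
            continue
        value: Any = node
        for key in path.split("."):
            if not isinstance(value, dict) or key not in value:
                return None  # requested path is missing for this node
            value = value[key]
        return value
    return None  # no node with that name in this response


if __name__ == "__main__":
    print(discover_nodes(SAMPLE_NODES_STATS))
    print(node_value(SAMPLE_NODES_STATS, "es-node-2", "jvm.mem.heap_used_percent"))
```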

Items collected

Group	Name	Description	Type	Key and additional info
ES cluster	ES: Service status	Checks if the service is running and accepting TCP connections.	SIMPLE	net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `10m`
ES cluster	ES: Service response time	Checks performance of the TCP service.	SIMPLE	net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"]
ES cluster	ES: Cluster health status	Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green: all shards are assigned; yellow: all primary shards are assigned, but one or more replica shards are unassigned (if a node in the cluster fails, some data could be unavailable until that node is repaired); red: one or more primary shards are unassigned, so some data is unavailable (this can occur briefly during cluster startup as primary shards are assigned).	DEPENDENT	es.cluster.status Preprocessing: - JSONPATH: `$.status` - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Number of nodes	The number of nodes within the cluster.	DEPENDENT	es.cluster.number_of_nodes Preprocessing: - JSONPATH: `$.number_of_nodes` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Number of data nodes	The number of nodes that are dedicated data nodes.	DEPENDENT	es.cluster.number_of_data_nodes Preprocessing: - JSONPATH: `$.number_of_data_nodes` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Number of relocating shards	The number of shards that are under relocation.	DEPENDENT	es.cluster.relocating_shards Preprocessing: - JSONPATH: `$.relocating_shards`
ES cluster	ES: Number of initializing shards	The number of shards that are under initialization.	DEPENDENT	es.cluster.initializing_shards Preprocessing: - JSONPATH: `$.initializing_shards`
ES cluster	ES: Number of unassigned shards	The number of shards that are not allocated.	DEPENDENT	es.cluster.unassigned_shards Preprocessing: - JSONPATH: `$.unassigned_shards`
ES cluster	ES: Delayed unassigned shards	The number of shards whose allocation has been delayed by the timeout settings.	DEPENDENT	es.cluster.delayed_unassigned_shards Preprocessing: - JSONPATH: `$.delayed_unassigned_shards`
ES cluster	ES: Number of pending tasks	The number of cluster-level changes that have not yet been executed.	DEPENDENT	es.cluster.number_of_pending_tasks Preprocessing: - JSONPATH: `$.number_of_pending_tasks`
ES cluster	ES: Task max waiting in queue	The time, in seconds, that the earliest initiated task has been waiting to be performed.	DEPENDENT	es.cluster.task_max_waiting_in_queue Preprocessing: - JSONPATH: `$.task_max_waiting_in_queue_millis` - MULTIPLIER: `0.001`
ES cluster	ES: Inactive shards percentage	The ratio of inactive shards in the cluster expressed as a percentage.	DEPENDENT	es.cluster.inactive_shards_percent_as_number Preprocessing: - JSONPATH: `$.active_shards_percent_as_number` - JAVASCRIPT: `return (100 - value)`
ES cluster	ES: Cluster uptime	Uptime duration in seconds since the JVM was last started.	DEPENDENT	es.nodes.jvm.max_uptime Preprocessing: - JSONPATH: `$.nodes.jvm.max_uptime_in_millis` - MULTIPLIER: `0.001`
ES cluster	ES: Number of non-deleted documents	The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields.	DEPENDENT	es.indices.docs.count Preprocessing: - JSONPATH: `$.indices.docs.count` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Indices with shards assigned to nodes	The total number of indices with shards assigned to the selected nodes.	DEPENDENT	es.indices.count Preprocessing: - JSONPATH: `$.indices.count` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Total size of all file stores	The total size in bytes of all file stores across all selected nodes.	DEPENDENT	es.nodes.fs.total_in_bytes Preprocessing: - JSONPATH: `$.nodes.fs.total_in_bytes` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Total available size to JVM in all file stores	The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes. This is the actual amount of free disk space the selected Elasticsearch nodes can use.	DEPENDENT	es.nodes.fs.available_in_bytes Preprocessing: - JSONPATH: `$.nodes.fs.available_in_bytes` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Nodes with the data role	The number of selected nodes with the data role.	DEPENDENT	es.nodes.count.data Preprocessing: - JSONPATH: `$.nodes.count.data` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Nodes with the ingest role	The number of selected nodes with the ingest role.	DEPENDENT	es.nodes.count.ingest Preprocessing: - JSONPATH: `$.nodes.count.ingest` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES: Nodes with the master role	The number of selected nodes with the master role.	DEPENDENT	es.nodes.count.master Preprocessing: - JSONPATH: `$.nodes.count.master` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Total size	Total size (in bytes) of all file stores.	DEPENDENT	es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].fs.total.total_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
ES cluster	ES {#ES.NODE}: Total available size	The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize.	DEPENDENT	es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].fs.total.available_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Node uptime	JVM uptime in seconds.	DEPENDENT	es.node.jvm.uptime[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.uptime_in_millis.first()` - MULTIPLIER: `0.001`
ES cluster	ES {#ES.NODE}: Maximum JVM memory available for use	The maximum amount of memory, in bytes, available for use by the heap.	DEPENDENT	es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_max_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
ES cluster	ES {#ES.NODE}: Amount of JVM heap currently in use	The memory, in bytes, currently in use by the heap.	DEPENDENT	es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_used_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Percent of JVM heap currently in use	The percentage of memory currently in use by the heap.	DEPENDENT	es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_used_percent.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Amount of JVM heap committed	The amount of memory, in bytes, available for use by the heap.	DEPENDENT	es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_committed_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Number of open HTTP connections	The number of currently open HTTP connections for the node.	DEPENDENT	es.node.http.current_open[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].http.current_open.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Rate of HTTP connections opened	The number of HTTP connections opened for the node per second.	DEPENDENT	es.node.http.opened.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].http.total_opened.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Time spent throttling operations	Time in seconds spent throttling operations for the last measuring span.	DEPENDENT	es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.indexing.throttle_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES cluster	ES {#ES.NODE}: Time spent throttling recovery operations	Time in seconds spent throttling recovery operations for the last measuring span.	DEPENDENT	es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.recovery.throttle_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES cluster	ES {#ES.NODE}: Time spent throttling merge operations	Time in seconds spent throttling merge operations for the last measuring span.	DEPENDENT	es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.merges.total_throttled_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES cluster	ES {#ES.NODE}: Rate of queries	The number of query operations per second.	DEPENDENT	es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_total.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Time spent performing query	Time in seconds spent performing query operations for the last measuring span.	DEPENDENT	es.node.indices.search.query_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES cluster	ES {#ES.NODE}: Query latency	The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals.	CALCULATED	es.node.indices.search.query_latency[{#ES.NODE}] Expression: `change(//es.node.indices.search.query_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.search.query_total[{#ES.NODE}]) + (change(//es.node.indices.search.query_total[{#ES.NODE}]) = 0) )`
ES cluster	ES {#ES.NODE}: Current query operations	The number of query operations currently running.	DEPENDENT	es.node.indices.search.query_current[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_current.first()`
ES cluster	ES {#ES.NODE}: Rate of fetch	The number of fetch operations per second.	DEPENDENT	es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_total.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Time spent performing fetch	Time in seconds spent performing fetch operations for the last measuring span.	DEPENDENT	es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES cluster	ES {#ES.NODE}: Fetch latency	The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals.	CALCULATED	es.node.indices.search.fetch_latency[{#ES.NODE}] Expression: `change(//es.node.indices.search.fetch_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.search.fetch_total[{#ES.NODE}]) + (change(//es.node.indices.search.fetch_total[{#ES.NODE}]) = 0) )`
ES cluster	ES {#ES.NODE}: Current fetch operations	The number of fetch operations currently running.	DEPENDENT	es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_current.first()`
ES cluster	ES {#ES.NODE}: Write thread pool executor tasks completed	The number of tasks completed by the write thread pool executor.	DEPENDENT	es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.write.completed.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Write thread pool active threads	The number of active threads in the write thread pool.	DEPENDENT	es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.write.active.first()`
ES cluster	ES {#ES.NODE}: Write thread pool tasks in queue	The number of tasks in queue for the write thread pool.	DEPENDENT	es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.write.queue.first()`
ES cluster	ES {#ES.NODE}: Write thread pool executor tasks rejected	The number of tasks rejected by the write thread pool executor.	DEPENDENT	es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.write.rejected.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Search thread pool executor tasks completed	The number of tasks completed by the search thread pool executor.	DEPENDENT	es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.search.completed.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Search thread pool active threads	The number of active threads in the search thread pool.	DEPENDENT	es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.search.active.first()`
ES cluster	ES {#ES.NODE}: Search thread pool tasks in queue	The number of tasks in queue for the search thread pool.	DEPENDENT	es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.search.queue.first()`
ES cluster	ES {#ES.NODE}: Search thread pool executor tasks rejected	The number of tasks rejected by the search thread pool executor.	DEPENDENT	es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.search.rejected.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Refresh thread pool executor tasks completed	The number of tasks completed by the refresh thread pool executor.	DEPENDENT	es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.completed.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Refresh thread pool active threads	The number of active threads in the refresh thread pool.	DEPENDENT	es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.active.first()`
ES cluster	ES {#ES.NODE}: Refresh thread pool tasks in queue	The number of tasks in queue for the refresh thread pool.	DEPENDENT	es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.queue.first()`
ES cluster	ES {#ES.NODE}: Refresh thread pool executor tasks rejected	The number of tasks rejected by the refresh thread pool executor.	DEPENDENT	es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.rejected.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Indexing latency	The average indexing latency calculated from the available index_total and index_time_in_millis metrics.	CALCULATED	es.node.indices.indexing.index_latency[{#ES.NODE}] Expression: `change(//es.node.indices.indexing.index_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.indexing.index_total[{#ES.NODE}]) + (change(//es.node.indices.indexing.index_total[{#ES.NODE}]) = 0) )`
ES cluster	ES {#ES.NODE}: Current indexing operations	The number of indexing operations currently running.	DEPENDENT	es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.indexing.index_current.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES cluster	ES {#ES.NODE}: Flush latency	The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics.	CALCULATED	es.node.indices.flush.latency[{#ES.NODE}] Expression: `change(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.flush.total[{#ES.NODE}]) + (change(//es.node.indices.flush.total[{#ES.NODE}]) = 0) )`
ES cluster	ES {#ES.NODE}: Rate of index refreshes	The number of refresh operations per second.	DEPENDENT	es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.refresh.total.first()` - CHANGE_PER_SECOND
ES cluster	ES {#ES.NODE}: Time spent performing refresh	Time in seconds spent performing refresh operations for the last measuring span.	DEPENDENT	es.node.indices.refresh.time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.refresh.total_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
Zabbix raw items	ES: Get cluster health	Returns the health status of a cluster.	HTTP_AGENT	es.cluster.get_health
Zabbix raw items	ES: Get cluster stats	Returns cluster statistics.	HTTP_AGENT	es.cluster.get_stats
Zabbix raw items	ES: Get nodes stats	Returns cluster nodes statistics.	HTTP_AGENT	es.nodes.get_stats
Zabbix raw items	ES {#ES.NODE}: Total number of query	The total number of query operations.	DEPENDENT	es.node.indices.search.query_total[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_total.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total time spent performing query	Time in milliseconds spent performing query operations.	DEPENDENT	es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_time_in_millis.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total number of fetch	The total number of fetch operations.	DEPENDENT	es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_total.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total time spent performing fetch	Time in milliseconds spent performing fetch operations.	DEPENDENT	es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_time_in_millis.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total number of indexing	The total number of indexing operations.	DEPENDENT	es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.indexing.index_total.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total time spent performing indexing	Total time in milliseconds spent performing indexing operations.	DEPENDENT	es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.indexing.index_time_in_millis.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total number of index flushes to disk	The total number of flush operations.	DEPENDENT	es.node.indices.flush.total[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.flush.total.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix raw items	ES {#ES.NODE}: Total time spent on flushing indices to disk	Total time in milliseconds spent performing flush operations.	DEPENDENT	es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.flush.total_time_in_millis.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
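The calculated latency items divide the change in cumulative time by the change in the cumulative operation count. In a Zabbix expression, a comparison such as `(change(...) = 0)` evaluates to 1 or 0. As a result, the denominator becomes 1 when no operations completed between polls, which avoids a division by zero. The same formula is rendered in Python below, using hypothetical sample values. It applies equally to the query, fetch, indexing and flush latency items.

```python
def avg_latency_ms(prev_time_ms: float, curr_time_ms: float,
                   prev_total: float, curr_total: float) -> float:
    """change(time_in_millis) / (change(total) + (change(total) = 0)).

    Works the same way for query, fetch, indexing and flush latency.
    """
    d_time = curr_time_ms - prev_time_ms
    d_total = curr_total - prev_total
    # The comparison contributes 1 only when no operations completed,
    # so the divisor is never zero.
    return d_time / (d_total + int(d_total == 0))


if __name__ == "__main__":
    # Hypothetical query_time_in_millis and query_total between two polls:
    # 450 ms spent on 90 queries -> 5 ms per query.
    print(avg_latency_ms(10_000, 10_450, 2_000, 2_090))
    # Idle node: no new queries and no new time -> 0.0 instead of an error.
    print(avg_latency_ms(10_000, 10_000, 2_000, 2_000))
```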
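The cluster health item stores the `status` string from `_cluster/health` as a number. The template's JavaScript step is not reproduced on this page. However, the triggers compare the value with 1 (yellow), 2 (red) and 255 (unknown). The Python sketch below assumes that green maps to 0 and that any other input maps to 255. It illustrates the mapping and is not the template's actual script.

```python
import json
from typing import Any

# Codes compared against by the triggers: 1 = yellow, 2 = red, 255 = unknown.
# Mapping green to 0 is an assumption; the template's JavaScript is not shown here.
STATUS_CODES = {"green": 0, "yellow": 1, "red": 2}
UNKNOWN = 255


def health_to_code(status: Any) -> int:
    """Map a _cluster/health "status" value to the numeric code the triggers use."""
    if not isinstance(status, str):
        return UNKNOWN
    return STATUS_CODES.get(status.strip().lower(), UNKNOWN)


if __name__ == "__main__":
    # Made-up, trimmed _cluster/health response.
    health = json.loads('{"cluster_name": "demo", "status": "yellow", "number_of_nodes": 3}')
    print(health_to_code(health.get("status")))  # 1, the "Health is YELLOW" condition
```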

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
ES: Service is down	The service is unavailable or does not accept TCP connections.	`last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"])=0`	AVERAGE	Manual close: YES
ES: Service response time is too high (over {$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} for 5m)	The performance of the TCP service is very low.	`min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}`	WARNING	Manual close: YES Depends on: - ES: Service is down
ES: Health is YELLOW	All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1`	AVERAGE
ES: Health is RED	One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2`	HIGH
ES: Health is UNKNOWN	The health status of the cluster is unknown or cannot be obtained.	`last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255`	HIGH
ES: The number of nodes within the cluster has decreased	-	`change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0`	INFO	Manual close: YES
ES: The number of nodes within the cluster has increased	-	`change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0`	INFO	Manual close: YES
ES: Cluster has the initializing shards	The cluster has the initializing shards longer than 10 minutes.	`min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0`	AVERAGE
ES: Cluster has the unassigned shards	The cluster has the unassigned shards longer than 10 minutes.	`min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0`	AVERAGE
ES: Cluster has been restarted (uptime < 10m)	Uptime is less than 10 minutes	`last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m`	INFO	Manual close: YES
ES: Cluster does not have enough space for resharding	There is not enough disk space for index resharding.	`(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes)`	HIGH
ES: Cluster has only two master nodes	The cluster has only two nodes with a master role and will be unavailable if one of them breaks.	`last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2`	DISASTER
ES {#ES.NODE}: Node {#ES.NODE} has been restarted (uptime < 10m)	Uptime is less than 10 minutes	`last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m`	INFO	Manual close: YES
ES {#ES.NODE}: Percent of JVM heap in use is high (over {$ELASTICSEARCH.HEAP_USED.MAX.WARN}% for 1h)	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's recommended heap-size limits), or scale out the cluster by adding more nodes.	`min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN}`	WARNING	Depends on: - ES {#ES.NODE}: Percent of JVM heap in use is critical (over {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}% for 1h)
ES {#ES.NODE}: Percent of JVM heap in use is critical (over {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}% for 1h)	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's recommended heap-size limits), or scale out the cluster by adding more nodes.	`min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}`	HIGH
ES {#ES.NODE}: Query latency is too high (over {$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}ms for 5m)	If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}`	WARNING
ES {#ES.NODE}: Fetch latency is too high (over {$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}ms for 5m)	The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, this could indicate a problem with slow disks, enriching of documents (highlighting the relevant text in search results, etc.), or requesting too many results.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}`	WARNING
ES {#ES.NODE}: Write thread pool executor has the rejected tasks (for 5m)	The number of tasks rejected by the write thread pool executor is over 0 for 5m.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0`	WARNING
ES {#ES.NODE}: Search thread pool executor has the rejected tasks (for 5m)	The number of tasks rejected by the search thread pool executor is over 0 for 5m.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0`	WARNING
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks (for 5m)	The number of tasks rejected by the refresh thread pool executor is over 0 for 5m.	`min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0`	WARNING
ES {#ES.NODE}: Indexing latency is too high (over {$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}ms for 5m)	If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there).	`min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}`	WARNING
ES {#ES.NODE}: Flush latency is too high (over {$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}ms for 5m)	If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index.	`min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}`	WARNING
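
The resharding trigger compares the used space, spread over one fewer data node, with the space currently available. The Python sketch below reproduces only that arithmetic; the function name is hypothetical, and it assumes the inputs are the last values of es.nodes.fs.total_in_bytes, es.nodes.fs.available_in_bytes and es.cluster.number_of_data_nodes. One way to read the condition (our interpretation, assuming data and free space are spread evenly across data nodes) is that it fires when the data held by an average node would not fit into the free space left on the remaining nodes.

```python
def lacks_resharding_space(total_bytes, available_bytes, data_nodes):
    """Evaluate (total - available) / (data_nodes - 1) > available.

    Hypothetical helper mirroring the "Cluster does not have enough space
    for resharding" trigger expression. Any consistent unit works.
    """
    if data_nodes < 2:
        # The expression divides by (data_nodes - 1), so it is undefined
        # for a single data node; this sketch refuses to guess.
        raise ValueError("at least two data nodes are required")
    used_bytes = total_bytes - available_bytes
    return used_bytes / (data_nodes - 1) > available_bytes


# 3 data nodes, 1000 GB in total, 100 GB free: 900 / 2 = 450 > 100, so it fires.
print(lacks_resharding_space(1000, 100, 3))  # True
# 3 data nodes, 1000 GB in total, 600 GB free: 400 / 2 = 200 <= 600, so it does not.
print(lacks_resharding_space(1000, 600, 3))  # False
```

Note that with a single data node the template expression itself divides by zero, so the trigger is only meaningful for clusters with at least two data nodes.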

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.

References

https://www.elastic.co/guide/en/elasticsearch/reference/index.html

This template is for Zabbix version: 5.0

Also available for: 7.2 7.0 6.4 6.2 6.0 5.4

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/elasticsearch_http?at=release/5.0

Template App Elasticsearch Cluster by HTTP

Overview

For Zabbix version: 5.0 and higher
A template for monitoring Elasticsearch with Zabbix that works without any external scripts. It works with both standalone and cluster instances. The metrics are collected remotely in one pass using an HTTP agent, which retrieves values from the REST API endpoints _cluster/health, _cluster/stats, and _nodes/stats.
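
The collection model can be pictured as three HTTP GET requests, one per endpoint, whose JSON responses feed the other items. The Python sketch below is only an illustration built on the standard library: the function names are hypothetical, the parameters stand in for {HOST.CONN} and the {$ELASTICSEARCH.*} macros, and HTTP Basic authentication is an assumption rather than a description of how the HTTP agent items are configured.

```python
import base64
import json
import urllib.request

# The three REST endpoints named in the overview.
ENDPOINTS = ("_cluster/health", "_cluster/stats", "_nodes/stats")


def build_requests(host, port=9200, scheme="http", username="", password=""):
    """Build one GET request per endpoint (hypothetical helper).

    host/port/scheme stand in for {HOST.CONN}, {$ELASTICSEARCH.PORT} and
    {$ELASTICSEARCH.SCHEME}; username/password for the credential macros.
    HTTP Basic authentication is assumed.
    """
    base_url = f"{scheme}://{host}:{port}"
    headers = {}
    if username:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        headers["Authorization"] = f"Basic {token}"
    return {
        endpoint: urllib.request.Request(f"{base_url}/{endpoint}", headers=headers)
        for endpoint in ENDPOINTS
    }


def fetch_all(requests, timeout=5.0):
    """Send every request and decode the JSON bodies (needs a reachable cluster)."""
    results = {}
    for endpoint, request in requests.items():
        with urllib.request.urlopen(request, timeout=timeout) as response:
            results[endpoint] = json.load(response)
    return results
```

For example, `fetch_all(build_requests("es.example.com", username="zabbix", password="secret"))` would return a dict keyed by endpoint; the host name and credentials here are placeholders.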

This template was tested on:

Zabbix, version 5.0
Elasticsearch, version 6.5..7.6

Setup

See Zabbix template operation for basic instructions.

Zabbix configuration

No specific Zabbix configuration is required.

Macros used

Name	Description	Default
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}	Maximum fetch latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}	Maximum flush latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}	The maximum percentage of JVM heap in use for the critical trigger expression.	`95`
{$ELASTICSEARCH.HEAP_USED.MAX.WARN}	The maximum percentage of JVM heap in use for the warning trigger expression.	`85`
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}	Maximum indexing latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.PASSWORD}	The password for Elasticsearch.	``
{$ELASTICSEARCH.PORT}	The port of the Elasticsearch host.	`9200`
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}	Maximum query latency, in milliseconds, for the trigger expression.	`100`
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}	The maximum ES cluster response time, in seconds, for the trigger expression.	`10s`
{$ELASTICSEARCH.SCHEME}	The scheme used to connect to Elasticsearch (http/https).	`http`
{$ELASTICSEARCH.USERNAME}	The username for Elasticsearch.	``

Template links

There are no template links in this template.

Discovery rules

Name	Description	Type	Key and additional info
Cluster nodes discovery	Discovery ES cluster nodes.	HTTP_AGENT	es.nodes.discovery Preprocessing: - JSONPATH: `$.nodes.[*]` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
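
To illustrate what the discovery rule and the per-node item prototypes extract, here is a minimal Python sketch working on an already-decoded _nodes/stats response, which keys nodes by node ID, so `$.nodes.[*]` yields one object per node. Which field fills {#ES.NODE} is not shown in this table; taking the node's name is an assumption, consistent with the `@.name=='{#ES.NODE}'` filters used by the item prototypes. Both function names are hypothetical, and `node_value` only approximates the recursive `$..` filter by looking at the node objects directly.

```python
def discover_nodes(nodes_stats):
    """Build LLD rows from a decoded _nodes/stats response.

    Approximates JSONPath $.nodes.[*]: every value under "nodes" (keyed by
    node ID) becomes one discovered node. Using "name" as {#ES.NODE} is an
    assumption.
    """
    return [{"{#ES.NODE}": node["name"]} for node in nodes_stats.get("nodes", {}).values()]


def node_value(nodes_stats, node_name, *path):
    """Approximate $..[?(@.name=='{#ES.NODE}')].<path>.first().

    Returns the value at `path` for the first node whose name matches,
    or None when no node matches.
    """
    for node in nodes_stats.get("nodes", {}).values():
        if node.get("name") == node_name:
            value = node
            for key in path:
                value = value[key]
            return value
    return None


sample = {
    "nodes": {
        "a1b2": {"name": "es-node-1", "jvm": {"uptime_in_millis": 600000}},
        "c3d4": {"name": "es-node-2", "jvm": {"uptime_in_millis": 120000}},
    }
}
print(discover_nodes(sample))  # [{'{#ES.NODE}': 'es-node-1'}, {'{#ES.NODE}': 'es-node-2'}]
# MULTIPLIER 0.001 turns milliseconds into seconds, as for es.node.jvm.uptime.
print(node_value(sample, "es-node-2", "jvm", "uptime_in_millis") * 0.001)  # 120.0
```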

Items collected

Group	Name	Description	Type	Key and additional info
ES_cluster	ES: Service status	Checks if the service is running and accepting TCP connections.	SIMPLE	net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: `10m`
ES_cluster	ES: Service response time	Checks performance of the TCP service.	SIMPLE	net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"]
ES_cluster	ES: Cluster health status	Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green: all shards are assigned; yellow: all primary shards are assigned, but one or more replica shards are unassigned (if a node in the cluster fails, some data could be unavailable until that node is repaired); red: one or more primary shards are unassigned, so some data is unavailable (this can occur briefly during cluster startup as primary shards are assigned).	DEPENDENT	es.cluster.status Preprocessing: - JSONPATH: `$.status` - JAVASCRIPT: `The text is too long. Please see the template.` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES: Number of nodes	The number of nodes within the cluster.	DEPENDENT	es.cluster.number_of_nodes Preprocessing: - JSONPATH: `$.number_of_nodes` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES: Number of data nodes	The number of nodes that are dedicated data nodes.	DEPENDENT	es.cluster.number_of_data_nodes Preprocessing: - JSONPATH: `$.number_of_data_nodes` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES: Number of relocating shards	The number of shards that are under relocation.	DEPENDENT	es.cluster.relocating_shards Preprocessing: - JSONPATH: `$.relocating_shards`
ES_cluster	ES: Number of initializing shards	The number of shards that are under initialization.	DEPENDENT	es.cluster.initializing_shards Preprocessing: - JSONPATH: `$.initializing_shards`
ES_cluster	ES: Number of unassigned shards	The number of shards that are not allocated.	DEPENDENT	es.cluster.unassigned_shards Preprocessing: - JSONPATH: `$.unassigned_shards`
ES_cluster	ES: Delayed unassigned shards	The number of shards whose allocation has been delayed by the timeout settings.	DEPENDENT	es.cluster.delayed_unassigned_shards Preprocessing: - JSONPATH: `$.delayed_unassigned_shards`
ES_cluster	ES: Number of pending tasks	The number of cluster-level changes that have not yet been executed.	DEPENDENT	es.cluster.number_of_pending_tasks Preprocessing: - JSONPATH: `$.number_of_pending_tasks`
ES_cluster	ES: Task max waiting in queue	How long, in seconds, the earliest initiated task has been waiting to be performed.	DEPENDENT	es.cluster.task_max_waiting_in_queue Preprocessing: - JSONPATH: `$.task_max_waiting_in_queue_millis` - MULTIPLIER: `0.001`
ES_cluster	ES: Inactive shards percentage	The ratio of inactive shards in the cluster expressed as a percentage.	DEPENDENT	es.cluster.inactive_shards_percent_as_number Preprocessing: - JSONPATH: `$.active_shards_percent_as_number` - JAVASCRIPT: `return (100 - value)`
ES_cluster	ES: Cluster uptime	Uptime duration, in seconds, since the JVM last started (the longest uptime among the selected nodes).	DEPENDENT	es.nodes.jvm.max_uptime Preprocessing: - JSONPATH: `$.nodes.jvm.max_uptime_in_millis` - MULTIPLIER: `0.001`
ES_cluster	ES: Number of non-deleted documents	The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields.	DEPENDENT	es.indices.docs.count Preprocessing: - JSONPATH: `$.indices.docs.count` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES: Indices with shards assigned to nodes	The total number of indices with shards assigned to the selected nodes.	DEPENDENT	es.indices.count Preprocessing: - JSONPATH: `$.indices.count` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES: Total size of all file stores	The total size in bytes of all file stores across all selected nodes.	DEPENDENT	es.nodes.fs.total_in_bytes Preprocessing: - JSONPATH: `$.nodes.fs.total_in_bytes` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES: Total available size to JVM in all file stores	The total number of bytes available to the JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes. This is the actual amount of free disk space the selected Elasticsearch nodes can use.	DEPENDENT	es.nodes.fs.available_in_bytes Preprocessing: - JSONPATH: `$.nodes.fs.available_in_bytes` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES: Nodes with the data role	The number of selected nodes with the data role.	DEPENDENT	es.nodes.count.data Preprocessing: - JSONPATH: `$.nodes.count.data` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES: Nodes with the ingest role	The number of selected nodes with the ingest role.	DEPENDENT	es.nodes.count.ingest Preprocessing: - JSONPATH: `$.nodes.count.ingest` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES: Nodes with the master role	The number of selected nodes with the master role.	DEPENDENT	es.nodes.count.master Preprocessing: - JSONPATH: `$.nodes.count.master` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES {#ES.NODE}: Total size	Total size (in bytes) of all file stores.	DEPENDENT	es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].fs.total.total_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
ES_cluster	ES {#ES.NODE}: Total available size	The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize.	DEPENDENT	es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].fs.total.available_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES {#ES.NODE}: Node uptime	JVM uptime in seconds.	DEPENDENT	es.node.jvm.uptime[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.uptime_in_millis.first()` - MULTIPLIER: `0.001`
ES_cluster	ES {#ES.NODE}: Maximum JVM memory available for use	The maximum amount of memory, in bytes, available for use by the heap.	DEPENDENT	es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_max_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1d`
ES_cluster	ES {#ES.NODE}: Amount of JVM heap currently in use	The memory, in bytes, currently in use by the heap.	DEPENDENT	es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_used_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES {#ES.NODE}: Percent of JVM heap currently in use	The percentage of memory currently in use by the heap.	DEPENDENT	es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_used_percent.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES {#ES.NODE}: Amount of JVM heap committed	The amount of memory, in bytes, available for use by the heap.	DEPENDENT	es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].jvm.mem.heap_committed_in_bytes.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES {#ES.NODE}: Number of open HTTP connections	The number of currently open HTTP connections for the node.	DEPENDENT	es.node.http.current_open[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].http.current_open.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES {#ES.NODE}: Rate of HTTP connections opened	The number of HTTP connections opened for the node per second.	DEPENDENT	es.node.http.opened.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].http.total_opened.first()` - CHANGE_PER_SECOND
ES_cluster	ES {#ES.NODE}: Time spent throttling operations	Time in seconds spent throttling operations for the last measuring span.	DEPENDENT	es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.indexing.throttle_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES_cluster	ES {#ES.NODE}: Time spent throttling recovery operations	Time in seconds spent throttling recovery operations for the last measuring span.	DEPENDENT	es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.recovery.throttle_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES_cluster	ES {#ES.NODE}: Time spent throttling merge operations	Time in seconds spent throttling merge operations for the last measuring span.	DEPENDENT	es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.merges.total_throttled_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES_cluster	ES {#ES.NODE}: Rate of queries	The number of query operations per second.	DEPENDENT	es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_total.first()` - CHANGE_PER_SECOND
ES_cluster	ES {#ES.NODE}: Time spent performing query	Time in seconds spent performing query operations for the last measuring span.	DEPENDENT	es.node.indices.search.query_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES_cluster	ES {#ES.NODE}: Query latency	The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals.	CALCULATED	es.node.indices.search.query_latency[{#ES.NODE}] Expression: `change(es.node.indices.search.query_time_in_millis[{#ES.NODE}]) / ( change(es.node.indices.search.query_total[{#ES.NODE}]) + (change(es.node.indices.search.query_total[{#ES.NODE}]) = 0) )`
ES_cluster	ES {#ES.NODE}: Current query operations	The number of query operations currently running.	DEPENDENT	es.node.indices.search.query_current[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_current.first()`
ES_cluster	ES {#ES.NODE}: Rate of fetch	The number of fetch operations per second.	DEPENDENT	es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_total.first()` - CHANGE_PER_SECOND
ES_cluster	ES {#ES.NODE}: Time spent performing fetch	Time in seconds spent performing fetch operations for the last measuring span.	DEPENDENT	es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
ES_cluster	ES {#ES.NODE}: Fetch latency	The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals.	CALCULATED	es.node.indices.search.fetch_latency[{#ES.NODE}] Expression: `change(es.node.indices.search.fetch_time_in_millis[{#ES.NODE}]) / ( change(es.node.indices.search.fetch_total[{#ES.NODE}]) + (change(es.node.indices.search.fetch_total[{#ES.NODE}]) = 0) )`
ES_cluster	ES {#ES.NODE}: Current fetch operations	The number of fetch operations currently running.	DEPENDENT	es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_current.first()`
ES_cluster	ES {#ES.NODE}: Write thread pool executor tasks completed	The number of tasks per second completed by the write thread pool executor.	DEPENDENT	es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.write.completed.first()` - CHANGE_PER_SECOND
ES_cluster	ES {#ES.NODE}: Write thread pool active threads	The number of active threads in the write thread pool.	DEPENDENT	es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.write.active.first()`
ES_cluster	ES {#ES.NODE}: Write thread pool tasks in queue	The number of tasks in queue for the write thread pool.	DEPENDENT	es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.write.queue.first()`
ES_cluster	ES {#ES.NODE}: Write thread pool executor tasks rejected	The number of tasks per second rejected by the write thread pool executor.	DEPENDENT	es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.write.rejected.first()` - CHANGE_PER_SECOND
ES_cluster	ES {#ES.NODE}: Search thread pool executor tasks completed	The number of tasks per second completed by the search thread pool executor.	DEPENDENT	es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.search.completed.first()` - CHANGE_PER_SECOND
ES_cluster	ES {#ES.NODE}: Search thread pool active threads	The number of active threads in the search thread pool.	DEPENDENT	es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.search.active.first()`
ES_cluster	ES {#ES.NODE}: Search thread pool tasks in queue	The number of tasks in queue for the search thread pool.	DEPENDENT	es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.search.queue.first()`
ES_cluster	ES {#ES.NODE}: Search thread pool executor tasks rejected	The number of tasks per second rejected by the search thread pool executor.	DEPENDENT	es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.search.rejected.first()` - CHANGE_PER_SECOND
ES_cluster	ES {#ES.NODE}: Refresh thread pool executor tasks completed	The number of tasks per second completed by the refresh thread pool executor.	DEPENDENT	es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.completed.first()` - CHANGE_PER_SECOND
ES_cluster	ES {#ES.NODE}: Refresh thread pool active threads	The number of active threads in the refresh thread pool.	DEPENDENT	es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.active.first()`
ES_cluster	ES {#ES.NODE}: Refresh thread pool tasks in queue	The number of tasks in queue for the refresh thread pool.	DEPENDENT	es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.queue.first()`
ES_cluster	ES {#ES.NODE}: Refresh thread pool executor tasks rejected	The number of tasks per second rejected by the refresh thread pool executor.	DEPENDENT	es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].thread_pool.refresh.rejected.first()` - CHANGE_PER_SECOND
ES_cluster	ES {#ES.NODE}: Indexing latency	The average indexing latency calculated from the available index_total and index_time_in_millis metrics.	CALCULATED	es.node.indices.indexing.index_latency[{#ES.NODE}] Expression: `change(es.node.indices.indexing.index_time_in_millis[{#ES.NODE}]) / ( change(es.node.indices.indexing.index_total[{#ES.NODE}]) + (change(es.node.indices.indexing.index_total[{#ES.NODE}]) = 0) )`
ES_cluster	ES {#ES.NODE}: Current indexing operations	The number of indexing operations currently running.	DEPENDENT	es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.indexing.index_current.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
ES_cluster	ES {#ES.NODE}: Flush latency	The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics.	CALCULATED	es.node.indices.flush.latency[{#ES.NODE}] Expression: `change(es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( change(es.node.indices.flush.total[{#ES.NODE}]) + (change(es.node.indices.flush.total[{#ES.NODE}]) = 0) )`
ES_cluster	ES {#ES.NODE}: Rate of index refreshes	The number of refresh operations per second.	DEPENDENT	es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.refresh.total.first()` - CHANGE_PER_SECOND
ES_cluster	ES {#ES.NODE}: Time spent performing refresh	Time in seconds spent performing refresh operations for the last measuring span.	DEPENDENT	es.node.indices.refresh.time[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.refresh.total_time_in_millis.first()` - MULTIPLIER: `0.001` - SIMPLE_CHANGE
Zabbix_raw_items	ES: Get cluster health	Returns the health status of a cluster.	HTTP_AGENT	es.cluster.get_health
Zabbix_raw_items	ES: Get cluster stats	Returns cluster statistics.	HTTP_AGENT	es.cluster.get_stats
Zabbix_raw_items	ES: Get nodes stats	Returns cluster nodes statistics.	HTTP_AGENT	es.nodes.get_stats
Zabbix_raw_items	ES {#ES.NODE}: Total number of query	The total number of query operations.	DEPENDENT	es.node.indices.search.query_total[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_total.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix_raw_items	ES {#ES.NODE}: Total time spent performing query	Time in milliseconds spent performing query operations.	DEPENDENT	es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.query_time_in_millis.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix_raw_items	ES {#ES.NODE}: Total number of fetch	The total number of fetch operations.	DEPENDENT	es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_total.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix_raw_items	ES {#ES.NODE}: Total time spent performing fetch	Time in milliseconds spent performing fetch operations.	DEPENDENT	es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.search.fetch_time_in_millis.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix_raw_items	ES {#ES.NODE}: Total number of indexing	The total number of indexing operations.	DEPENDENT	es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.indexing.index_total.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix_raw_items	ES {#ES.NODE}: Total time spent performing indexing	Total time in milliseconds spent performing indexing operations.	DEPENDENT	es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.indexing.index_time_in_millis.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix_raw_items	ES {#ES.NODE}: Total number of index flushes to disk	The total number of flush operations.	DEPENDENT	es.node.indices.flush.total[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.flush.total.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
Zabbix_raw_items	ES {#ES.NODE}: Total time spent on flushing indices to disk	Total time in milliseconds spent performing flush operations.	DEPENDENT	es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: `$..[?(@.name=='{#ES.NODE}')].indices.flush.total_time_in_millis.first()` - DISCARD_UNCHANGED_HEARTBEAT: `1h`
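
The calculated latency items divide the growth in elapsed time by the growth in operation count, and add `(change(...) = 0)` to the divisor so that an interval with no new operations divides by 1 instead of 0 (a comparison in a Zabbix expression evaluates to 1 when true and 0 otherwise). The Python sketch below reproduces that arithmetic for two consecutive samples, together with the per-second rate that CHANGE_PER_SECOND preprocessing computes; the function names and sample numbers are hypothetical.

```python
def latency_ms(prev_time_ms, cur_time_ms, prev_total, cur_total):
    """Average latency per operation over one interval, in milliseconds.

    Mirrors change(time_in_millis) / (change(total) + (change(total) = 0)).
    """
    delta_time = cur_time_ms - prev_time_ms
    delta_total = cur_total - prev_total
    # Python's True/False add as 1/0, just like the Zabbix comparison,
    # so the divisor becomes 1 when no operations completed.
    return delta_time / (delta_total + (delta_total == 0))


def change_per_second(prev_value, cur_value, elapsed_seconds):
    """Rate between two samples of a cumulative counter (CHANGE_PER_SECOND)."""
    return (cur_value - prev_value) / elapsed_seconds


# 10 new queries took 500 ms in total: 50 ms per query.
print(latency_ms(1000, 1500, 100, 110))  # 50.0
# No new queries: 0 / 1 instead of a division by zero.
print(latency_ms(1500, 1500, 110, 110))  # 0.0
# 600 tasks completed in 60 seconds.
print(change_per_second(1000, 1600, 60))  # 10.0
```

The guard trades a division error for a reported latency of 0 during idle intervals, which keeps the "min(...,5m)" latency triggers from firing when a node simply has no traffic.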

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
ES: Service is down	The service is unavailable or does not accept TCP connections.	`{TEMPLATE_NAME:net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"].last()}=0`	AVERAGE	Manual close: YES
ES: Service response time is too high (over {$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} for 5m)	The performance of the TCP service is very low.	`{TEMPLATE_NAME:net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"].min(5m)}>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN}`	WARNING	Manual close: YES Depends on: - ES: Service is down
ES: Health is YELLOW	All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired.	`{TEMPLATE_NAME:es.cluster.status.last()}=1`	AVERAGE
ES: Health is RED	One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned.	`{TEMPLATE_NAME:es.cluster.status.last()}=2`	HIGH
ES: Health is UNKNOWN	The health status of the cluster is unknown or cannot be obtained.	`{TEMPLATE_NAME:es.cluster.status.last()}=255`	HIGH
ES: The number of nodes within the cluster has decreased		`{TEMPLATE_NAME:es.cluster.number_of_nodes.change()}<0`	INFO	Manual close: YES
ES: The number of nodes within the cluster has increased		`{TEMPLATE_NAME:es.cluster.number_of_nodes.change()}>0`	INFO	Manual close: YES
ES: Cluster has the initializing shards	The cluster has had initializing shards for longer than 10 minutes.	`{TEMPLATE_NAME:es.cluster.initializing_shards.min(10m)}>0`	AVERAGE
ES: Cluster has the unassigned shards	The cluster has had unassigned shards for longer than 10 minutes.	`{TEMPLATE_NAME:es.cluster.unassigned_shards.min(10m)}>0`	AVERAGE
ES: Cluster has been restarted (uptime < 10m)	Uptime is less than 10 minutes	`{TEMPLATE_NAME:es.nodes.jvm.max_uptime.last()}<10m`	INFO	Manual close: YES
ES: Cluster does not have enough space for resharding	There is not enough disk space for index resharding.	`({TEMPLATE_NAME:es.nodes.fs.total_in_bytes.last()}-{TEMPLATE_NAME:es.nodes.fs.available_in_bytes.last()})/({TEMPLATE_NAME:es.cluster.number_of_data_nodes.last()}-1)>{TEMPLATE_NAME:es.nodes.fs.available_in_bytes.last()}`	HIGH
ES: Cluster has only two master nodes	The cluster has only two nodes with a master role and will be unavailable if one of them breaks.	`{TEMPLATE_NAME:es.nodes.count.master.last()}=2`	DISASTER
ES {#ES.NODE}: Node {#ES.NODE} has been restarted (uptime < 10m)	Uptime is less than 10 minutes	`{TEMPLATE_NAME:es.node.jvm.uptime[{#ES.NODE}].last()}<10m`	INFO	Manual close: YES
ES {#ES.NODE}: Percent of JVM heap in use is high (over {$ELASTICSEARCH.HEAP_USED.MAX.WARN}% for 1h)	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's recommended heap-size guidelines) or scale out the cluster by adding more nodes.	`{TEMPLATE_NAME:es.node.jvm.mem.heap_used_percent[{#ES.NODE}].min(1h)}>{$ELASTICSEARCH.HEAP_USED.MAX.WARN}`	WARNING	Depends on: - ES {#ES.NODE}: Percent of JVM heap in use is critical (over {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}% for 1h)
ES {#ES.NODE}: Percent of JVM heap in use is critical (over {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}% for 1h)	This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it stays within Elasticsearch's recommended heap-size guidelines) or scale out the cluster by adding more nodes.	`{TEMPLATE_NAME:es.node.jvm.mem.heap_used_percent[{#ES.NODE}].min(1h)}>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT}`	HIGH
ES {#ES.NODE}: Query latency is too high (over {$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}ms for 5m)	If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries.	`{TEMPLATE_NAME:es.node.indices.search.query_latency[{#ES.NODE}].min(5m)}>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}`	WARNING
ES {#ES.NODE}: Fetch latency is too high (over {$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}ms for 5m)	The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, this could indicate a problem with slow disks, enriching of documents (highlighting the relevant text in search results, etc.), or requesting too many results.	`{TEMPLATE_NAME:es.node.indices.search.fetch_latency[{#ES.NODE}].min(5m)}>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}`	WARNING
ES {#ES.NODE}: Write thread pool executor has the rejected tasks (for 5m)	The number of tasks rejected by the write thread pool executor is over 0 for 5m.	`{TEMPLATE_NAME:es.node.thread_pool.write.rejected.rate[{#ES.NODE}].min(5m)}>0`	WARNING
ES {#ES.NODE}: Search thread pool executor has the rejected tasks (for 5m)	The number of tasks rejected by the search thread pool executor is over 0 for 5m.	`{TEMPLATE_NAME:es.node.thread_pool.search.rejected.rate[{#ES.NODE}].min(5m)}>0`	WARNING
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks (for 5m)	The number of tasks rejected by the refresh thread pool executor is over 0 for 5m.	`{TEMPLATE_NAME:es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}].min(5m)}>0`	WARNING
ES {#ES.NODE}: Indexing latency is too high (over {$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}ms for 5m)	If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there).	`{TEMPLATE_NAME:es.node.indices.indexing.index_latency[{#ES.NODE}].min(5m)}>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}`	WARNING
ES {#ES.NODE}: Flush latency is too high (over {$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}ms for 5m)	If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index.	`{TEMPLATE_NAME:es.node.indices.flush.latency[{#ES.NODE}].min(5m)}>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}`	WARNING
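
The health triggers compare es.cluster.status with numeric codes: 1 for YELLOW, 2 for RED and 255 for UNKNOWN. The JAVASCRIPT preprocessing step that produces those codes is not reproduced on this page, so the Python sketch below is a hypothetical reconstruction from the trigger thresholds alone; mapping green to 0 is an assumption, since no trigger refers to it, and all names in the code are illustrative.

```python
# Codes inferred from the trigger expressions; green = 0 is an assumption.
STATUS_CODES = {"green": 0, "yellow": 1, "red": 2}
UNKNOWN_CODE = 255

# Trigger name and severity for each code that raises a problem.
HEALTH_TRIGGERS = {
    1: ("ES: Health is YELLOW", "AVERAGE"),
    2: ("ES: Health is RED", "HIGH"),
    255: ("ES: Health is UNKNOWN", "HIGH"),
}


def status_code(status):
    """Map the _cluster/health "status" string to a numeric code."""
    if not isinstance(status, str):
        return UNKNOWN_CODE
    return STATUS_CODES.get(status.lower(), UNKNOWN_CODE)


def firing_health_trigger(status):
    """Return (name, severity) of the health trigger that would fire, or None."""
    return HEALTH_TRIGGERS.get(status_code(status))


print(firing_health_trigger("green"))   # None
print(firing_health_trigger("yellow"))  # ('ES: Health is YELLOW', 'AVERAGE')
print(firing_health_trigger(None))      # ('ES: Health is UNKNOWN', 'HIGH')
```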

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help with it at ZABBIX forums.

References

https://www.elastic.co/guide/en/elasticsearch/reference/index.html

Link	Source	Compatibility	Type, Technology	Created Updated	Rating
Elasticsearch Cluster by HTTP zbx 4.2: A fork of the official Zabbix template Elasticsearch Cluster by HTTP (Zabbix 5.0). Features: compatible with Zabbix 4.2+; adds a trigger for the maximum number of allocated shards on the ES cluster. By default, ES allows at most 1000 shards per node ({$ELASTICSEARCH.MAX_SHARDS_PER_NODE}); if the number of shards reaches the maximum, then Elasticsearch ... template_elasticsearch_cluster_by_http_for_zabbix_4.2+	GitHub Community Templates	5.0+
App Elasticsearch Cluster new: ElasticSearch Zabbix monitoring. Script-free Zabbix ES monitoring. This template monitors the whole ES cluster using the Zabbix 4.x HTTP agent. This allows checking ES whether it runs on-premises or as a PaaS (AWS Elasticsearch, for example) without additional scripts. Requisites: ES must be reachable from the Zabbix server or a Zabbix proxy. ... template_elasticsearch	GitHub Community Templates	5.0+
App Elasticsearch Cluster by Zabbix agent: This is the "Zabbix agent" version of the HTTP template shipped with Zabbix 5.0 (https://www.zabbix.com/integrations/elasticsearch). This version can connect to Elasticsearch on localhost or on a remote network using the Zabbix agent. I have added checking of read-only indices. Elasticsearch makes indices ... template_app_elasticsearch_cluster_by_zabbix_agent	GitHub Community Templates	5.0+
Zabbix agent extension for monitoring Elasticsearch: zabbix-agent-extension-elasticsearch is an extension for monitoring Elasticsearch cluster and node health/status. github.com/zarplata/zabbix-agent-extension-elasticsearch	GitHub 35		Templates, Scripts Go	2017-06-30 10 m	Popular
Zabbix monitoring of an Elasticsearch cluster github.com/Wasim37/zabbix-es [cn]	GitHub 14		Python	2017-01-05 3 y
github.com/scoopex/zabbix-agent-extensions	GitHub		Template Python
PowerShell solution that monitors Elasticsearch cluster status and reports the results to Zabbix github.com/jdoles/ElasticsearchClusterStatusZabbix	GitHub		Templates, Scripts PowerShell	2018-05-08 6 y
