Elasticsearch Cluster by HTTP
Overview
The template to monitor Elasticsearch by Zabbix that work without any external scripts.
It works with both standalone and cluster instances.
The metrics are collected in one pass remotely using an HTTP agent.
They are getting values from REST API _cluster/health
, _cluster/stats
, _nodes/stats
requests.
Requirements
Zabbix version: 7.2 and higher.
Tested versions
This template has been tested on:
- Elasticsearch 6.5, 7.6
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
-
Set the hostname or IP address of the Elasticsearch host in the
{$ELASTICSEARCH.HOST}
macro. -
Set the login and password in the
{$ELASTICSEARCH.USERNAME}
and{$ELASTICSEARCH.PASSWORD}
macros. -
If you use an atypical location of ES API, don't forget to change the macros
{$ELASTICSEARCH.SCHEME}
,{$ELASTICSEARCH.PORT}
.
Macros used
Name | Description | Default |
---|---|---|
{$ELASTICSEARCH.USERNAME} | The username of the Elasticsearch. |
|
{$ELASTICSEARCH.PASSWORD} | The password of the Elasticsearch. |
|
{$ELASTICSEARCH.HOST} | The hostname or IP address of the Elasticsearch host. |
<SET ELASTICSEARCH HOST> |
{$ELASTICSEARCH.PORT} | The port of the Elasticsearch host. |
9200 |
{$ELASTICSEARCH.SCHEME} | The scheme of the Elasticsearch (http/https). |
http |
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | The ES cluster maximum response time in seconds for trigger expression. |
10s |
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | Maximum of query latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | Maximum of fetch latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | Maximum of indexing latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | Maximum of flush latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.HEAP_USED.MAX.WARN} | The maximum percent in the use of JVM heap for warning trigger expression. |
85 |
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | The maximum percent in the use of JVM heap for critically trigger expression. |
95 |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Service status | Checks if the service is running and accepting TCP connections. |
Simple check | net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] Preprocessing
|
Service response time | Checks performance of the TCP service. |
Simple check | net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] |
Get cluster health | Returns the health status of a cluster. |
HTTP agent | es.cluster.get_health |
Cluster health status | Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green All shards are assigned. yellow All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired. red One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned. |
Dependent item | es.cluster.status Preprocessing
|
Number of nodes | The number of nodes within the cluster. |
Dependent item | es.cluster.number_of_nodes Preprocessing
|
Number of data nodes | The number of nodes that are dedicated to data nodes. |
Dependent item | es.cluster.number_of_data_nodes Preprocessing
|
Number of relocating shards | The number of shards that are under relocation. |
Dependent item | es.cluster.relocating_shards Preprocessing
|
Number of initializing shards | The number of shards that are under initialization. |
Dependent item | es.cluster.initializing_shards Preprocessing
|
Number of unassigned shards | The number of shards that are not allocated. |
Dependent item | es.cluster.unassigned_shards Preprocessing
|
Delayed unassigned shards | The number of shards whose allocation has been delayed by the timeout settings. |
Dependent item | es.cluster.delayed_unassigned_shards Preprocessing
|
Number of pending tasks | The number of cluster-level changes that have not yet been executed. |
Dependent item | es.cluster.number_of_pending_tasks Preprocessing
|
Task max waiting in queue | The time expressed in seconds since the earliest initiated task is waiting for being performed. |
Dependent item | es.cluster.task_max_waiting_in_queue Preprocessing
|
Inactive shards percentage | The ratio of inactive shards in the cluster expressed as a percentage. |
Dependent item | es.cluster.inactive_shards_percent_as_number Preprocessing
|
Get cluster stats | Returns cluster statistics. |
HTTP agent | es.cluster.get_stats |
Cluster uptime | Uptime duration in seconds since JVM has last started. |
Dependent item | es.nodes.jvm.max_uptime Preprocessing
|
Number of non-deleted documents | The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields. |
Dependent item | es.indices.docs.count Preprocessing
|
Indices with shards assigned to nodes | The total number of indices with shards assigned to the selected nodes. |
Dependent item | es.indices.count Preprocessing
|
Total size of all file stores | The total size in bytes of all file stores across all selected nodes. |
Dependent item | es.nodes.fs.total_in_bytes Preprocessing
|
Total available size to JVM in all file stores | The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_byes. This is the actual amount of free disk space the selected Elasticsearch nodes can use. |
Dependent item | es.nodes.fs.available_in_bytes Preprocessing
|
Nodes with the data role | The number of selected nodes with the data role. |
Dependent item | es.nodes.count.data Preprocessing
|
Nodes with the ingest role | The number of selected nodes with the ingest role. |
Dependent item | es.nodes.count.ingest Preprocessing
|
Nodes with the master role | The number of selected nodes with the master role. |
Dependent item | es.nodes.count.master Preprocessing
|
Get nodes stats | Returns cluster nodes statistics. |
HTTP agent | es.nodes.get_stats |
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Elasticsearch: Service is down | The service is unavailable or does not accept TCP connections. |
last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"])=0 |
Average | Manual close: Yes |
Elasticsearch: Service response time is too high | The performance of the TCP service is very low. |
min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} |
Warning | Manual close: Yes Depends on:
|
Elasticsearch: Health is YELLOW | All primary shards are assigned, but one or more replica shards are unassigned. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1 |
Average | |
Elasticsearch: Health is RED | One or more primary shards are unassigned, so some data is unavailable. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2 |
High | |
Elasticsearch: Health is UNKNOWN | The health status of the cluster is unknown or cannot be obtained. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255 |
High | |
Elasticsearch: The number of nodes within the cluster has decreased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0 |
Info | Manual close: Yes | |
Elasticsearch: The number of nodes within the cluster has increased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0 |
Info | Manual close: Yes | |
Elasticsearch: Cluster has the initializing shards | The cluster has the initializing shards longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0 |
Average | |
Elasticsearch: Cluster has the unassigned shards | The cluster has the unassigned shards longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0 |
Average | |
Elasticsearch: Cluster has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m |
Info | Manual close: Yes |
Elasticsearch: Cluster does not have enough space for resharding | There is not enough disk space for index resharding. |
(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes) |
High | |
Elasticsearch: Cluster has only two master nodes | The cluster has only two nodes with a master role and will be unavailable if one of them breaks. |
last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2 |
Disaster |
LLD rule Cluster nodes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster nodes discovery | Discovery ES cluster nodes. |
HTTP agent | es.nodes.discovery Preprocessing
|
Item prototypes for Cluster nodes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
ES {#ES.NODE}: Get data | Returns cluster nodes statistics. |
Dependent item | es.node.get.data[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total size | Total size (in bytes) of all file stores. |
Dependent item | es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total available size | The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize. |
Dependent item | es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Node uptime | JVM uptime in seconds. |
Dependent item | es.node.jvm.uptime[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Maximum JVM memory available for use | The maximum amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap currently in use | The memory, in bytes, currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Percent of JVM heap currently in use | The percentage of memory currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap committed | The amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Number of open HTTP connections | The number of currently open HTTP connections for the node. |
Dependent item | es.node.http.current_open[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of HTTP connections opened | The number of HTTP connections opened for the node per second. |
Dependent item | es.node.http.opened.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling operations | Time in seconds spent throttling operations for the last measuring span. |
Dependent item | es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling recovery operations | Time in seconds spent throttling recovery operations for the last measuring span. |
Dependent item | es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling merge operations | Time in seconds spent throttling merge operations for the last measuring span. |
Dependent item | es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of queries | The number of query operations per second. |
Dependent item | es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of query | The total number of query operations. |
Dependent item | es.node.indices.search.query_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing query | Time in seconds spent performing query operations for the last measuring span. |
Dependent item | es.node.indices.search.query_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing query | Time in milliseconds spent performing query operations. |
Dependent item | es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Query latency | The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.query_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current query operations | The number of query operations currently running. |
Dependent item | es.node.indices.search.query_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of fetch | The number of fetch operations per second. |
Dependent item | es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of fetch | The total number of fetch operations. |
Dependent item | es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing fetch | Time in seconds spent performing fetch operations for the last measuring span. |
Dependent item | es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing fetch | Time in milliseconds spent performing fetch operations. |
Dependent item | es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Fetch latency | The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.fetch_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current fetch operations | The number of fetch operations currently running. |
Dependent item | es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks completed | The number of tasks completed by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool active threads | The number of active threads in the write thread pool. |
Dependent item | es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool tasks in queue | The number of tasks in queue for the write thread pool. |
Dependent item | es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks rejected | The number of tasks rejected by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks completed | The number of tasks completed by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool active threads | The number of active threads in the search thread pool. |
Dependent item | es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool tasks in queue | The number of tasks in queue for the search thread pool. |
Dependent item | es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks rejected | The number of tasks rejected by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks completed | The number of tasks completed by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool active threads | The number of active threads in the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool tasks in queue | The number of tasks in queue for the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks rejected | The number of tasks rejected by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of indexing | The total number of indexing operations. |
Dependent item | es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing indexing | Total time in milliseconds spent performing indexing operations. |
Dependent item | es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Indexing latency | The average indexing latency calculated from the available index_total and index_time_in_millis metrics. |
Calculated | es.node.indices.indexing.index_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current indexing operations | The number of indexing operations currently running. |
Dependent item | es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of index flushes to disk | The total number of flush operations. |
Dependent item | es.node.indices.flush.total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent on flushing indices to disk | Total time in milliseconds spent performing flush operations. |
Dependent item | es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Flush latency | The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. |
Calculated | es.node.indices.flush.latency[{#ES.NODE}] |
ES {#ES.NODE}: Rate of index refreshes | The number of refresh operations per second. |
Dependent item | es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing refresh | Time in seconds spent performing refresh operations for the last measuring span. |
Dependent item | es.node.indices.refresh.time[{#ES.NODE}] Preprocessing
|
Trigger prototypes for Cluster nodes discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Elasticsearch: ES {#ES.NODE}: has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m |
Info | Manual close: Yes |
Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is high | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} |
Warning | Depends on:
|
Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is critical | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} |
High | |
Elasticsearch: ES {#ES.NODE}: Query latency is too high | If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} |
Warning | |
Elasticsearch: ES {#ES.NODE}: Fetch latency is too high | The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} |
Warning | |
Elasticsearch: ES {#ES.NODE}: Write thread pool executor has the rejected tasks | The number of tasks rejected by the write thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0 |
Warning | |
Elasticsearch: ES {#ES.NODE}: Search thread pool executor has the rejected tasks | The number of tasks rejected by the search thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0 |
Warning | |
Elasticsearch: ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks | The number of tasks rejected by the refresh thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0 |
Warning | |
Elasticsearch: ES {#ES.NODE}: Indexing latency is too high | If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation |
min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} |
Warning | |
Elasticsearch: ES {#ES.NODE}: Flush latency is too high | If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate |
min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} |
Warning |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Elasticsearch Cluster by HTTP
Overview
The template to monitor Elasticsearch by Zabbix that work without any external scripts.
It works with both standalone and cluster instances.
The metrics are collected in one pass remotely using an HTTP agent.
They are getting values from REST API _cluster/health
, _cluster/stats
, _nodes/stats
requests.
Requirements
Zabbix version: 7.0 and higher.
Tested versions
This template has been tested on:
- Elasticsearch 6.5, 7.6
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
-
Set the hostname or IP address of the Elasticsearch host in the
{$ELASTICSEARCH.HOST}
macro. -
Set the login and password in the
{$ELASTICSEARCH.USERNAME}
and{$ELASTICSEARCH.PASSWORD}
macros. -
If you use an atypical location of ES API, don't forget to change the macros
{$ELASTICSEARCH.SCHEME}
,{$ELASTICSEARCH.PORT}
.
Macros used
Name | Description | Default |
---|---|---|
{$ELASTICSEARCH.USERNAME} | The username of the Elasticsearch. |
|
{$ELASTICSEARCH.PASSWORD} | The password of the Elasticsearch. |
|
{$ELASTICSEARCH.HOST} | The hostname or IP address of the Elasticsearch host. |
<SET ELASTICSEARCH HOST> |
{$ELASTICSEARCH.PORT} | The port of the Elasticsearch host. |
9200 |
{$ELASTICSEARCH.SCHEME} | The scheme of the Elasticsearch (http/https). |
http |
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | The ES cluster maximum response time in seconds for trigger expression. |
10s |
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | Maximum of query latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | Maximum of fetch latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | Maximum of indexing latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | Maximum of flush latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.HEAP_USED.MAX.WARN} | The maximum percent in the use of JVM heap for warning trigger expression. |
85 |
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | The maximum percent in the use of JVM heap for critically trigger expression. |
95 |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Service status | Checks if the service is running and accepting TCP connections. |
Simple check | net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] Preprocessing
|
Service response time | Checks performance of the TCP service. |
Simple check | net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] |
Get cluster health | Returns the health status of a cluster. |
HTTP agent | es.cluster.get_health |
Cluster health status | Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green All shards are assigned. yellow All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired. red One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned. |
Dependent item | es.cluster.status Preprocessing
|
Number of nodes | The number of nodes within the cluster. |
Dependent item | es.cluster.number_of_nodes Preprocessing
|
Number of data nodes | The number of nodes that are dedicated to data nodes. |
Dependent item | es.cluster.number_of_data_nodes Preprocessing
|
Number of relocating shards | The number of shards that are under relocation. |
Dependent item | es.cluster.relocating_shards Preprocessing
|
Number of initializing shards | The number of shards that are under initialization. |
Dependent item | es.cluster.initializing_shards Preprocessing
|
Number of unassigned shards | The number of shards that are not allocated. |
Dependent item | es.cluster.unassigned_shards Preprocessing
|
Delayed unassigned shards | The number of shards whose allocation has been delayed by the timeout settings. |
Dependent item | es.cluster.delayed_unassigned_shards Preprocessing
|
Number of pending tasks | The number of cluster-level changes that have not yet been executed. |
Dependent item | es.cluster.number_of_pending_tasks Preprocessing
|
Task max waiting in queue | The time expressed in seconds since the earliest initiated task is waiting for being performed. |
Dependent item | es.cluster.task_max_waiting_in_queue Preprocessing
|
Inactive shards percentage | The ratio of inactive shards in the cluster expressed as a percentage. |
Dependent item | es.cluster.inactive_shards_percent_as_number Preprocessing
|
Get cluster stats | Returns cluster statistics. |
HTTP agent | es.cluster.get_stats |
Cluster uptime | Uptime duration in seconds since JVM has last started. |
Dependent item | es.nodes.jvm.max_uptime Preprocessing
|
Number of non-deleted documents | The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields. |
Dependent item | es.indices.docs.count Preprocessing
|
Indices with shards assigned to nodes | The total number of indices with shards assigned to the selected nodes. |
Dependent item | es.indices.count Preprocessing
|
Total size of all file stores | The total size in bytes of all file stores across all selected nodes. |
Dependent item | es.nodes.fs.total_in_bytes Preprocessing
|
Total available size to JVM in all file stores | The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_byes. This is the actual amount of free disk space the selected Elasticsearch nodes can use. |
Dependent item | es.nodes.fs.available_in_bytes Preprocessing
|
Nodes with the data role | The number of selected nodes with the data role. |
Dependent item | es.nodes.count.data Preprocessing
|
Nodes with the ingest role | The number of selected nodes with the ingest role. |
Dependent item | es.nodes.count.ingest Preprocessing
|
Nodes with the master role | The number of selected nodes with the master role. |
Dependent item | es.nodes.count.master Preprocessing
|
Get nodes stats | Returns cluster nodes statistics. |
HTTP agent | es.nodes.get_stats |
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Elasticsearch: Service is down | The service is unavailable or does not accept TCP connections. |
last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"])=0 |
Average | Manual close: Yes |
Elasticsearch: Service response time is too high | The performance of the TCP service is very low. |
min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} |
Warning | Manual close: Yes Depends on:
|
Elasticsearch: Health is YELLOW | All primary shards are assigned, but one or more replica shards are unassigned. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1 |
Average | |
Elasticsearch: Health is RED | One or more primary shards are unassigned, so some data is unavailable. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2 |
High | |
Elasticsearch: Health is UNKNOWN | The health status of the cluster is unknown or cannot be obtained. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255 |
High | |
Elasticsearch: The number of nodes within the cluster has decreased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0 |
Info | Manual close: Yes | |
Elasticsearch: The number of nodes within the cluster has increased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0 |
Info | Manual close: Yes | |
Elasticsearch: Cluster has the initializing shards | The cluster has the initializing shards longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0 |
Average | |
Elasticsearch: Cluster has the unassigned shards | The cluster has the unassigned shards longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0 |
Average | |
Elasticsearch: Cluster has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m |
Info | Manual close: Yes |
Elasticsearch: Cluster does not have enough space for resharding | There is not enough disk space for index resharding. |
(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes) |
High | |
Elasticsearch: Cluster has only two master nodes | The cluster has only two nodes with a master role and will be unavailable if one of them breaks. |
last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2 |
Disaster |
LLD rule Cluster nodes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster nodes discovery | Discovery ES cluster nodes. |
HTTP agent | es.nodes.discovery Preprocessing
|
Item prototypes for Cluster nodes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
ES {#ES.NODE}: Get data | Returns cluster nodes statistics. |
Dependent item | es.node.get.data[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total size | Total size (in bytes) of all file stores. |
Dependent item | es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total available size | The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize. |
Dependent item | es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Node uptime | JVM uptime in seconds. |
Dependent item | es.node.jvm.uptime[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Maximum JVM memory available for use | The maximum amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap currently in use | The memory, in bytes, currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Percent of JVM heap currently in use | The percentage of memory currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap committed | The amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Number of open HTTP connections | The number of currently open HTTP connections for the node. |
Dependent item | es.node.http.current_open[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of HTTP connections opened | The number of HTTP connections opened for the node per second. |
Dependent item | es.node.http.opened.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling operations | Time in seconds spent throttling operations for the last measuring span. |
Dependent item | es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling recovery operations | Time in seconds spent throttling recovery operations for the last measuring span. |
Dependent item | es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling merge operations | Time in seconds spent throttling merge operations for the last measuring span. |
Dependent item | es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of queries | The number of query operations per second. |
Dependent item | es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of query | The total number of query operations. |
Dependent item | es.node.indices.search.query_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing query | Time in seconds spent performing query operations for the last measuring span. |
Dependent item | es.node.indices.search.query_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing query | Time in milliseconds spent performing query operations. |
Dependent item | es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Query latency | The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.query_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current query operations | The number of query operations currently running. |
Dependent item | es.node.indices.search.query_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of fetch | The number of fetch operations per second. |
Dependent item | es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of fetch | The total number of fetch operations. |
Dependent item | es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing fetch | Time in seconds spent performing fetch operations for the last measuring span. |
Dependent item | es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing fetch | Time in milliseconds spent performing fetch operations. |
Dependent item | es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Fetch latency | The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.fetch_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current fetch operations | The number of fetch operations currently running. |
Dependent item | es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks completed | The number of tasks completed by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool active threads | The number of active threads in the write thread pool. |
Dependent item | es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool tasks in queue | The number of tasks in queue for the write thread pool. |
Dependent item | es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks rejected | The number of tasks rejected by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks completed | The number of tasks completed by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool active threads | The number of active threads in the search thread pool. |
Dependent item | es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool tasks in queue | The number of tasks in queue for the search thread pool. |
Dependent item | es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks rejected | The number of tasks rejected by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks completed | The number of tasks completed by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool active threads | The number of active threads in the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool tasks in queue | The number of tasks in queue for the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks rejected | The number of tasks rejected by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of indexing | The total number of indexing operations. |
Dependent item | es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing indexing | Total time in milliseconds spent performing indexing operations. |
Dependent item | es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Indexing latency | The average indexing latency calculated from the available index_total and index_time_in_millis metrics. |
Calculated | es.node.indices.indexing.index_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current indexing operations | The number of indexing operations currently running. |
Dependent item | es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of index flushes to disk | The total number of flush operations. |
Dependent item | es.node.indices.flush.total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent on flushing indices to disk | Total time in milliseconds spent performing flush operations. |
Dependent item | es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Flush latency | The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. |
Calculated | es.node.indices.flush.latency[{#ES.NODE}] |
ES {#ES.NODE}: Rate of index refreshes | The number of refresh operations per second. |
Dependent item | es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing refresh | Time in seconds spent performing refresh operations for the last measuring span. |
Dependent item | es.node.indices.refresh.time[{#ES.NODE}] Preprocessing
|
Trigger prototypes for Cluster nodes discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Elasticsearch: ES {#ES.NODE}: has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m |
Info | Manual close: Yes |
Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is high | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} |
Warning | Depends on:
|
Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is critical | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} |
High | |
Elasticsearch: ES {#ES.NODE}: Query latency is too high | If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} |
Warning | |
Elasticsearch: ES {#ES.NODE}: Fetch latency is too high | The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} |
Warning | |
Elasticsearch: ES {#ES.NODE}: Write thread pool executor has the rejected tasks | The number of tasks rejected by the write thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0 |
Warning | |
Elasticsearch: ES {#ES.NODE}: Search thread pool executor has the rejected tasks | The number of tasks rejected by the search thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0 |
Warning | |
Elasticsearch: ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks | The number of tasks rejected by the refresh thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0 |
Warning | |
Elasticsearch: ES {#ES.NODE}: Indexing latency is too high | If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation |
min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} |
Warning | |
Elasticsearch: ES {#ES.NODE}: Flush latency is too high | If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate |
min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} |
Warning |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Elasticsearch Cluster by HTTP
Overview
The template to monitor Elasticsearch by Zabbix that work without any external scripts.
It works with both standalone and cluster instances.
The metrics are collected in one pass remotely using an HTTP agent.
They are getting values from REST API _cluster/health
, _cluster/stats
, _nodes/stats
requests.
Requirements
Zabbix version: 6.4 and higher.
Tested versions
This template has been tested on:
- Elasticsearch 6.5, 7.6
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
-
Set the hostname or IP address of the Elasticsearch host in the
{$ELASTICSEARCH.HOST}
macro. -
Set the login and password in the
{$ELASTICSEARCH.USERNAME}
and{$ELASTICSEARCH.PASSWORD}
macros. -
If you use an atypical location of ES API, don't forget to change the macros
{$ELASTICSEARCH.SCHEME}
,{$ELASTICSEARCH.PORT}
.
Macros used
Name | Description | Default |
---|---|---|
{$ELASTICSEARCH.USERNAME} | The username of the Elasticsearch. |
|
{$ELASTICSEARCH.PASSWORD} | The password of the Elasticsearch. |
|
{$ELASTICSEARCH.HOST} | The hostname or IP address of the Elasticsearch host. |
<SET ELASTICSEARCH HOST> |
{$ELASTICSEARCH.PORT} | The port of the Elasticsearch host. |
9200 |
{$ELASTICSEARCH.SCHEME} | The scheme of the Elasticsearch (http/https). |
http |
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | The ES cluster maximum response time in seconds for trigger expression. |
10s |
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | Maximum of query latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | Maximum of fetch latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | Maximum of indexing latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | Maximum of flush latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.HEAP_USED.MAX.WARN} | The maximum percent in the use of JVM heap for warning trigger expression. |
85 |
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | The maximum percent in the use of JVM heap for critically trigger expression. |
95 |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
ES: Service status | Checks if the service is running and accepting TCP connections. |
Simple check | net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] Preprocessing
|
ES: Service response time | Checks performance of the TCP service. |
Simple check | net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] |
ES: Get cluster health | Returns the health status of a cluster. |
HTTP agent | es.cluster.get_health |
ES: Cluster health status | Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green All shards are assigned. yellow All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired. red One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned. |
Dependent item | es.cluster.status Preprocessing
|
ES: Number of nodes | The number of nodes within the cluster. |
Dependent item | es.cluster.number_of_nodes Preprocessing
|
ES: Number of data nodes | The number of nodes that are dedicated to data nodes. |
Dependent item | es.cluster.number_of_data_nodes Preprocessing
|
ES: Number of relocating shards | The number of shards that are under relocation. |
Dependent item | es.cluster.relocating_shards Preprocessing
|
ES: Number of initializing shards | The number of shards that are under initialization. |
Dependent item | es.cluster.initializing_shards Preprocessing
|
ES: Number of unassigned shards | The number of shards that are not allocated. |
Dependent item | es.cluster.unassigned_shards Preprocessing
|
ES: Delayed unassigned shards | The number of shards whose allocation has been delayed by the timeout settings. |
Dependent item | es.cluster.delayed_unassigned_shards Preprocessing
|
ES: Number of pending tasks | The number of cluster-level changes that have not yet been executed. |
Dependent item | es.cluster.number_of_pending_tasks Preprocessing
|
ES: Task max waiting in queue | The time expressed in seconds since the earliest initiated task is waiting for being performed. |
Dependent item | es.cluster.task_max_waiting_in_queue Preprocessing
|
ES: Inactive shards percentage | The ratio of inactive shards in the cluster expressed as a percentage. |
Dependent item | es.cluster.inactive_shards_percent_as_number Preprocessing
|
ES: Get cluster stats | Returns cluster statistics. |
HTTP agent | es.cluster.get_stats |
ES: Cluster uptime | Uptime duration in seconds since JVM has last started. |
Dependent item | es.nodes.jvm.max_uptime Preprocessing
|
ES: Number of non-deleted documents | The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields. |
Dependent item | es.indices.docs.count Preprocessing
|
ES: Indices with shards assigned to nodes | The total number of indices with shards assigned to the selected nodes. |
Dependent item | es.indices.count Preprocessing
|
ES: Total size of all file stores | The total size in bytes of all file stores across all selected nodes. |
Dependent item | es.nodes.fs.total_in_bytes Preprocessing
|
ES: Total available size to JVM in all file stores | The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_byes. This is the actual amount of free disk space the selected Elasticsearch nodes can use. |
Dependent item | es.nodes.fs.available_in_bytes Preprocessing
|
ES: Nodes with the data role | The number of selected nodes with the data role. |
Dependent item | es.nodes.count.data Preprocessing
|
ES: Nodes with the ingest role | The number of selected nodes with the ingest role. |
Dependent item | es.nodes.count.ingest Preprocessing
|
ES: Nodes with the master role | The number of selected nodes with the master role. |
Dependent item | es.nodes.count.master Preprocessing
|
ES: Get nodes stats | Returns cluster nodes statistics. |
HTTP agent | es.nodes.get_stats |
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ES: Service is down | The service is unavailable or does not accept TCP connections. |
last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"])=0 |
Average | Manual close: Yes |
ES: Service response time is too high | The performance of the TCP service is very low. |
min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} |
Warning | Manual close: Yes Depends on:
|
ES: Health is YELLOW | All primary shards are assigned, but one or more replica shards are unassigned. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1 |
Average | |
ES: Health is RED | One or more primary shards are unassigned, so some data is unavailable. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2 |
High | |
ES: Health is UNKNOWN | The health status of the cluster is unknown or cannot be obtained. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255 |
High | |
ES: The number of nodes within the cluster has decreased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0 |
Info | Manual close: Yes | |
ES: The number of nodes within the cluster has increased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0 |
Info | Manual close: Yes | |
ES: Cluster has the initializing shards | The cluster has the initializing shards longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0 |
Average | |
ES: Cluster has the unassigned shards | The cluster has the unassigned shards longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0 |
Average | |
ES: Cluster has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m |
Info | Manual close: Yes |
ES: Cluster does not have enough space for resharding | There is not enough disk space for index resharding. |
(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes) |
High | |
ES: Cluster has only two master nodes | The cluster has only two nodes with a master role and will be unavailable if one of them breaks. |
last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2 |
Disaster |
LLD rule Cluster nodes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster nodes discovery | Discovery ES cluster nodes. |
HTTP agent | es.nodes.discovery Preprocessing
|
Item prototypes for Cluster nodes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
ES {#ES.NODE}: Get data | Returns cluster nodes statistics. |
Dependent item | es.node.get.data[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total size | Total size (in bytes) of all file stores. |
Dependent item | es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total available size | The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize. |
Dependent item | es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Node uptime | JVM uptime in seconds. |
Dependent item | es.node.jvm.uptime[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Maximum JVM memory available for use | The maximum amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap currently in use | The memory, in bytes, currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Percent of JVM heap currently in use | The percentage of memory currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap committed | The amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Number of open HTTP connections | The number of currently open HTTP connections for the node. |
Dependent item | es.node.http.current_open[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of HTTP connections opened | The number of HTTP connections opened for the node per second. |
Dependent item | es.node.http.opened.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling operations | Time in seconds spent throttling operations for the last measuring span. |
Dependent item | es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling recovery operations | Time in seconds spent throttling recovery operations for the last measuring span. |
Dependent item | es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling merge operations | Time in seconds spent throttling merge operations for the last measuring span. |
Dependent item | es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of queries | The number of query operations per second. |
Dependent item | es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of query | The total number of query operations. |
Dependent item | es.node.indices.search.query_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing query | Time in seconds spent performing query operations for the last measuring span. |
Dependent item | es.node.indices.search.query_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing query | Time in milliseconds spent performing query operations. |
Dependent item | es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Query latency | The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.query_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current query operations | The number of query operations currently running. |
Dependent item | es.node.indices.search.query_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of fetch | The number of fetch operations per second. |
Dependent item | es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of fetch | The total number of fetch operations. |
Dependent item | es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing fetch | Time in seconds spent performing fetch operations for the last measuring span. |
Dependent item | es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing fetch | Time in milliseconds spent performing fetch operations. |
Dependent item | es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Fetch latency | The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.fetch_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current fetch operations | The number of fetch operations currently running. |
Dependent item | es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks completed | The number of tasks completed by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool active threads | The number of active threads in the write thread pool. |
Dependent item | es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool tasks in queue | The number of tasks in queue for the write thread pool. |
Dependent item | es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks rejected | The number of tasks rejected by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks completed | The number of tasks completed by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool active threads | The number of active threads in the search thread pool. |
Dependent item | es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool tasks in queue | The number of tasks in queue for the search thread pool. |
Dependent item | es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks rejected | The number of tasks rejected by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks completed | The number of tasks completed by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool active threads | The number of active threads in the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool tasks in queue | The number of tasks in queue for the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks rejected | The number of tasks rejected by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of indexing | The total number of indexing operations. |
Dependent item | es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing indexing | Total time in milliseconds spent performing indexing operations. |
Dependent item | es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Indexing latency | The average indexing latency calculated from the available index_total and index_time_in_millis metrics. |
Calculated | es.node.indices.indexing.index_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current indexing operations | The number of indexing operations currently running. |
Dependent item | es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of index flushes to disk | The total number of flush operations. |
Dependent item | es.node.indices.flush.total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent on flushing indices to disk | Total time in milliseconds spent performing flush operations. |
Dependent item | es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Flush latency | The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. |
Calculated | es.node.indices.flush.latency[{#ES.NODE}] |
ES {#ES.NODE}: Rate of index refreshes | The number of refresh operations per second. |
Dependent item | es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing refresh | Time in seconds spent performing refresh operations for the last measuring span. |
Dependent item | es.node.indices.refresh.time[{#ES.NODE}] Preprocessing
|
Trigger prototypes for Cluster nodes discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ES {#ES.NODE}: has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m |
Info | Manual close: Yes |
ES {#ES.NODE}: Percent of JVM heap in use is high | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} |
Warning | Depends on:
|
ES {#ES.NODE}: Percent of JVM heap in use is critical | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} |
High | |
ES {#ES.NODE}: Query latency is too high | If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} |
Warning | |
ES {#ES.NODE}: Fetch latency is too high | The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} |
Warning | |
ES {#ES.NODE}: Write thread pool executor has the rejected tasks | The number of tasks rejected by the write thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0 |
Warning | |
ES {#ES.NODE}: Search thread pool executor has the rejected tasks | The number of tasks rejected by the search thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0 |
Warning | |
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks | The number of tasks rejected by the refresh thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0 |
Warning | |
ES {#ES.NODE}: Indexing latency is too high | If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation |
min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} |
Warning | |
ES {#ES.NODE}: Flush latency is too high | If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate |
min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} |
Warning |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Elasticsearch Cluster by HTTP
Overview
For Zabbix version: 6.2 and higher
The template to monitor Elasticsearch by Zabbix that work without any external scripts.
It works with both standalone and cluster instances.
The metrics are collected in one pass remotely using an HTTP agent.
They are getting values from REST API _cluster/health, _cluster/stats, _nodes/stats requests.
This template was tested on:
- Elasticsearch, version 6.5..7.6
Setup
See Zabbix template operation for basic instructions.
You can set {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros in the template for using on the host level. If you use an atypical location ES API, don't forget to change the macros {$ELASTICSEARCH.SCHEME},{$ELASTICSEARCH.PORT}.
Zabbix configuration
No specific Zabbix configuration is required.
Macros used
Name | Description | Default |
---|---|---|
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | Maximum of fetch latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | Maximum of flush latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | The maximum percent in the use of JVM heap for critically trigger expression. |
95 |
{$ELASTICSEARCH.HEAP_USED.MAX.WARN} | The maximum percent in the use of JVM heap for warning trigger expression. |
85 |
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | Maximum of indexing latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.PASSWORD} | The password of the Elasticsearch. |
`` |
{$ELASTICSEARCH.PORT} | The port of the Elasticsearch host. |
9200 |
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | Maximum of query latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | The ES cluster maximum response time in seconds for trigger expression. |
10s |
{$ELASTICSEARCH.SCHEME} | The scheme of the Elasticsearch (http/https). |
http |
{$ELASTICSEARCH.USERNAME} | The username of the Elasticsearch. |
`` |
Template links
There are no template links in this template.
Discovery rules
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster nodes discovery | Discovery ES cluster nodes. |
HTTP_AGENT | es.nodes.discovery Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Items collected
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
ES cluster | ES: Service status | Checks if the service is running and accepting TCP connections. |
SIMPLE | net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Service response time | Checks performance of the TCP service. |
SIMPLE | net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] |
ES cluster | ES: Cluster health status | Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green All shards are assigned. yellow All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired. red One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned. |
DEPENDENT | es.cluster.status Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Number of nodes | The number of nodes within the cluster. |
DEPENDENT | es.cluster.number_of_nodes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Number of data nodes | The number of nodes that are dedicated to data nodes. |
DEPENDENT | es.cluster.number_of_data_nodes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Number of relocating shards | The number of shards that are under relocation. |
DEPENDENT | es.cluster.relocating_shards Preprocessing: - JSONPATH: |
ES cluster | ES: Number of initializing shards | The number of shards that are under initialization. |
DEPENDENT | es.cluster.initializing_shards Preprocessing: - JSONPATH: |
ES cluster | ES: Number of unassigned shards | The number of shards that are not allocated. |
DEPENDENT | es.cluster.unassigned_shards Preprocessing: - JSONPATH: |
ES cluster | ES: Delayed unassigned shards | The number of shards whose allocation has been delayed by the timeout settings. |
DEPENDENT | es.cluster.delayed_unassigned_shards Preprocessing: - JSONPATH: |
ES cluster | ES: Number of pending tasks | The number of cluster-level changes that have not yet been executed. |
DEPENDENT | es.cluster.number_of_pending_tasks Preprocessing: - JSONPATH: |
ES cluster | ES: Task max waiting in queue | The time expressed in seconds since the earliest initiated task is waiting for being performed. |
DEPENDENT | es.cluster.task_max_waiting_in_queue Preprocessing: - JSONPATH: - MULTIPLIER: |
ES cluster | ES: Inactive shards percentage | The ratio of inactive shards in the cluster expressed as a percentage. |
DEPENDENT | es.cluster.inactive_shards_percent_as_number Preprocessing: - JSONPATH: - JAVASCRIPT: |
ES cluster | ES: Cluster uptime | Uptime duration in seconds since JVM has last started. |
DEPENDENT | es.nodes.jvm.max_uptime Preprocessing: - JSONPATH: - MULTIPLIER: |
ES cluster | ES: Number of non-deleted documents | The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields. |
DEPENDENT | es.indices.docs.count Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Indices with shards assigned to nodes | The total number of indices with shards assigned to the selected nodes. |
DEPENDENT | es.indices.count Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Total size of all file stores | The total size in bytes of all file stores across all selected nodes. |
DEPENDENT | es.nodes.fs.total_in_bytes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Total available size to JVM in all file stores | The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_byes. This is the actual amount of free disk space the selected Elasticsearch nodes can use. |
DEPENDENT | es.nodes.fs.available_in_bytes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Nodes with the data role | The number of selected nodes with the data role. |
DEPENDENT | es.nodes.count.data Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Nodes with the ingest role | The number of selected nodes with the ingest role. |
DEPENDENT | es.nodes.count.ingest Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Nodes with the master role | The number of selected nodes with the master role. |
DEPENDENT | es.nodes.count.master Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Total size | Total size (in bytes) of all file stores. |
DEPENDENT | es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Total available size | The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize. |
DEPENDENT | es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Node uptime | JVM uptime in seconds. |
DEPENDENT | es.node.jvm.uptime[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: |
ES cluster | ES {#ES.NODE}: Maximum JVM memory available for use | The maximum amount of memory, in bytes, available for use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Amount of JVM heap currently in use | The memory, in bytes, currently in use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Percent of JVM heap currently in use | The percentage of memory currently in use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Amount of JVM heap committed | The amount of memory, in bytes, available for use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Number of open HTTP connections | The number of currently open HTTP connections for the node. |
DEPENDENT | es.node.http.current_open[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Rate of HTTP connections opened | The number of HTTP connections opened for the node per second. |
DEPENDENT | es.node.http.opened.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Time spent throttling operations | Time in seconds spent throttling operations for the last measuring span. |
DEPENDENT | es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Time spent throttling recovery operations | Time in seconds spent throttling recovery operations for the last measuring span. |
DEPENDENT | es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Time spent throttling merge operations | Time in seconds spent throttling merge operations for the last measuring span. |
DEPENDENT | es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Rate of queries | The number of query operations per second. |
DEPENDENT | es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Time spent performing query | Time in seconds spent performing query operations for the last measuring span. |
DEPENDENT | es.node.indices.search.query_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Query latency | The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals. |
CALCULATED | es.node.indices.search.query_latency[{#ES.NODE}] Expression: change(//es.node.indices.search.query_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.search.query_total[{#ES.NODE}]) + (change(//es.node.indices.search.query_total[{#ES.NODE}]) = 0) ) |
ES cluster | ES {#ES.NODE}: Current query operations | The number of query operations currently running. |
DEPENDENT | es.node.indices.search.query_current[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Rate of fetch | The number of fetch operations per second. |
DEPENDENT | es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Time spent performing fetch | Time in seconds spent performing fetch operations for the last measuring span. |
DEPENDENT | es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Fetch latency | The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. |
CALCULATED | es.node.indices.search.fetch_latency[{#ES.NODE}] Expression: change(//es.node.indices.search.fetch_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.search.fetch_total[{#ES.NODE}]) + (change(//es.node.indices.search.fetch_total[{#ES.NODE}]) = 0) ) |
ES cluster | ES {#ES.NODE}: Current fetch operations | The number of fetch operations currently running. |
DEPENDENT | es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Write thread pool executor tasks completed | The number of tasks completed by the write thread pool executor. |
DEPENDENT | es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Write thread pool active threads | The number of active threads in the write thread pool. |
DEPENDENT | es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Write thread pool tasks in queue | The number of tasks in queue for the write thread pool. |
DEPENDENT | es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Write thread pool executor tasks rejected | The number of tasks rejected by the write thread pool executor. |
DEPENDENT | es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Search thread pool executor tasks completed | The number of tasks completed by the search thread pool executor. |
DEPENDENT | es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Search thread pool active threads | The number of active threads in the search thread pool. |
DEPENDENT | es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Search thread pool tasks in queue | The number of tasks in queue for the search thread pool. |
DEPENDENT | es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Search thread pool executor tasks rejected | The number of tasks rejected by the search thread pool executor. |
DEPENDENT | es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Refresh thread pool executor tasks completed | The number of tasks completed by the refresh thread pool executor. |
DEPENDENT | es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Refresh thread pool active threads | The number of active threads in the refresh thread pool. |
DEPENDENT | es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Refresh thread pool tasks in queue | The number of tasks in queue for the refresh thread pool. |
DEPENDENT | es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Refresh thread pool executor tasks rejected | The number of tasks rejected by the refresh thread pool executor. |
DEPENDENT | es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Indexing latency | The average indexing latency calculated from the available index_total and index_time_in_millis metrics. |
CALCULATED | es.node.indices.indexing.index_latency[{#ES.NODE}] Expression: change(//es.node.indices.indexing.index_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.indexing.index_total[{#ES.NODE}]) + (change(//es.node.indices.indexing.index_total[{#ES.NODE}]) = 0) ) |
ES cluster | ES {#ES.NODE}: Current indexing operations | The number of indexing operations currently running. |
DEPENDENT | es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Flush latency | The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. |
CALCULATED | es.node.indices.flush.latency[{#ES.NODE}] Expression: change(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.flush.total[{#ES.NODE}]) + (change(//es.node.indices.flush.total[{#ES.NODE}]) = 0) ) |
ES cluster | ES {#ES.NODE}: Rate of index refreshes | The number of refresh operations per second. |
DEPENDENT | es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Time spent performing refresh | Time in seconds spent performing refresh operations for the last measuring span. |
DEPENDENT | es.node.indices.refresh.time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
Zabbix raw items | ES: Get cluster health | Returns the health status of a cluster. |
HTTP_AGENT | es.cluster.get_health |
Zabbix raw items | ES: Get cluster stats | Returns cluster statistics. |
HTTP_AGENT | es.cluster.get_stats |
Zabbix raw items | ES: Get nodes stats | Returns cluster nodes statistics. |
HTTP_AGENT | es.nodes.get_stats |
Zabbix raw items | ES {#ES.NODE}: Total number of query | The total number of query operations. |
DEPENDENT | es.node.indices.search.query_total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total time spent performing query | Time in milliseconds spent performing query operations. |
DEPENDENT | es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total number of fetch | The total number of fetch operations. |
DEPENDENT | es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total time spent performing fetch | Time in milliseconds spent performing fetch operations. |
DEPENDENT | es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total number of indexing | The total number of indexing operations. |
DEPENDENT | es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total time spent performing indexing | Total time in milliseconds spent performing indexing operations. |
DEPENDENT | es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total number of index flushes to disk | The total number of flush operations. |
DEPENDENT | es.node.indices.flush.total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total time spent on flushing indices to disk | Total time in milliseconds spent performing flush operations. |
DEPENDENT | es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ES: Service is down | The service is unavailable or does not accept TCP connections. |
last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"])=0 |
AVERAGE | Manual close: YES |
ES: Service response time is too high | The performance of the TCP service is very low. |
min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} |
WARNING | Manual close: YES Depends on: - ES: Service is down |
ES: Health is YELLOW | All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1 |
AVERAGE | |
ES: Health is RED | One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2 |
HIGH | |
ES: Health is UNKNOWN | The health status of the cluster is unknown or cannot be obtained. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255 |
HIGH | |
ES: The number of nodes within the cluster has decreased | - |
change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0 |
INFO | Manual close: YES |
ES: The number of nodes within the cluster has increased | - |
change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0 |
INFO | Manual close: YES |
ES: Cluster has the initializing shards | The cluster has the initializing shards longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0 |
AVERAGE | |
ES: Cluster has the unassigned shards | The cluster has the unassigned shards longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0 |
AVERAGE | |
ES: Cluster has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m |
INFO | Manual close: YES |
ES: Cluster does not have enough space for resharding | There is not enough disk space for index resharding. |
(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes) |
HIGH | |
ES: Cluster has only two master nodes | The cluster has only two nodes with a master role and will be unavailable if one of them breaks. |
last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2 |
DISASTER | |
ES {#ES.NODE}: has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m |
INFO | Manual close: YES |
ES {#ES.NODE}: Percent of JVM heap in use is high | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it remains below the recommended guidelines stated above), or scale out the cluster by adding more nodes. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} |
WARNING | Depends on: - ES {#ES.NODE}: Percent of JVM heap in use is critical |
ES {#ES.NODE}: Percent of JVM heap in use is critical | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it remains below the recommended guidelines stated above), or scale out the cluster by adding more nodes. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} |
HIGH | |
ES {#ES.NODE}: Query latency is too high | If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} |
WARNING | |
ES {#ES.NODE}: Fetch latency is too high | The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, this could indicate a problem with slow disks, enriching of documents (highlighting the relevant text in search results, etc.), or requesting too many results. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} |
WARNING | |
ES {#ES.NODE}: Write thread pool executor has the rejected tasks | The number of tasks rejected by the write thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0 |
WARNING | |
ES {#ES.NODE}: Search thread pool executor has the rejected tasks | The number of tasks rejected by the search thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0 |
WARNING | |
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks | The number of tasks rejected by the refresh thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0 |
WARNING | |
ES {#ES.NODE}: Indexing latency is too high | If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there). |
min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} |
WARNING | |
ES {#ES.NODE}: Flush latency is too high | If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} |
WARNING |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
References
https://www.elastic.co/guide/en/elasticsearch/reference/index.html
Elasticsearch Cluster by HTTP
Overview
The template to monitor Elasticsearch by Zabbix that work without any external scripts. It works with both standalone and cluster instances. The metrics are collected in one pass remotely using an HTTP agent. They are getting values from REST API _cluster/health, _cluster/stats, _nodes/stats requests.
Requirements
Zabbix version: 6.0 and higher.
Tested versions
This template has been tested on:
- Elasticsearch 6.5, 7.6
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
You can set {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros in the template for using on the host level. If you use an atypical location ES API, don't forget to change the macros {$ELASTICSEARCH.SCHEME},{$ELASTICSEARCH.PORT}.
Macros used
Name | Description | Default |
---|---|---|
{$ELASTICSEARCH.USERNAME} | The username of the Elasticsearch. |
|
{$ELASTICSEARCH.PASSWORD} | The password of the Elasticsearch. |
|
{$ELASTICSEARCH.PORT} | The port of the Elasticsearch host. |
9200 |
{$ELASTICSEARCH.SCHEME} | The scheme of the Elasticsearch (http/https). |
http |
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | The ES cluster maximum response time in seconds for trigger expression. |
10s |
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | Maximum of query latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | Maximum of fetch latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | Maximum of indexing latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | Maximum of flush latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.HEAP_USED.MAX.WARN} | The maximum percent in the use of JVM heap for warning trigger expression. |
85 |
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | The maximum percent in the use of JVM heap for critically trigger expression. |
95 |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
ES: Service status | Checks if the service is running and accepting TCP connections. |
Simple check | net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] Preprocessing
|
ES: Service response time | Checks performance of the TCP service. |
Simple check | net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] |
ES: Get cluster health | Returns the health status of a cluster. |
HTTP agent | es.cluster.get_health |
ES: Cluster health status | Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green All shards are assigned. yellow All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired. red One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned. |
Dependent item | es.cluster.status Preprocessing
|
ES: Number of nodes | The number of nodes within the cluster. |
Dependent item | es.cluster.number_of_nodes Preprocessing
|
ES: Number of data nodes | The number of nodes that are dedicated to data nodes. |
Dependent item | es.cluster.number_of_data_nodes Preprocessing
|
ES: Number of relocating shards | The number of shards that are under relocation. |
Dependent item | es.cluster.relocating_shards Preprocessing
|
ES: Number of initializing shards | The number of shards that are under initialization. |
Dependent item | es.cluster.initializing_shards Preprocessing
|
ES: Number of unassigned shards | The number of shards that are not allocated. |
Dependent item | es.cluster.unassigned_shards Preprocessing
|
ES: Delayed unassigned shards | The number of shards whose allocation has been delayed by the timeout settings. |
Dependent item | es.cluster.delayed_unassigned_shards Preprocessing
|
ES: Number of pending tasks | The number of cluster-level changes that have not yet been executed. |
Dependent item | es.cluster.number_of_pending_tasks Preprocessing
|
ES: Task max waiting in queue | The time expressed in seconds since the earliest initiated task is waiting for being performed. |
Dependent item | es.cluster.task_max_waiting_in_queue Preprocessing
|
ES: Inactive shards percentage | The ratio of inactive shards in the cluster expressed as a percentage. |
Dependent item | es.cluster.inactive_shards_percent_as_number Preprocessing
|
ES: Get cluster stats | Returns cluster statistics. |
HTTP agent | es.cluster.get_stats |
ES: Cluster uptime | Uptime duration in seconds since JVM has last started. |
Dependent item | es.nodes.jvm.max_uptime Preprocessing
|
ES: Number of non-deleted documents | The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields. |
Dependent item | es.indices.docs.count Preprocessing
|
ES: Indices with shards assigned to nodes | The total number of indices with shards assigned to the selected nodes. |
Dependent item | es.indices.count Preprocessing
|
ES: Total size of all file stores | The total size in bytes of all file stores across all selected nodes. |
Dependent item | es.nodes.fs.total_in_bytes Preprocessing
|
ES: Total available size to JVM in all file stores | The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_byes. This is the actual amount of free disk space the selected Elasticsearch nodes can use. |
Dependent item | es.nodes.fs.available_in_bytes Preprocessing
|
ES: Nodes with the data role | The number of selected nodes with the data role. |
Dependent item | es.nodes.count.data Preprocessing
|
ES: Nodes with the ingest role | The number of selected nodes with the ingest role. |
Dependent item | es.nodes.count.ingest Preprocessing
|
ES: Nodes with the master role | The number of selected nodes with the master role. |
Dependent item | es.nodes.count.master Preprocessing
|
ES: Get nodes stats | Returns cluster nodes statistics. |
HTTP agent | es.nodes.get_stats |
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ES: Service is down | The service is unavailable or does not accept TCP connections. |
last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"])=0 |
Average | Manual close: Yes |
ES: Service response time is too high | The performance of the TCP service is very low. |
min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} |
Warning | Manual close: Yes Depends on:
|
ES: Health is YELLOW | All primary shards are assigned, but one or more replica shards are unassigned. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1 |
Average | |
ES: Health is RED | One or more primary shards are unassigned, so some data is unavailable. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2 |
High | |
ES: Health is UNKNOWN | The health status of the cluster is unknown or cannot be obtained. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255 |
High | |
ES: The number of nodes within the cluster has decreased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0 |
Info | Manual close: Yes | |
ES: The number of nodes within the cluster has increased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0 |
Info | Manual close: Yes | |
ES: Cluster has the initializing shards | The cluster has the initializing shards longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0 |
Average | |
ES: Cluster has the unassigned shards | The cluster has the unassigned shards longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0 |
Average | |
ES: Cluster has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m |
Info | Manual close: Yes |
ES: Cluster does not have enough space for resharding | There is not enough disk space for index resharding. |
(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes) |
High | |
ES: Cluster has only two master nodes | The cluster has only two nodes with a master role and will be unavailable if one of them breaks. |
last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2 |
Disaster |
LLD rule Cluster nodes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster nodes discovery | Discovery ES cluster nodes. |
HTTP agent | es.nodes.discovery Preprocessing
|
Item prototypes for Cluster nodes discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
ES {#ES.NODE}: Get data | Returns cluster nodes statistics. |
Dependent item | es.node.get.data[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total size | Total size (in bytes) of all file stores. |
Dependent item | es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total available size | The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize. |
Dependent item | es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Node uptime | JVM uptime in seconds. |
Dependent item | es.node.jvm.uptime[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Maximum JVM memory available for use | The maximum amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap currently in use | The memory, in bytes, currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Percent of JVM heap currently in use | The percentage of memory currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap committed | The amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Number of open HTTP connections | The number of currently open HTTP connections for the node. |
Dependent item | es.node.http.current_open[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of HTTP connections opened | The number of HTTP connections opened for the node per second. |
Dependent item | es.node.http.opened.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling operations | Time in seconds spent throttling operations for the last measuring span. |
Dependent item | es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling recovery operations | Time in seconds spent throttling recovery operations for the last measuring span. |
Dependent item | es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling merge operations | Time in seconds spent throttling merge operations for the last measuring span. |
Dependent item | es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of queries | The number of query operations per second. |
Dependent item | es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of query | The total number of query operations. |
Dependent item | es.node.indices.search.query_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing query | Time in seconds spent performing query operations for the last measuring span. |
Dependent item | es.node.indices.search.query_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing query | Time in milliseconds spent performing query operations. |
Dependent item | es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Query latency | The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.query_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current query operations | The number of query operations currently running. |
Dependent item | es.node.indices.search.query_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of fetch | The number of fetch operations per second. |
Dependent item | es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of fetch | The total number of fetch operations. |
Dependent item | es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing fetch | Time in seconds spent performing fetch operations for the last measuring span. |
Dependent item | es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing fetch | Time in milliseconds spent performing fetch operations. |
Dependent item | es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Fetch latency | The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.fetch_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current fetch operations | The number of fetch operations currently running. |
Dependent item | es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks completed | The number of tasks completed by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool active threads | The number of active threads in the write thread pool. |
Dependent item | es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool tasks in queue | The number of tasks in queue for the write thread pool. |
Dependent item | es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks rejected | The number of tasks rejected by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks completed | The number of tasks completed by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool active threads | The number of active threads in the search thread pool. |
Dependent item | es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool tasks in queue | The number of tasks in queue for the search thread pool. |
Dependent item | es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks rejected | The number of tasks rejected by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks completed | The number of tasks completed by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool active threads | The number of active threads in the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool tasks in queue | The number of tasks in queue for the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks rejected | The number of tasks rejected by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of indexing | The total number of indexing operations. |
Dependent item | es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing indexing | Total time in milliseconds spent performing indexing operations. |
Dependent item | es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Indexing latency | The average indexing latency calculated from the available index_total and index_time_in_millis metrics. |
Calculated | es.node.indices.indexing.index_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current indexing operations | The number of indexing operations currently running. |
Dependent item | es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of index flushes to disk | The total number of flush operations. |
Dependent item | es.node.indices.flush.total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent on flushing indices to disk | Total time in milliseconds spent performing flush operations. |
Dependent item | es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Flush latency | The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. |
Calculated | es.node.indices.flush.latency[{#ES.NODE}] |
ES {#ES.NODE}: Rate of index refreshes | The number of refresh operations per second. |
Dependent item | es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing refresh | Time in seconds spent performing refresh operations for the last measuring span. |
Dependent item | es.node.indices.refresh.time[{#ES.NODE}] Preprocessing
|
Trigger prototypes for Cluster nodes discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ES {#ES.NODE}: has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m |
Info | Manual close: Yes |
ES {#ES.NODE}: Percent of JVM heap in use is high | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} |
Warning | Depends on:
|
ES {#ES.NODE}: Percent of JVM heap in use is critical | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} |
High | |
ES {#ES.NODE}: Query latency is too high | If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} |
Warning | |
ES {#ES.NODE}: Fetch latency is too high | The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} |
Warning | |
ES {#ES.NODE}: Write thread pool executor has the rejected tasks | The number of tasks rejected by the write thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0 |
Warning | |
ES {#ES.NODE}: Search thread pool executor has the rejected tasks | The number of tasks rejected by the search thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0 |
Warning | |
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks | The number of tasks rejected by the refresh thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0 |
Warning | |
ES {#ES.NODE}: Indexing latency is too high | If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation |
min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} |
Warning | |
ES {#ES.NODE}: Flush latency is too high | If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate |
min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} |
Warning |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Elasticsearch Cluster by HTTP
Overview
For Zabbix version: 5.4 and higher
The template to monitor Elasticsearch by Zabbix that work without any external scripts.
It works with both standalone and cluster instances.
The metrics are collected in one pass remotely using an HTTP agent.
They are getting values from REST API _cluster/health, _cluster/stats, _nodes/stats requests.
This template was tested on:
- Elasticsearch, version 6.5..7.6
Setup
See Zabbix template operation for basic instructions.
You can set {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros in the template for using on the host level. If you use an atypical location ES API, don't forget to change the macros {$ELASTICSEARCH.SCHEME},{$ELASTICSEARCH.PORT}.
Zabbix configuration
No specific Zabbix configuration is required.
Macros used
Name | Description | Default |
---|---|---|
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | Maximum of fetch latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | Maximum of flush latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | The maximum percent in the use of JVM heap for critically trigger expression. |
95 |
{$ELASTICSEARCH.HEAP_USED.MAX.WARN} | The maximum percent in the use of JVM heap for warning trigger expression. |
85 |
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | Maximum of indexing latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.PASSWORD} | The password of the Elasticsearch. |
`` |
{$ELASTICSEARCH.PORT} | The port of the Elasticsearch host. |
9200 |
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | Maximum of query latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | The ES cluster maximum response time in seconds for trigger expression. |
10s |
{$ELASTICSEARCH.SCHEME} | The scheme of the Elasticsearch (http/https). |
http |
{$ELASTICSEARCH.USERNAME} | The username of the Elasticsearch. |
`` |
Template links
There are no template links in this template.
Discovery rules
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster nodes discovery | Discovery ES cluster nodes. |
HTTP_AGENT | es.nodes.discovery Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Items collected
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
ES cluster | ES: Service status | Checks if the service is running and accepting TCP connections. |
SIMPLE | net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Service response time | Checks performance of the TCP service. |
SIMPLE | net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] |
ES cluster | ES: Cluster health status | Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green All shards are assigned. yellow All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired. red One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned. |
DEPENDENT | es.cluster.status Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Number of nodes | The number of nodes within the cluster. |
DEPENDENT | es.cluster.number_of_nodes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Number of data nodes | The number of nodes that are dedicated to data nodes. |
DEPENDENT | es.cluster.number_of_data_nodes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Number of relocating shards | The number of shards that are under relocation. |
DEPENDENT | es.cluster.relocating_shards Preprocessing: - JSONPATH: |
ES cluster | ES: Number of initializing shards | The number of shards that are under initialization. |
DEPENDENT | es.cluster.initializing_shards Preprocessing: - JSONPATH: |
ES cluster | ES: Number of unassigned shards | The number of shards that are not allocated. |
DEPENDENT | es.cluster.unassigned_shards Preprocessing: - JSONPATH: |
ES cluster | ES: Delayed unassigned shards | The number of shards whose allocation has been delayed by the timeout settings. |
DEPENDENT | es.cluster.delayed_unassigned_shards Preprocessing: - JSONPATH: |
ES cluster | ES: Number of pending tasks | The number of cluster-level changes that have not yet been executed. |
DEPENDENT | es.cluster.number_of_pending_tasks Preprocessing: - JSONPATH: |
ES cluster | ES: Task max waiting in queue | The time expressed in seconds since the earliest initiated task is waiting for being performed. |
DEPENDENT | es.cluster.task_max_waiting_in_queue Preprocessing: - JSONPATH: - MULTIPLIER: |
ES cluster | ES: Inactive shards percentage | The ratio of inactive shards in the cluster expressed as a percentage. |
DEPENDENT | es.cluster.inactive_shards_percent_as_number Preprocessing: - JSONPATH: - JAVASCRIPT: |
ES cluster | ES: Cluster uptime | Uptime duration in seconds since JVM has last started. |
DEPENDENT | es.nodes.jvm.max_uptime Preprocessing: - JSONPATH: - MULTIPLIER: |
ES cluster | ES: Number of non-deleted documents | The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields. |
DEPENDENT | es.indices.docs.count Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Indices with shards assigned to nodes | The total number of indices with shards assigned to the selected nodes. |
DEPENDENT | es.indices.count Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Total size of all file stores | The total size in bytes of all file stores across all selected nodes. |
DEPENDENT | es.nodes.fs.total_in_bytes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Total available size to JVM in all file stores | The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_byes. This is the actual amount of free disk space the selected Elasticsearch nodes can use. |
DEPENDENT | es.nodes.fs.available_in_bytes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Nodes with the data role | The number of selected nodes with the data role. |
DEPENDENT | es.nodes.count.data Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Nodes with the ingest role | The number of selected nodes with the ingest role. |
DEPENDENT | es.nodes.count.ingest Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Nodes with the master role | The number of selected nodes with the master role. |
DEPENDENT | es.nodes.count.master Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Total size | Total size (in bytes) of all file stores. |
DEPENDENT | es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Total available size | The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize. |
DEPENDENT | es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Node uptime | JVM uptime in seconds. |
DEPENDENT | es.node.jvm.uptime[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: |
ES cluster | ES {#ES.NODE}: Maximum JVM memory available for use | The maximum amount of memory, in bytes, available for use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Amount of JVM heap currently in use | The memory, in bytes, currently in use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Percent of JVM heap currently in use | The percentage of memory currently in use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Amount of JVM heap committed | The amount of memory, in bytes, available for use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Number of open HTTP connections | The number of currently open HTTP connections for the node. |
DEPENDENT | es.node.http.current_open[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Rate of HTTP connections opened | The number of HTTP connections opened for the node per second. |
DEPENDENT | es.node.http.opened.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Time spent throttling operations | Time in seconds spent throttling operations for the last measuring span. |
DEPENDENT | es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Time spent throttling recovery operations | Time in seconds spent throttling recovery operations for the last measuring span. |
DEPENDENT | es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Time spent throttling merge operations | Time in seconds spent throttling merge operations for the last measuring span. |
DEPENDENT | es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Rate of queries | The number of query operations per second. |
DEPENDENT | es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Time spent performing query | Time in seconds spent performing query operations for the last measuring span. |
DEPENDENT | es.node.indices.search.query_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Query latency | The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals. |
CALCULATED | es.node.indices.search.query_latency[{#ES.NODE}] Expression: change(//es.node.indices.search.query_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.search.query_total[{#ES.NODE}]) + (change(//es.node.indices.search.query_total[{#ES.NODE}]) = 0) ) |
ES cluster | ES {#ES.NODE}: Current query operations | The number of query operations currently running. |
DEPENDENT | es.node.indices.search.query_current[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Rate of fetch | The number of fetch operations per second. |
DEPENDENT | es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Time spent performing fetch | Time in seconds spent performing fetch operations for the last measuring span. |
DEPENDENT | es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Fetch latency | The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. |
CALCULATED | es.node.indices.search.fetch_latency[{#ES.NODE}] Expression: change(//es.node.indices.search.fetch_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.search.fetch_total[{#ES.NODE}]) + (change(//es.node.indices.search.fetch_total[{#ES.NODE}]) = 0) ) |
ES cluster | ES {#ES.NODE}: Current fetch operations | The number of fetch operations currently running. |
DEPENDENT | es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Write thread pool executor tasks completed | The number of tasks completed by the write thread pool executor. |
DEPENDENT | es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Write thread pool active threads | The number of active threads in the write thread pool. |
DEPENDENT | es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Write thread pool tasks in queue | The number of tasks in queue for the write thread pool. |
DEPENDENT | es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Write thread pool executor tasks rejected | The number of tasks rejected by the write thread pool executor. |
DEPENDENT | es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Search thread pool executor tasks completed | The number of tasks completed by the search thread pool executor. |
DEPENDENT | es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Search thread pool active threads | The number of active threads in the search thread pool. |
DEPENDENT | es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Search thread pool tasks in queue | The number of tasks in queue for the search thread pool. |
DEPENDENT | es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Search thread pool executor tasks rejected | The number of tasks rejected by the search thread pool executor. |
DEPENDENT | es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Refresh thread pool executor tasks completed | The number of tasks completed by the refresh thread pool executor. |
DEPENDENT | es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Refresh thread pool active threads | The number of active threads in the refresh thread pool. |
DEPENDENT | es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Refresh thread pool tasks in queue | The number of tasks in queue for the refresh thread pool. |
DEPENDENT | es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Refresh thread pool executor tasks rejected | The number of tasks rejected by the refresh thread pool executor. |
DEPENDENT | es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Indexing latency | The average indexing latency calculated from the available index_total and index_time_in_millis metrics. |
CALCULATED | es.node.indices.indexing.index_latency[{#ES.NODE}] Expression: change(//es.node.indices.indexing.index_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.indexing.index_total[{#ES.NODE}]) + (change(//es.node.indices.indexing.index_total[{#ES.NODE}]) = 0) ) |
ES cluster | ES {#ES.NODE}: Current indexing operations | The number of indexing operations currently running. |
DEPENDENT | es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Flush latency | The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. |
CALCULATED | es.node.indices.flush.latency[{#ES.NODE}] Expression: change(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.flush.total[{#ES.NODE}]) + (change(//es.node.indices.flush.total[{#ES.NODE}]) = 0) ) |
ES cluster | ES {#ES.NODE}: Rate of index refreshes | The number of refresh operations per second. |
DEPENDENT | es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Time spent performing refresh | Time in seconds spent performing refresh operations for the last measuring span. |
DEPENDENT | es.node.indices.refresh.time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
Zabbix raw items | ES: Get cluster health | Returns the health status of a cluster. |
HTTP_AGENT | es.cluster.get_health |
Zabbix raw items | ES: Get cluster stats | Returns cluster statistics. |
HTTP_AGENT | es.cluster.get_stats |
Zabbix raw items | ES: Get nodes stats | Returns cluster nodes statistics. |
HTTP_AGENT | es.nodes.get_stats |
Zabbix raw items | ES {#ES.NODE}: Total number of query | The total number of query operations. |
DEPENDENT | es.node.indices.search.query_total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total time spent performing query | Time in milliseconds spent performing query operations. |
DEPENDENT | es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total number of fetch | The total number of fetch operations. |
DEPENDENT | es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total time spent performing fetch | Time in milliseconds spent performing fetch operations. |
DEPENDENT | es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total number of indexing | The total number of indexing operations. |
DEPENDENT | es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total time spent performing indexing | Total time in milliseconds spent performing indexing operations. |
DEPENDENT | es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total number of index flushes to disk | The total number of flush operations. |
DEPENDENT | es.node.indices.flush.total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total time spent on flushing indices to disk | Total time in milliseconds spent performing flush operations. |
DEPENDENT | es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ES: Service is down | The service is unavailable or does not accept TCP connections. |
last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"])=0 |
AVERAGE | Manual close: YES |
ES: Service response time is too high (over {$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} for 5m) | The performance of the TCP service is very low. |
min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} |
WARNING | Manual close: YES Depends on: - ES: Service is down |
ES: Health is YELLOW | All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1 |
AVERAGE | |
ES: Health is RED | One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2 |
HIGH | |
ES: Health is UNKNOWN | The health status of the cluster is unknown or cannot be obtained. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255 |
HIGH | |
ES: The number of nodes within the cluster has decreased | - |
change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0 |
INFO | Manual close: YES |
ES: The number of nodes within the cluster has increased | - |
change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0 |
INFO | Manual close: YES |
ES: Cluster has the initializing shards | The cluster has the initializing shards longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0 |
AVERAGE | |
ES: Cluster has the unassigned shards | The cluster has the unassigned shards longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0 |
AVERAGE | |
ES: Cluster has been restarted (uptime < 10m) | Uptime is less than 10 minutes |
last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m |
INFO | Manual close: YES |
ES: Cluster does not have enough space for resharding | There is not enough disk space for index resharding. |
(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes) |
HIGH | |
ES: Cluster has only two master nodes | The cluster has only two nodes with a master role and will be unavailable if one of them breaks. |
last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2 |
DISASTER | |
ES {#ES.NODE}: Node {#ES.NODE} has been restarted (uptime < 10m) | Uptime is less than 10 minutes |
last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m |
INFO | Manual close: YES |
ES {#ES.NODE}: Percent of JVM heap in use is high (over {$ELASTICSEARCH.HEAP_USED.MAX.WARN}% for 1h) | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it remains below the recommended guidelines stated above), or scale out the cluster by adding more nodes. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} |
WARNING | Depends on: - ES {#ES.NODE}: Percent of JVM heap in use is critical (over {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}% for 1h) |
ES {#ES.NODE}: Percent of JVM heap in use is critical (over {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}% for 1h) | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it remains below the recommended guidelines stated above), or scale out the cluster by adding more nodes. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} |
HIGH | |
ES {#ES.NODE}: Query latency is too high (over {$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}ms for 5m) | If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} |
WARNING | |
ES {#ES.NODE}: Fetch latency is too high (over {$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}ms for 5m) | The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, this could indicate a problem with slow disks, enriching of documents (highlighting the relevant text in search results, etc.), or requesting too many results. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} |
WARNING | |
ES {#ES.NODE}: Write thread pool executor has the rejected tasks (for 5m) | The number of tasks rejected by the write thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0 |
WARNING | |
ES {#ES.NODE}: Search thread pool executor has the rejected tasks (for 5m) | The number of tasks rejected by the search thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0 |
WARNING | |
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks (for 5m) | The number of tasks rejected by the refresh thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0 |
WARNING | |
ES {#ES.NODE}: Indexing latency is too high (over {$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}ms for 5m) | If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there). |
min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} |
WARNING | |
ES {#ES.NODE}: Flush latency is too high (over {$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}ms for 5m) | If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} |
WARNING |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
References
https://www.elastic.co/guide/en/elasticsearch/reference/index.html
Template App Elasticsearch Cluster by HTTP
Overview
For Zabbix version: 5.0 and higher
The template to monitor Elasticsearch by Zabbix that work without any external scripts.
It works with both standalone and cluster instances.
The metrics are collected in one pass remotely using an HTTP agent.
They are getting values from REST API _cluster/health, _cluster/stats, _nodes/stats requests.
This template was tested on:
- Zabbix, version 5.0
- Elasticsearch, version 6.5..7.6
Setup
See Zabbix template operation for basic instructions.
You can set {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros in the template for using on the host level. If you use an atypical location ES API, don't forget to change the macros {$ELASTICSEARCH.SCHEME},{$ELASTICSEARCH.PORT}.
Zabbix configuration
No specific Zabbix configuration is required.
Macros used
Name | Description | Default |
---|---|---|
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | Maximum of fetch latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | Maximum of flush latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | The maximum percent in the use of JVM heap for critically trigger expression. |
95 |
{$ELASTICSEARCH.HEAP_USED.MAX.WARN} | The maximum percent in the use of JVM heap for warning trigger expression. |
85 |
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | Maximum of indexing latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.PASSWORD} | The password of the Elasticsearch. |
`` |
{$ELASTICSEARCH.PORT} | The port of the Elasticsearch host. |
9200 |
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | Maximum of query latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | The ES cluster maximum response time in seconds for trigger expression. |
10s |
{$ELASTICSEARCH.SCHEME} | The scheme of the Elasticsearch (http/https). |
http |
{$ELASTICSEARCH.USERNAME} | The username of the Elasticsearch. |
`` |
Template links
There are no template links in this template.
Discovery rules
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster nodes discovery | Discovery ES cluster nodes. |
HTTP_AGENT | es.nodes.discovery Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Items collected
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
ES_cluster | ES: Service status | Checks if the service is running and accepting TCP connections. |
SIMPLE | net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES: Service response time | Checks performance of the TCP service. |
SIMPLE | net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] |
ES_cluster | ES: Cluster health status | Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green All shards are assigned. yellow All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired. red One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned. |
DEPENDENT | es.cluster.status Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES: Number of nodes | The number of nodes within the cluster. |
DEPENDENT | es.cluster.number_of_nodes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES: Number of data nodes | The number of nodes that are dedicated to data nodes. |
DEPENDENT | es.cluster.number_of_data_nodes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES: Number of relocating shards | The number of shards that are under relocation. |
DEPENDENT | es.cluster.relocating_shards Preprocessing: - JSONPATH: |
ES_cluster | ES: Number of initializing shards | The number of shards that are under initialization. |
DEPENDENT | es.cluster.initializing_shards Preprocessing: - JSONPATH: |
ES_cluster | ES: Number of unassigned shards | The number of shards that are not allocated. |
DEPENDENT | es.cluster.unassigned_shards Preprocessing: - JSONPATH: |
ES_cluster | ES: Delayed unassigned shards | The number of shards whose allocation has been delayed by the timeout settings. |
DEPENDENT | es.cluster.delayed_unassigned_shards Preprocessing: - JSONPATH: |
ES_cluster | ES: Number of pending tasks | The number of cluster-level changes that have not yet been executed. |
DEPENDENT | es.cluster.number_of_pending_tasks Preprocessing: - JSONPATH: |
ES_cluster | ES: Task max waiting in queue | The time expressed in seconds since the earliest initiated task is waiting for being performed. |
DEPENDENT | es.cluster.task_max_waiting_in_queue Preprocessing: - JSONPATH: - MULTIPLIER: |
ES_cluster | ES: Inactive shards percentage | The ratio of inactive shards in the cluster expressed as a percentage. |
DEPENDENT | es.cluster.inactive_shards_percent_as_number Preprocessing: - JSONPATH: - JAVASCRIPT: |
ES_cluster | ES: Cluster uptime | Uptime duration in seconds since JVM has last started. |
DEPENDENT | es.nodes.jvm.max_uptime Preprocessing: - JSONPATH: - MULTIPLIER: |
ES_cluster | ES: Number of non-deleted documents | The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields. |
DEPENDENT | es.indices.docs.count Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES: Indices with shards assigned to nodes | The total number of indices with shards assigned to the selected nodes. |
DEPENDENT | es.indices.count Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES: Total size of all file stores | The total size in bytes of all file stores across all selected nodes. |
DEPENDENT | es.nodes.fs.total_in_bytes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES: Total available size to JVM in all file stores | The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_byes. This is the actual amount of free disk space the selected Elasticsearch nodes can use. |
DEPENDENT | es.nodes.fs.available_in_bytes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES: Nodes with the data role | The number of selected nodes with the data role. |
DEPENDENT | es.nodes.count.data Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES: Nodes with the ingest role | The number of selected nodes with the ingest role. |
DEPENDENT | es.nodes.count.ingest Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES: Nodes with the master role | The number of selected nodes with the master role. |
DEPENDENT | es.nodes.count.master Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES {#ES.NODE}: Total size | Total size (in bytes) of all file stores. |
DEPENDENT | es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES {#ES.NODE}: Total available size | The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize. |
DEPENDENT | es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES {#ES.NODE}: Node uptime | JVM uptime in seconds. |
DEPENDENT | es.node.jvm.uptime[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: |
ES_cluster | ES {#ES.NODE}: Maximum JVM memory available for use | The maximum amount of memory, in bytes, available for use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES {#ES.NODE}: Amount of JVM heap currently in use | The memory, in bytes, currently in use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES {#ES.NODE}: Percent of JVM heap currently in use | The percentage of memory currently in use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES {#ES.NODE}: Amount of JVM heap committed | The amount of memory, in bytes, available for use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES {#ES.NODE}: Number of open HTTP connections | The number of currently open HTTP connections for the node. |
DEPENDENT | es.node.http.current_open[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES {#ES.NODE}: Rate of HTTP connections opened | The number of HTTP connections opened for the node per second. |
DEPENDENT | es.node.http.opened.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES_cluster | ES {#ES.NODE}: Time spent throttling operations | Time in seconds spent throttling operations for the last measuring span. |
DEPENDENT | es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES_cluster | ES {#ES.NODE}: Time spent throttling recovery operations | Time in seconds spent throttling recovery operations for the last measuring span. |
DEPENDENT | es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES_cluster | ES {#ES.NODE}: Time spent throttling merge operations | Time in seconds spent throttling merge operations for the last measuring span. |
DEPENDENT | es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES_cluster | ES {#ES.NODE}: Rate of queries | The number of query operations per second. |
DEPENDENT | es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES_cluster | ES {#ES.NODE}: Time spent performing query | Time in seconds spent performing query operations for the last measuring span. |
DEPENDENT | es.node.indices.search.query_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES_cluster | ES {#ES.NODE}: Query latency | The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals. |
CALCULATED | es.node.indices.search.query_latency[{#ES.NODE}] Expression: change(es.node.indices.search.query_time_in_millis[{#ES.NODE}]) / ( change(es.node.indices.search.query_total[{#ES.NODE}]) + (change(es.node.indices.search.query_total[{#ES.NODE}]) = 0) ) |
ES_cluster | ES {#ES.NODE}: Current query operations | The number of query operations currently running. |
DEPENDENT | es.node.indices.search.query_current[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES_cluster | ES {#ES.NODE}: Rate of fetch | The number of fetch operations per second. |
DEPENDENT | es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES_cluster | ES {#ES.NODE}: Time spent performing fetch | Time in seconds spent performing fetch operations for the last measuring span. |
DEPENDENT | es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES_cluster | ES {#ES.NODE}: Fetch latency | The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. |
CALCULATED | es.node.indices.search.fetch_latency[{#ES.NODE}] Expression: change(es.node.indices.search.fetch_time_in_millis[{#ES.NODE}]) / ( change(es.node.indices.search.fetch_total[{#ES.NODE}]) + (change(es.node.indices.search.fetch_total[{#ES.NODE}]) = 0) ) |
ES_cluster | ES {#ES.NODE}: Current fetch operations | The number of fetch operations currently running. |
DEPENDENT | es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES_cluster | ES {#ES.NODE}: Write thread pool executor tasks completed | The number of tasks completed by the write thread pool executor. |
DEPENDENT | es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES_cluster | ES {#ES.NODE}: Write thread pool active threads | The number of active threads in the write thread pool. |
DEPENDENT | es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES_cluster | ES {#ES.NODE}: Write thread pool tasks in queue | The number of tasks in queue for the write thread pool. |
DEPENDENT | es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES_cluster | ES {#ES.NODE}: Write thread pool executor tasks rejected | The number of tasks rejected by the write thread pool executor. |
DEPENDENT | es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES_cluster | ES {#ES.NODE}: Search thread pool executor tasks completed | The number of tasks completed by the search thread pool executor. |
DEPENDENT | es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES_cluster | ES {#ES.NODE}: Search thread pool active threads | The number of active threads in the search thread pool. |
DEPENDENT | es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES_cluster | ES {#ES.NODE}: Search thread pool tasks in queue | The number of tasks in queue for the search thread pool. |
DEPENDENT | es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES_cluster | ES {#ES.NODE}: Search thread pool executor tasks rejected | The number of tasks rejected by the search thread pool executor. |
DEPENDENT | es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES_cluster | ES {#ES.NODE}: Refresh thread pool executor tasks completed | The number of tasks completed by the refresh thread pool executor. |
DEPENDENT | es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES_cluster | ES {#ES.NODE}: Refresh thread pool active threads | The number of active threads in the refresh thread pool. |
DEPENDENT | es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES_cluster | ES {#ES.NODE}: Refresh thread pool tasks in queue | The number of tasks in queue for the refresh thread pool. |
DEPENDENT | es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES_cluster | ES {#ES.NODE}: Refresh thread pool executor tasks rejected | The number of tasks rejected by the refresh thread pool executor. |
DEPENDENT | es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES_cluster | ES {#ES.NODE}: Indexing latency | The average indexing latency calculated from the available index_total and index_time_in_millis metrics. |
CALCULATED | es.node.indices.indexing.index_latency[{#ES.NODE}] Expression: change(es.node.indices.indexing.index_time_in_millis[{#ES.NODE}]) / ( change(es.node.indices.indexing.index_total[{#ES.NODE}]) + (change(es.node.indices.indexing.index_total[{#ES.NODE}]) = 0) ) |
ES_cluster | ES {#ES.NODE}: Current indexing operations | The number of indexing operations currently running. |
DEPENDENT | es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES_cluster | ES {#ES.NODE}: Flush latency | The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. |
CALCULATED | es.node.indices.flush.latency[{#ES.NODE}] Expression: change(es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( change(es.node.indices.flush.total[{#ES.NODE}]) + (change(es.node.indices.flush.total[{#ES.NODE}]) = 0) ) |
ES_cluster | ES {#ES.NODE}: Rate of index refreshes | The number of refresh operations per second. |
DEPENDENT | es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES_cluster | ES {#ES.NODE}: Time spent performing refresh | Time in seconds spent performing refresh operations for the last measuring span. |
DEPENDENT | es.node.indices.refresh.time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
Zabbix_raw_items | ES: Get cluster health | Returns the health status of a cluster. |
HTTP_AGENT | es.cluster.get_health |
Zabbix_raw_items | ES: Get cluster stats | Returns cluster statistics. |
HTTP_AGENT | es.cluster.get_stats |
Zabbix_raw_items | ES: Get nodes stats | Returns cluster nodes statistics. |
HTTP_AGENT | es.nodes.get_stats |
Zabbix_raw_items | ES {#ES.NODE}: Total number of query | The total number of query operations. |
DEPENDENT | es.node.indices.search.query_total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix_raw_items | ES {#ES.NODE}: Total time spent performing query | Time in milliseconds spent performing query operations. |
DEPENDENT | es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix_raw_items | ES {#ES.NODE}: Total number of fetch | The total number of fetch operations. |
DEPENDENT | es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix_raw_items | ES {#ES.NODE}: Total time spent performing fetch | Time in milliseconds spent performing fetch operations. |
DEPENDENT | es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix_raw_items | ES {#ES.NODE}: Total number of indexing | The total number of indexing operations. |
DEPENDENT | es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix_raw_items | ES {#ES.NODE}: Total time spent performing indexing | Total time in milliseconds spent performing indexing operations. |
DEPENDENT | es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix_raw_items | ES {#ES.NODE}: Total number of index flushes to disk | The total number of flush operations. |
DEPENDENT | es.node.indices.flush.total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix_raw_items | ES {#ES.NODE}: Total time spent on flushing indices to disk | Total time in milliseconds spent performing flush operations. |
DEPENDENT | es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ES: Service is down | The service is unavailable or does not accept TCP connections. |
{TEMPLATE_NAME:net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"].last()}=0 |
AVERAGE | Manual close: YES |
ES: Service response time is too high (over {$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} for 5m) | The performance of the TCP service is very low. |
{TEMPLATE_NAME:net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"].min(5m)}>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} |
WARNING | Manual close: YES Depends on: - ES: Service is down |
ES: Health is YELLOW | All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired. |
{TEMPLATE_NAME:es.cluster.status.last()}=1 |
AVERAGE | |
ES: Health is RED | One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned. |
{TEMPLATE_NAME:es.cluster.status.last()}=2 |
HIGH | |
ES: Health is UNKNOWN | The health status of the cluster is unknown or cannot be obtained. |
{TEMPLATE_NAME:es.cluster.status.last()}=255 |
HIGH | |
ES: The number of nodes within the cluster has decreased | {TEMPLATE_NAME:es.cluster.number_of_nodes.change()}<0 |
INFO | Manual close: YES |
|
ES: The number of nodes within the cluster has increased | {TEMPLATE_NAME:es.cluster.number_of_nodes.change()}>0 |
INFO | Manual close: YES |
|
ES: Cluster has the initializing shards | The cluster has the initializing shards longer than 10 minutes. |
{TEMPLATE_NAME:es.cluster.initializing_shards.min(10m)}>0 |
AVERAGE | |
ES: Cluster has the unassigned shards | The cluster has the unassigned shards longer than 10 minutes. |
{TEMPLATE_NAME:es.cluster.unassigned_shards.min(10m)}>0 |
AVERAGE | |
ES: Cluster has been restarted (uptime < 10m) | Uptime is less than 10 minutes |
{TEMPLATE_NAME:es.nodes.jvm.max_uptime.last()}<10m |
INFO | Manual close: YES |
ES: Cluster does not have enough space for resharding | There is not enough disk space for index resharding. |
({TEMPLATE_NAME:es.nodes.fs.total_in_bytes.last()}-{TEMPLATE_NAME:es.nodes.fs.available_in_bytes.last()})/({TEMPLATE_NAME:es.cluster.number_of_data_nodes.last()}-1)>{TEMPLATE_NAME:es.nodes.fs.available_in_bytes.last()} |
HIGH | |
ES: Cluster has only two master nodes | The cluster has only two nodes with a master role and will be unavailable if one of them breaks. |
{TEMPLATE_NAME:es.nodes.count.master.last()}=2 |
DISASTER | |
ES {#ES.NODE}: Node {#ES.NODE} has been restarted (uptime < 10m) | Uptime is less than 10 minutes |
{TEMPLATE_NAME:es.node.jvm.uptime[{#ES.NODE}].last()}<10m |
INFO | Manual close: YES |
ES {#ES.NODE}: Percent of JVM heap in use is high (over {$ELASTICSEARCH.HEAP_USED.MAX.WARN}% for 1h) | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it remains below the recommended guidelines stated above), or scale out the cluster by adding more nodes. |
{TEMPLATE_NAME:es.node.jvm.mem.heap_used_percent[{#ES.NODE}].min(1h)}>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} |
WARNING | Depends on: - ES {#ES.NODE}: Percent of JVM heap in use is critical (over {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}% for 1h) |
ES {#ES.NODE}: Percent of JVM heap in use is critical (over {$ELASTICSEARCH.HEAP_USED.MAX.CRIT}% for 1h) | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it remains below the recommended guidelines stated above), or scale out the cluster by adding more nodes. |
{TEMPLATE_NAME:es.node.jvm.mem.heap_used_percent[{#ES.NODE}].min(1h)}>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} |
HIGH | |
ES {#ES.NODE}: Query latency is too high (over {$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN}ms for 5m) | If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries. |
{TEMPLATE_NAME:es.node.indices.search.query_latency[{#ES.NODE}].min(5m)}>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} |
WARNING | |
ES {#ES.NODE}: Fetch latency is too high (over {$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN}ms for 5m) | The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, this could indicate a problem with slow disks, enriching of documents (highlighting the relevant text in search results, etc.), or requesting too many results. |
{TEMPLATE_NAME:es.node.indices.search.fetch_latency[{#ES.NODE}].min(5m)}>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} |
WARNING | |
ES {#ES.NODE}: Write thread pool executor has the rejected tasks (for 5m) | The number of tasks rejected by the write thread pool executor is over 0 for 5m. |
{TEMPLATE_NAME:es.node.thread_pool.write.rejected.rate[{#ES.NODE}].min(5m)}>0 |
WARNING | |
ES {#ES.NODE}: Search thread pool executor has the rejected tasks (for 5m) | The number of tasks rejected by the search thread pool executor is over 0 for 5m. |
{TEMPLATE_NAME:es.node.thread_pool.search.rejected.rate[{#ES.NODE}].min(5m)}>0 |
WARNING | |
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks (for 5m) | The number of tasks rejected by the refresh thread pool executor is over 0 for 5m. |
{TEMPLATE_NAME:es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}].min(5m)}>0 |
WARNING | |
ES {#ES.NODE}: Indexing latency is too high (over {$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN}ms for 5m) | If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there). |
{TEMPLATE_NAME:es.node.indices.indexing.index_latency[{#ES.NODE}].min(5m)}>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} |
WARNING | |
ES {#ES.NODE}: Flush latency is too high (over {$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN}ms for 5m) | If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index. |
{TEMPLATE_NAME:es.node.indices.flush.latency[{#ES.NODE}].min(5m)}>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} |
WARNING |
Feedback
Please report any issues with the template at https://support.zabbix.com
You can also provide a feedback, discuss the template or ask for help with it at ZABBIX forums.
References
https://www.elastic.co/guide/en/elasticsearch/reference/index.html