Envoy Proxy by HTTP
Overview
This template monitors Envoy Proxy with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Envoy Proxy by HTTP
- collects metrics by HTTP agent from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).
Requirements
Zabbix version: 7.2 and higher.
Tested versions
This template has been tested on:
- Envoy Proxy 1.20.2
Configuration
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup
Internal service metrics are collected from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus). For details, see the Envoy statistics overview: https://www.envoyproxy.io/docs/envoy/v1.20.0/operations/stats_overview
Don't forget to change the macros {$ENVOY.URL} and {$ENVOY.METRICS.PATH}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Envoy Proxy instance version and configuration.
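Before linking the template, it can help to confirm that the endpoint answers and returns Prometheus text. The following is a minimal sketch, not part of the template: the function names (`metrics_url`, `parse_samples`, `fetch_samples`) are hypothetical, and the constants simply mirror the macro defaults listed in this document.

```python
import re
from urllib.request import urlopen

ENVOY_URL = "http://localhost:9901"   # default of {$ENVOY.URL}
METRICS_PATH = "/stats/prometheus"    # default of {$ENVOY.METRICS.PATH}

# One Prometheus sample line: metric name, optional {labels}, value.
SAMPLE_RE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')


def metrics_url(base=ENVOY_URL, path=METRICS_PATH):
    """Join the instance URL and the metrics path with exactly one slash."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def parse_samples(text):
    """Return (name, labels, value) tuples, skipping comments and blank lines."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match:
            samples.append((match.group(1), match.group(2) or "", float(match.group(3))))
    return samples


def fetch_samples(url=None, timeout=5):
    """Fetch the endpoint once and parse it (requires a reachable Envoy admin port)."""
    with urlopen(url or metrics_url(), timeout=timeout) as resp:
        return parse_samples(resp.read().decode("utf-8"))
```

Calling `fetch_samples()` against a running instance should return a non-empty list; an empty list or a connection error suggests that {$ENVOY.URL} or {$ENVOY.METRICS.PATH} needs adjusting.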
Macros used
Name | Description | Default |
---|---|---|
{$ENVOY.URL} | Instance URL. | http://localhost:9901 |
{$ENVOY.METRICS.PATH} | The path from which Zabbix scrapes metrics in Prometheus format. | /stats/prometheus |
{$ENVOY.CERT.MIN} | Minimum number of days before certificate expiration, used in the trigger expression. | 7 |
Items
Name | Description | Type | Key and additional info |
---|---|---|---|
Get node metrics | Get server metrics. | HTTP agent | envoy.get_metrics |
Server state | State of the server. Live - (default) Server is live and serving traffic. Draining - Server is draining listeners in response to external health checks failing. Pre initializing - Server has not yet completed cluster manager initialization. Initializing - Server is running the cluster manager initialization callbacks (e.g., RDS). | Dependent item | envoy.server.state |
Server live | 1 if the server is not currently draining, 0 otherwise. | Dependent item | envoy.server.live |
Uptime | Current server uptime in seconds. | Dependent item | envoy.server.uptime |
Certificate expiration, day before | Number of days until the next certificate being managed will expire. | Dependent item | envoy.server.days_until_first_cert_expiring |
Server concurrency | Number of worker threads. | Dependent item | envoy.server.concurrency |
Memory allocated | Current amount of allocated memory in bytes. Total of both new and old Envoy processes on hot restart. | Dependent item | envoy.server.memory_allocated |
Memory heap size | Current reserved heap size in bytes. New Envoy process heap size on hot restart. | Dependent item | envoy.server.memory_heap_size |
Memory physical size | Current estimate of total bytes of the physical memory. New Envoy process physical memory size on hot restart. | Dependent item | envoy.server.memory_physical_size |
Filesystem, flushed by timer rate | Total number of times internal flush buffers are written to a file due to flush timeout per second. | Dependent item | envoy.filesystem.flushed_by_timer.rate |
Filesystem, write completed rate | Total number of times a file was written per second. | Dependent item | envoy.filesystem.write_completed.rate |
Filesystem, write failed rate | Total number of times an error occurred during a file write operation per second. | Dependent item | envoy.filesystem.write_failed.rate |
Filesystem, reopen failed rate | Total number of times a file failed to be opened per second. | Dependent item | envoy.filesystem.reopen_failed.rate |
Connections, total | Total connections of both new and old Envoy processes. | Dependent item | envoy.server.total_connections |
Connections, parent | Total connections of the old Envoy process on hot restart. | Dependent item | envoy.server.parent_connections |
Clusters, warming | Number of currently warming (not active) clusters. | Dependent item | envoy.cluster_manager.warming_clusters |
Clusters, active | Number of currently active (warmed) clusters. | Dependent item | envoy.cluster_manager.active_clusters |
Clusters, added rate | Total clusters added (either via static config or CDS) per second. | Dependent item | envoy.cluster_manager.cluster_added.rate |
Clusters, modified rate | Total clusters modified (via CDS) per second. | Dependent item | envoy.cluster_manager.cluster_modified.rate |
Clusters, removed rate | Total clusters removed (via CDS) per second. | Dependent item | envoy.cluster_manager.cluster_removed.rate |
Clusters, updates rate | Total cluster updates per second. | Dependent item | envoy.cluster_manager.cluster_updated.rate |
Listeners, active | Number of currently active listeners. | Dependent item | envoy.listener_manager.total_listeners_active |
Listeners, draining | Number of currently draining listeners. | Dependent item | envoy.listener_manager.total_listeners_draining |
Listener, warming | Number of currently warming listeners. | Dependent item | envoy.listener_manager.total_listeners_warming |
Listener manager, initialized | A boolean (1 if started and 0 otherwise) that indicates whether listeners have been initialized on workers. | Dependent item | envoy.listener_manager.workers_started |
Listeners, create failure | Total failed listener object additions to workers per second. | Dependent item | envoy.listener_manager.listener_create_failure.rate |
Listeners, create success | Total listener objects successfully added to workers per second. | Dependent item | envoy.listener_manager.listener_create_success.rate |
Listeners, added | Total listeners added (either via static config or LDS) per second. | Dependent item | envoy.listener_manager.listener_added.rate |
Listeners, stopped | Total listeners stopped per second. | Dependent item | envoy.listener_manager.listener_stopped.rate |
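Items whose key ends in `.rate` report per-second values, while the underlying Envoy statistics are cumulative counters. The sketch below shows one way such a rate can be derived from two successive samples. It assumes behavior along the lines of Zabbix's "Change per second" preprocessing step; the function name `change_per_second` is hypothetical, and the template's actual preprocessing steps are not shown in this document.

```python
def change_per_second(prev_value, prev_ts, value, ts):
    """Per-second rate between two counter samples.

    Returns None when no rate can be computed: no time has elapsed, or the
    counter went backwards (for example, after an Envoy restart).
    """
    elapsed = ts - prev_ts
    if elapsed <= 0 or value < prev_value:
        return None
    return (value - prev_value) / elapsed
```

For example, a counter that grows from 100 to 160 over 60 seconds yields a rate of 1.0 per second, whereas a counter reset yields no value for that interval.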
Triggers
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: Server state is not live | | last(/Envoy Proxy by HTTP/envoy.server.state) > 0 | Average | |
Envoy Proxy: Service has been restarted | Uptime is less than 10 minutes. | last(/Envoy Proxy by HTTP/envoy.server.uptime)<10m | Info | Manual close: Yes |
Envoy Proxy: Failed to fetch metrics data | Zabbix has not received data for items for the last 10 minutes. | nodata(/Envoy Proxy by HTTP/envoy.server.uptime,10m)=1 | Warning | Manual close: Yes |
Envoy Proxy: SSL certificate expires soon | Please check certificate. Less than {$ENVOY.CERT.MIN} days left until the next certificate being managed will expire. | last(/Envoy Proxy by HTTP/envoy.server.days_until_first_cert_expiring)<{$ENVOY.CERT.MIN} | Warning | |
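The trigger expressions above can be read as simple comparisons on the latest item values. The following sketch restates them in Python for illustration only. The function `evaluate_triggers` and its parameters are hypothetical, and `nodata()` is simplified to "seconds since the last received value", which is an assumption rather than how Zabbix evaluates it internally.

```python
CERT_MIN_DAYS = 7        # default of {$ENVOY.CERT.MIN}
RESTART_WINDOW = 10 * 60  # "10m" in the uptime trigger, in seconds
NODATA_WINDOW = 10 * 60   # "10m" in the nodata() trigger, in seconds


def evaluate_triggers(state, uptime, days_until_cert_expiry,
                      seconds_since_last_value, cert_min_days=CERT_MIN_DAYS):
    """Return the names of the triggers that would fire for these values."""
    fired = []
    if state > 0:  # per the trigger expression, any non-zero state is "not live"
        fired.append("Server state is not live")
    if uptime < RESTART_WINDOW:
        fired.append("Service has been restarted")
    if seconds_since_last_value >= NODATA_WINDOW:
        fired.append("Failed to fetch metrics data")
    if days_until_cert_expiry < cert_min_days:
        fired.append("SSL certificate expires soon")
    return fired
```

Raising `cert_min_days` (that is, {$ENVOY.CERT.MIN}) makes the certificate trigger fire earlier; the other thresholds are fixed in the expressions shown in the table.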
LLD rule Cluster metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster metrics discovery | | Dependent item | envoy.lld.cluster |
Item prototypes for Cluster metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster ["{#CLUSTER_NAME}"]: Membership, total | Current cluster membership total. | Dependent item | envoy.cluster.membership_total["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Membership, healthy | Current cluster healthy total (inclusive of both health checking and outlier detection). | Dependent item | envoy.cluster.membership_healthy["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Membership, unhealthy | Current cluster unhealthy total. | Calculated | envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Membership, degraded | Current cluster degraded total. | Dependent item | envoy.cluster.membership_degraded["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Connections, total | Current cluster total connections. | Dependent item | envoy.cluster.upstream_cx_total["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Connections, active | Current cluster total active connections. | Dependent item | envoy.cluster.upstream_cx_active["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests total, rate | Current cluster request total per second. | Dependent item | envoy.cluster.upstream_rq_total.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests timeout, rate | Current cluster requests that timed out waiting for a response per second. | Dependent item | envoy.cluster.upstream_rq_timeout.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests completed, rate | Total upstream requests completed per second. | Dependent item | envoy.cluster.upstream_rq_completed.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests 2xx, rate | Aggregate HTTP 2xx response codes per second. | Dependent item | envoy.cluster.upstream_rq_2x.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests 3xx, rate | Aggregate HTTP 3xx response codes per second. | Dependent item | envoy.cluster.upstream_rq_3x.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests 4xx, rate | Aggregate HTTP 4xx response codes per second. | Dependent item | envoy.cluster.upstream_rq_4x.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests 5xx, rate | Aggregate HTTP 5xx response codes per second. | Dependent item | envoy.cluster.upstream_rq_5x.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests pending | Total active requests pending a connection pool connection. | Dependent item | envoy.cluster.upstream_rq_pending_active["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Requests active | Total active requests. | Dependent item | envoy.cluster.upstream_rq_active["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Upstream bytes out, rate | Total sent connection bytes per second. | Dependent item | envoy.cluster.upstream_cx_tx_bytes_total.rate["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Upstream bytes in, rate | Total received connection bytes per second. | Dependent item | envoy.cluster.upstream_cx_rx_bytes_total.rate["{#CLUSTER_NAME}"] |
Trigger prototypes for Cluster metrics discovery
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: There are unhealthy clusters | | last(/Envoy Proxy by HTTP/envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]) > 0 | Average | |
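To illustrate how cluster discovery and the unhealthy-membership trigger fit together, the sketch below extracts cluster names from Prometheus text, builds discovery rows keyed by {#CLUSTER_NAME}, and derives the unhealthy count. It rests on two assumptions not stated in this document: that Envoy labels cluster metrics with `envoy_cluster_name`, and that the calculated item is membership total minus membership healthy. All function names are hypothetical.

```python
import json
import re

SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
CLUSTER_LABEL_RE = re.compile(r'envoy_cluster_name="([^"]*)"')
MEMBERSHIP_METRICS = ("envoy_cluster_membership_total", "envoy_cluster_membership_healthy")


def cluster_membership(text):
    """Map cluster name -> {"total": n, "healthy": n} from Prometheus text."""
    clusters = {}
    for line in text.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match:
            continue  # comments, blank lines, unlabeled samples
        name, labels, value = match.groups()
        label = CLUSTER_LABEL_RE.search(labels)
        if name not in MEMBERSHIP_METRICS or not label:
            continue
        kind = name.rsplit("_", 1)[1]  # "total" or "healthy"
        clusters.setdefault(label.group(1), {})[kind] = float(value)
    return clusters


def lld_rows(clusters):
    """Discovery rows in the JSON shape Zabbix low-level discovery accepts."""
    return json.dumps([{"{#CLUSTER_NAME}": name} for name in sorted(clusters)])


def unhealthy(clusters):
    """Assumed formula for the calculated item: total minus healthy."""
    return {
        name: members["total"] - members["healthy"]
        for name, members in clusters.items()
        if "total" in members and "healthy" in members
    }
```

Any cluster for which `unhealthy()` returns a value above 0 corresponds to the condition in the trigger prototype above.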
LLD rule Listeners metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Listeners metrics discovery | | Dependent item | envoy.lld.listeners |
Item prototypes for Listeners metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
Listener ["{#LISTENER_ADDRESS}"]: Connections, active | Total active connections. | Dependent item | envoy.listener.downstream_cx_active["{#LISTENER_ADDRESS}"] |
Listener ["{#LISTENER_ADDRESS}"]: Connections, rate | Total connections per second. | Dependent item | envoy.listener.downstream_cx_total.rate["{#LISTENER_ADDRESS}"] |
Listener ["{#LISTENER_ADDRESS}"]: Sockets, undergoing | Sockets currently undergoing listener filter processing. | Dependent item | envoy.listener.downstream_pre_cx_active["{#LISTENER_ADDRESS}"] |
LLD rule HTTP metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP metrics discovery | | Dependent item | envoy.lld.http |
Item prototypes for HTTP metrics discovery
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP ["{#CONN_MANAGER}"]: Requests, rate | Total requests per second. | Dependent item | envoy.http.downstream_rq_total.rate["{#CONN_MANAGER}"] |
HTTP ["{#CONN_MANAGER}"]: Requests, active | Total active requests. | Dependent item | envoy.http.downstream_rq_active["{#CONN_MANAGER}"] |
HTTP ["{#CONN_MANAGER}"]: Requests timeout, rate | Total requests closed due to a timeout on the request path per second. | Dependent item | envoy.http.downstream_rq_timeout["{#CONN_MANAGER}"] |
HTTP ["{#CONN_MANAGER}"]: Connections, rate | Total connections per second. | Dependent item | envoy.http.downstream_cx_total["{#CONN_MANAGER}"] |
HTTP ["{#CONN_MANAGER}"]: Connections, active | Total active connections. | Dependent item | envoy.http.downstream_cx_active["{#CONN_MANAGER}"] |
HTTP ["{#CONN_MANAGER}"]: Bytes in, rate | Total bytes received per second. | Dependent item | envoy.http.downstream_cx_rx_bytes_total.rate["{#CONN_MANAGER}"] |
HTTP ["{#CONN_MANAGER}"]: Bytes out, rate | Total bytes sent per second. | Dependent item | envoy.http.downstream_cx_tx_bytes_tota.rate["{#CONN_MANAGER}"] |
Feedback
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.