HashiCorp Vault by HTTP
This template monitors HashiCorp Vault with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Vault by HTTP collects metrics by HTTP agent from the /sys/metrics API endpoint.
Zabbix version: 7.2 and higher.
Tested versions
This template has been tested on:
- Vault 1.6
Zabbix should be configured according to the instructions in the Templates out of the box section.
See Zabbix template operation for basic instructions.
Configure Vault API. See Vault Configuration.
Create a Vault service token and set it in the macro {$VAULT.TOKEN}.
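As a rough illustration of the setup steps above, the following sketch shows how the connection macros might combine into the request that the HTTP agent sends. The `/v1` prefix, the `format=prometheus` query parameter, the host name, and the token value are assumptions made for the example; the template's actual item configuration is not shown in this document.

```python
import urllib.request


def build_metrics_request(scheme, host, port, token):
    """Build (but do not send) a GET request for the Vault metrics endpoint."""
    # Assumption: the API is served under /v1 and Prometheus text output is requested.
    url = f"{scheme}://{host}:{port}/v1/sys/metrics?format=prometheus"
    # Vault reads the client token from the X-Vault-Token header.
    return urllib.request.Request(url, headers={"X-Vault-Token": token}, method="GET")


# Hypothetical values standing in for {$VAULT.API.SCHEME}, {$VAULT.HOST},
# {$VAULT.API.PORT} and {$VAULT.TOKEN}.
req = build_metrics_request("http", "vault.example.com", 8200, "s.hypothetical-token")
print(req.full_url)
```

Sending the request (for example with `urllib.request.urlopen(req)`) requires a reachable Vault server and a token whose policy allows reading `sys/metrics`.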
Macros used
Name | Description | Default |
{$VAULT.API.PORT} | Vault port. | 8200 |
{$VAULT.API.SCHEME} | Vault API scheme. | http |
{$VAULT.HOST} | Vault host name. |
{$VAULT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors for trigger expression. | 90 |
{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} | Maximum number of Vault leadership setup failures. | 5 |
{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} | Maximum number of Vault leadership losses. | 5 |
{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} | Maximum number of Vault leadership step downs. | 5 |
{$VAULT.LLD.FILTER.STORAGE.MATCHES} | Filter of discoverable storage backends. | .+ |
{$VAULT.TOKEN} | Vault auth token. |
{$VAULT.TOKEN.ACCESSORS} | Vault accessors separated by spaces for monitoring token expiration time. |
{$VAULT.TOKEN.TTL.MIN.CRIT} | Token TTL critical threshold. | 3d |
{$VAULT.TOKEN.TTL.MIN.WARN} | Token TTL warning threshold. | 7d |
Name | Description | Type | Key and additional info |
Get health | HTTP agent | vault.get_health Preprocessing
Get leader | HTTP agent | vault.get_leader Preprocessing
Get metrics | HTTP agent | vault.get_metrics Preprocessing
Clear metrics | Dependent item | vault.clear_metrics Preprocessing
Get tokens | Get information about tokens via their accessors. Accessors are defined in the macro "{$VAULT.TOKEN.ACCESSORS}". |
Script | vault.get_tokens |
Check WAL discovery | Dependent item | vault.check_wal_discovery Preprocessing
Check replication discovery | Dependent item | vault.check_replication_discovery Preprocessing
Check storage discovery | Dependent item | vault.check_storage_discovery Preprocessing
Check mountpoint discovery | Dependent item | vault.check_mountpoint_discovery Preprocessing
Initialized | Initialization status. |
Dependent item | Preprocessing
Sealed | Seal status. |
Dependent item | Preprocessing
Standby | Standby status. |
Dependent item | Preprocessing
Performance standby | Performance standby status. |
Dependent item | Preprocessing
Performance replication | Performance replication mode |
Dependent item | Preprocessing
Disaster Recovery replication | Disaster recovery replication mode |
Dependent item | Preprocessing
Version | Server version. |
Dependent item | Preprocessing
Healthcheck | Vault healthcheck. |
Dependent item | Preprocessing
HA enabled | HA enabled status. |
Dependent item | vault.leader.ha_enabled Preprocessing
Is leader | Leader status. |
Dependent item | vault.leader.is_self Preprocessing
Get metrics error | Get metrics error. |
Dependent item | vault.get_metrics.error Preprocessing
Process CPU seconds, total | Total user and system CPU time spent in seconds. |
Dependent item | Preprocessing
Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | vault.metrics.process.max.fds Preprocessing
Open file descriptors, current | Number of open file descriptors. |
Dependent item | Preprocessing
Process resident memory | Resident memory size in bytes. |
Dependent item | vault.metrics.process.resident_memory.bytes Preprocessing
Uptime | Server uptime. |
Dependent item | vault.metrics.process.uptime Preprocessing
Process virtual memory, current | Virtual memory size in bytes. |
Dependent item | vault.metrics.process.virtual_memory.bytes Preprocessing
Process virtual memory, max | Maximum amount of virtual memory available in bytes. |
Dependent item | vault.metrics.process.virtual_memory.max.bytes Preprocessing
Audit log requests, rate | Number of all audit log requests across all audit log devices. |
Dependent item | vault.metrics.audit.log.request.rate Preprocessing
Audit log request failures, rate | Number of audit log request failures. |
Dependent item | vault.metrics.audit.log.request.failure.rate Preprocessing
Audit log response, rate | Number of audit log responses across all audit log devices. |
Dependent item | vault.metrics.audit.log.response.rate Preprocessing
Audit log response failures, rate | Number of audit log response failures. |
Dependent item | vault.metrics.audit.log.response.failure.rate Preprocessing
Barrier DELETE ops, rate | Number of DELETE operations at the barrier. |
Dependent item | vault.metrics.barrier.delete.rate Preprocessing
Barrier GET ops, rate | Number of GET operations at the barrier. |
Dependent item | vault.metrics.vault.barrier.get.rate Preprocessing
Barrier LIST ops, rate | Number of LIST operations at the barrier. |
Dependent item | vault.metrics.barrier.list.rate Preprocessing
Barrier PUT ops, rate | Number of PUT operations at the barrier. |
Dependent item | vault.metrics.barrier.put.rate Preprocessing
Cache hit, rate | Number of times a value was retrieved from the LRU cache. |
Dependent item | vault.metrics.cache.hit.rate Preprocessing
Cache miss, rate | Number of times a value was not in the LRU cache. This results in a read from the configured storage. |
Dependent item | vault.metrics.cache.miss.rate Preprocessing
Cache write, rate | Number of times a value was written to the LRU cache. |
Dependent item | vault.metrics.cache.write.rate Preprocessing
Check token, rate | Number of token checks handled by Vault core. |
Dependent item | vault.metrics.core.check.token.rate Preprocessing
Fetch ACL and token, rate | Number of ACL and corresponding token entry fetches handled by Vault core. |
Dependent item | vault.metrics.core.fetch.acl_and_token Preprocessing
Requests, rate | Number of requests handled by Vault core. |
Dependent item | vault.metrics.core.handle.request Preprocessing
Leadership setup failed, counter | Cluster leadership setup failures which have occurred in a highly available Vault cluster. |
Dependent item | vault.metrics.core.leadership.setup_failed Preprocessing
Leadership setup lost, counter | Cluster leadership losses which have occurred in a highly available Vault cluster. |
Dependent item | vault.metrics.core.leadership_lost Preprocessing
Post-unseal ops, counter | Duration of time taken by post-unseal operations handled by Vault core. |
Dependent item | vault.metrics.core.post_unseal Preprocessing
Pre-seal ops, counter | Duration of time taken by pre-seal operations. |
Dependent item | vault.metrics.core.pre_seal Preprocessing
Requested seal ops, counter | Duration of time taken by requested seal operations. |
Dependent item | vault.metrics.core.seal_with_request Preprocessing
Seal ops, counter | Duration of time taken by seal operations. |
Dependent item | vault.metrics.core.seal Preprocessing
Internal seal ops, counter | Duration of time taken by internal seal operations. |
Dependent item | vault.metrics.core.seal_internal Preprocessing
Leadership step downs, counter | Cluster leadership step down. |
Dependent item | vault.metrics.core.step_down Preprocessing
Unseal ops, counter | Duration of time taken by unseal operations. |
Dependent item | vault.metrics.core.unseal Preprocessing
Fetch lease times, counter | Time taken to fetch lease times. |
Dependent item | Preprocessing
Fetch lease times by token, counter | Time taken to fetch lease times by token. |
Dependent item | Preprocessing
Number of expiring leases | Number of all leases which are eligible for eventual expiry. |
Dependent item | vault.metrics.expire.num_leases Preprocessing
Expire revoke, count | Time taken to revoke a token. |
Dependent item | vault.metrics.expire.revoke Preprocessing
Expire revoke force, count | Time taken to forcibly revoke a token. |
Dependent item | vault.metrics.expire.revoke.force Preprocessing
Expire revoke prefix, count | Time taken to revoke tokens on a prefix. |
Dependent item | vault.metrics.expire.revoke.prefix Preprocessing
Revoke secrets by token, count | Time taken to revoke all secrets issued with a given token. |
Dependent item | vault.metrics.expire.revoke.by_token Preprocessing
Expire renew, count | Time taken to renew a lease. |
Dependent item | vault.metrics.expire.renew Preprocessing
Renew token, count | Time taken to renew a token which does not need to invoke a logical backend. |
Dependent item | vault.metrics.expire.renew_token Preprocessing
Register ops, count | Time taken for register operations. |
Dependent item | vault.metrics.expire.register Preprocessing
Register auth ops, count | Time taken for register authentication operations which create lease entries without lease ID. |
Dependent item | vault.metrics.expire.register.auth Preprocessing
Policy GET ops, rate | Number of operations to get a policy. |
Dependent item | vault.metrics.policy.get_policy.rate Preprocessing
Policy LIST ops, rate | Number of operations to list policies. |
Dependent item | vault.metrics.policy.list_policies.rate Preprocessing
Policy DELETE ops, rate | Number of operations to delete a policy. |
Dependent item | vault.metrics.policy.delete_policy.rate Preprocessing
Policy SET ops, rate | Number of operations to set a policy. |
Dependent item | vault.metrics.policy.set_policy.rate Preprocessing
Token create, count | The time taken to create a token. |
Dependent item | vault.metrics.token.create Preprocessing
Token createAccessor, count | The time taken to create a token accessor. |
Dependent item | vault.metrics.token.createAccessor Preprocessing
Token lookup, rate | Number of token lookups. |
Dependent item | vault.metrics.token.lookup.rate Preprocessing
Token revoke, count | The time taken to revoke a token. |
Dependent item | vault.metrics.token.revoke Preprocessing
Token revoke tree, count | Time taken to revoke a token tree. |
Dependent item | vault.metrics.token.revoke.tree Preprocessing
Token store, count | Time taken to store an updated token entry without writing to the secondary index. |
Dependent item | Preprocessing
Runtime allocated bytes | Number of bytes allocated by the Vault process. This could burst from time to time, but should return to a steady state value. |
Dependent item | vault.metrics.runtime.alloc.bytes Preprocessing
Runtime freed objects | Number of freed objects. |
Dependent item | Preprocessing
Runtime heap objects | Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting. |
Dependent item | vault.metrics.runtime.heap.objects Preprocessing
Runtime malloc count | Cumulative count of allocated heap objects. |
Dependent item | vault.metrics.runtime.malloc.count Preprocessing
Runtime num goroutines | Number of goroutines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting. |
Dependent item | vault.metrics.runtime.num_goroutines Preprocessing
Runtime sys bytes | Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system. |
Dependent item | vault.metrics.runtime.sys.bytes Preprocessing
Runtime GC pause, total | The total garbage collector pause time since Vault was last started. |
Dependent item | Preprocessing
Runtime GC runs, total | Total number of garbage collection runs since Vault was last started. |
Dependent item | Preprocessing
Token count, total | Total number of service tokens available for use; counts all un-expired and un-revoked tokens in Vault's token store. This measurement is performed every 10 minutes. |
Dependent item | vault.metrics.token Preprocessing
Token count by auth, total | Total number of service tokens that were created by an auth method. |
Dependent item | vault.metrics.token.by_auth Preprocessing
Token count by policy, total | Total number of service tokens that have a policy attached. |
Dependent item | vault.metrics.token.by_policy Preprocessing
Token count by ttl, total | Number of service tokens, grouped by the TTL range they were assigned at creation. |
Dependent item | vault.metrics.token.by_ttl Preprocessing
Token creation, rate | Number of service or batch tokens created. |
Dependent item | vault.metrics.token.creation.rate Preprocessing
Secret kv entries | Number of entries in each key-value secret engine. |
Dependent item | vault.metrics.secret.kv.count Preprocessing
Token secret lease creation, rate | Counts the number of leases created by secret engines. |
Dependent item | Preprocessing
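The dependent items above reflect the bulk collection approach: one master item fetches the whole metrics payload, and each dependent item extracts a single value from it. A minimal sketch of that idea follows, assuming the payload is Prometheus text format. The sample lines and values are invented for illustration and are not real Vault output.

```python
def parse_prometheus(text):
    """Map each sample line 'name{labels} value' to {'name{labels}': value}.

    Assumption: samples carry no trailing timestamp, and comment lines start with '#'.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        key, _, value = line.rpartition(" ")
        samples[key] = float(value)
    return samples


# Hypothetical payload fetched once by the master item.
payload = """
# TYPE process_open_fds gauge
process_open_fds 42
process_max_fds 1024
vault_core_check_token_count 310
"""

samples = parse_prometheus(payload)

# Each "dependent item" reads one value from the already-fetched payload.
open_fds = samples["process_open_fds"]
max_fds = samples["process_max_fds"]
print(open_fds, max_fds)
```

Parsing once and extracting many values keeps the load on Vault at one request per polling interval, no matter how many items the template defines.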
Name | Description | Expression | Severity | Dependencies and additional info |
HashiCorp Vault: Vault server is sealed | |
last(/HashiCorp Vault by HTTP/ |
Average | |
HashiCorp Vault: Version has changed | Vault version has changed. Acknowledge to close the problem manually. |
last(/HashiCorp Vault by HTTP/,#1)<>last(/HashiCorp Vault by HTTP/,#2) and length(last(/HashiCorp Vault by HTTP/>0 |
Info | Manual close: Yes |
HashiCorp Vault: Vault server is not responding | last(/HashiCorp Vault by HTTP/ |
High | ||
HashiCorp Vault: Failed to get metrics | length(last(/HashiCorp Vault by HTTP/vault.get_metrics.error))>0 |
Warning | Depends on:
HashiCorp Vault: Current number of open files is too high | min(/HashiCorp Vault by HTTP/,5m)/last(/HashiCorp Vault by HTTP/vault.metrics.process.max.fds)*100>{$VAULT.OPEN.FDS.MAX.WARN} |
Warning | ||
HashiCorp Vault: has been restarted | Uptime is less than 10 minutes. |
last(/HashiCorp Vault by HTTP/vault.metrics.process.uptime)<10m |
Info | Manual close: Yes |
HashiCorp Vault: High frequency of leadership setup failures | There have been more than {$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} Vault leadership setup failures in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h))>{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} |
Average | |
HashiCorp Vault: High frequency of leadership losses | There have been more than {$VAULT.LEADERSHIP.LOSSES.MAX.WARN} Vault leadership losses in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h))>{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} |
Average | |
HashiCorp Vault: High frequency of leadership step downs | There have been more than {$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} Vault leadership step downs in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h))>{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} |
Average |
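The leadership triggers above compare `max - min` of a counter over one hour against a threshold. For a counter that only increases, this equals the number of events in the window. The open files trigger compares a percentage of the file descriptor limit against {$VAULT.OPEN.FDS.MAX.WARN}. The sketch below restates both checks in Python. The function names and sample values are hypothetical, and the first argument of the file descriptor check is presumed to be the open file descriptor item, whose key is elided in the listing.

```python
def leadership_trigger_fires(counter_samples, threshold=5):
    """True when the counter grew by more than `threshold` within the window."""
    return max(counter_samples) - min(counter_samples) > threshold


def open_fds_trigger_fires(open_fds_samples, max_fds, warn_percent=90):
    """True when even the lowest reading in the window exceeds the warning percentage."""
    return min(open_fds_samples) / max_fds * 100 > warn_percent


# Hypothetical one-hour window of a leadership counter: 6 new events.
print(leadership_trigger_fires([10, 12, 16]))
# Hypothetical five-minute window of open descriptors against a limit of 1024.
print(open_fds_trigger_fires([950, 940, 960], 1024))
```

Using `min` over the window for the file descriptor check means a single short spike does not raise the problem; the usage has to stay above the threshold for the whole period.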
LLD rule Storage metrics discovery
Name | Description | Type | Key and additional info |
Storage metrics discovery | Storage backend metrics discovery. |
Dependent item | |
Item prototypes for Storage metrics discovery
Name | Description | Type | Key and additional info |
Storage [{#STORAGE}] {#OPERATION} ops, rate | Number of {#OPERATION} operations against the {#STORAGE} storage backend. |
Dependent item |[{#STORAGE}, {#OPERATION}] Preprocessing
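Storage discovery is filtered by the regular expression in {$VAULT.LLD.FILTER.STORAGE.MATCHES}, which defaults to `.+` and therefore keeps every backend. The sketch below imitates that filter with Python's `re` module. The backend names are hypothetical, and Python's regex dialect is only an approximation of the one Zabbix uses.

```python
import re


def filter_storage(backends, pattern=".+"):
    """Keep only backend names that match the filter pattern anywhere in the name."""
    rx = re.compile(pattern)
    return [name for name in backends if rx.search(name)]


discovered = ["consul", "raft", "file"]  # hypothetical {#STORAGE} values

print(filter_storage(discovered))                       # default keeps everything
print(filter_storage(discovered, "^(consul|raft)$"))    # restrict to two backends
```

Narrowing the pattern is a way to avoid creating items for backends that are not of interest.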
LLD rule Mountpoint metrics discovery
Name | Description | Type | Key and additional info |
Mountpoint metrics discovery | Mountpoint metrics discovery. |
Dependent item | vault.mountpoint.discovery |
Item prototypes for Mountpoint metrics discovery
Name | Description | Type | Key and additional info |
Rollback attempt [{#MOUNTPOINT}] ops, rate | Number of operations to perform a rollback operation on the given mount point. |
Dependent item | vault.metrics.rollback.attempt.rate[{#MOUNTPOINT}] Preprocessing
Route rollback [{#MOUNTPOINT}] ops, rate | Number of operations to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors. |
Dependent item | vault.metrics.route.rollback.rate[{#MOUNTPOINT}] Preprocessing
LLD rule WAL metrics discovery
Name | Description | Type | Key and additional info |
WAL metrics discovery | Discovery for WAL metrics. |
Dependent item | vault.wal.discovery |
Item prototypes for WAL metrics discovery
Name | Description | Type | Key and additional info |
Delete WALs, count{#SINGLETON} | Time taken to delete a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.deletewals[{#SINGLETON}] Preprocessing
GC deleted WAL{#SINGLETON} | Number of Write Ahead Logs (WAL) deleted during each garbage collection run. |
Dependent item | vault.metrics.wal.gc.deleted[{#SINGLETON}] Preprocessing
WALs on disk, total{#SINGLETON} | Total number of Write Ahead Logs (WAL) on disk. |
Dependent item |[{#SINGLETON}] Preprocessing
Load WALs, count{#SINGLETON} | Time taken to load a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.loadWAL[{#SINGLETON}] Preprocessing
Persist WALs, count{#SINGLETON} | Time taken to persist a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.persistwals[{#SINGLETON}] Preprocessing
Flush ready WAL, count{#SINGLETON} | Time taken to flush a ready Write Ahead Log (WAL) to storage. |
Dependent item | vault.metrics.wal.flushready[{#SINGLETON}] Preprocessing
LLD rule Replication metrics discovery
Name | Description | Type | Key and additional info |
Replication metrics discovery | Discovery for replication metrics. |
Dependent item | vault.replication.discovery |
Item prototypes for Replication metrics discovery
Name | Description | Type | Key and additional info |
Stream WAL missing guard, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found. |
Dependent item | vault.metrics.logshipper.streamWALs.missing_guard[{#SINGLETON}] Preprocessing
Stream WAL guard found, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is matched/found. |
Dependent item | vault.metrics.logshipper.streamWALs.guard_found[{#SINGLETON}] Preprocessing
Merkle commit index{#SINGLETON} | The last committed index in the Merkle Tree. |
Dependent item | vault.metrics.replication.merkle.commit_index[{#SINGLETON}] Preprocessing
Last WAL{#SINGLETON} | The index of the last WAL. |
Dependent item | vault.metrics.replication.wal.last_wal[{#SINGLETON}] Preprocessing
Last DR WAL{#SINGLETON} | The index of the last DR WAL. |
Dependent item | vault.metrics.replication.wal.last_dr_wal[{#SINGLETON}] Preprocessing
Last performance WAL{#SINGLETON} | The index of the last Performance WAL. |
Dependent item | vault.metrics.replication.wal.last_performance_wal[{#SINGLETON}] Preprocessing
Last remote WAL{#SINGLETON} | The index of the last remote WAL. |
Dependent item | vault.metrics.replication.fsm.last_remote_wal[{#SINGLETON}] Preprocessing
LLD rule Token metrics discovery
Name | Description | Type | Key and additional info |
Token metrics discovery | Token metrics discovery. |
Dependent item | vault.tokens.discovery |
Item prototypes for Token metrics discovery
Name | Description | Type | Key and additional info |
Token [{#TOKEN_NAME}] error | Token lookup error text. |
Dependent item | vault.token_via_accessor.error["{#ACCESSOR}"] Preprocessing
Token [{#TOKEN_NAME}] has TTL | Whether the token has a TTL. |
Dependent item | vault.token_via_accessor.has_ttl["{#ACCESSOR}"] Preprocessing
Token [{#TOKEN_NAME}] TTL | The TTL period of the token. |
Dependent item | vault.token_via_accessor.ttl["{#ACCESSOR}"] Preprocessing
Trigger prototypes for Token metrics discovery
Name | Description | Expression | Severity | Dependencies and additional info |
HashiCorp Vault: Token [{#TOKEN_NAME}] lookup error occurred | length(last(/HashiCorp Vault by HTTP/vault.token_via_accessor.error["{#ACCESSOR}"]))>0 |
Warning | Depends on:
HashiCorp Vault: Token [{#TOKEN_NAME}] will expire soon | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.CRIT} |
Average | ||
HashiCorp Vault: Token [{#TOKEN_NAME}] will expire soon | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.WARN} |
Warning | Depends on:
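The two expiry trigger prototypes above fire only when a token has a TTL and that TTL falls below {$VAULT.TOKEN.TTL.MIN.CRIT} (3d) or {$VAULT.TOKEN.TTL.MIN.WARN} (7d). The following sketch restates that logic, including the conversion of time suffixes such as `3d` into seconds. The function names are hypothetical, and returning only the most severe match is a simplification made for the example.

```python
# Seconds per time suffix (s, m, h, d, w).
SUFFIX_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def to_seconds(value):
    """Convert a value such as '3d' or '90' into seconds."""
    value = value.strip()
    if value[-1] in SUFFIX_SECONDS:
        return int(value[:-1]) * SUFFIX_SECONDS[value[-1]]
    return int(value)


def token_severity(has_ttl, ttl_seconds, crit="3d", warn="7d"):
    """Return the most severe matching trigger severity, or None."""
    if has_ttl != 1:
        return None  # tokens without a TTL never expire, so no alert
    if ttl_seconds < to_seconds(crit):
        return "average"
    if ttl_seconds < to_seconds(warn):
        return "warning"
    return None


print(token_severity(1, 5 * 86400))  # a token with five days left
```

Checking `has_ttl` first matters because a token without a TTL might otherwise look like one with zero time remaining, which would raise a false alarm.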
