The high availability mode protects Zabbix server against software and hardware failures and helps minimize downtime during software or hardware maintenance.
The high availability (HA) cluster is an opt-in solution supported for Zabbix server. The native HA solution is designed to be simple to use; it works across sites and has no specific requirements for the databases that Zabbix recognizes. Users are free to use the native Zabbix HA solution or a third-party HA solution, depending on what best suits the high availability requirements of their environment.
The solution consists of multiple zabbix_server instances or nodes. Every node:
Only one node can be active (working) at a time. The standby nodes perform no data collection, processing or other regular server activities; they do not listen on ports and keep a minimum number of database connections.
Both active and standby nodes update their last access time every 5 seconds. Each standby node monitors the last access time of the active node. If the last access time of the active node is more than 'failover delay' seconds old, the standby node switches itself to be the active node and assigns the 'unavailable' status to the previously active node.
The active node monitors its own database connectivity: if the connection is lost for more than 'failover delay' minus 5 seconds, the node must stop all processing and switch to standby mode. The active node also monitors the status of the standby nodes: if the last access time of a standby node is more than 'failover delay' seconds old, the standby node is assigned the 'unavailable' status.
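The timing rules above can be illustrated with a minimal Python sketch. It is a simplified model for reasoning about the thresholds, not Zabbix server code; the names Node, standby_should_take_over, active_should_step_down and mark_stale_standbys are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# The 5-second margin subtracted from the failover delay for database loss,
# as described in the text.
DB_LOSS_MARGIN_SECONDS = 5


@dataclass
class Node:
    name: str
    status: str         # "active", "standby" or "unavailable"
    last_access: float  # time of the node's latest last-access update, in seconds


def standby_should_take_over(active: Node, now: float, failover_delay: float) -> bool:
    """True once the active node's last access time is older than the failover delay."""
    return now - active.last_access > failover_delay


def active_should_step_down(db_lost_for: float, failover_delay: float) -> bool:
    """True once the database has been unreachable for more than failover delay - 5 seconds."""
    return db_lost_for > failover_delay - DB_LOSS_MARGIN_SECONDS


def mark_stale_standbys(nodes: List[Node], now: float, failover_delay: float) -> None:
    """Assign 'unavailable' to standby nodes whose last access time is too old."""
    for node in nodes:
        if node.status == "standby" and now - node.last_access > failover_delay:
            node.status = "unavailable"
```

With a 60-second failover delay, this model has the active node stepping down after more than 55 seconds without database access, that is, before a standby node would consider it lost.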
The failover delay is configurable, with the minimum being 10 seconds.
The nodes are designed to be compatible across minor Zabbix versions.
To turn any Zabbix server from a standalone server into an HA cluster node, specify the HANodeName parameter in the server configuration.
The NodeAddress parameter (address:port), if set, must be used by the frontend for the active node, overriding the value in zabbix.conf.php.
Make sure that Zabbix server address:port is not defined in the frontend configuration.
Zabbix frontend will autodetect the active node by reading settings from the nodes table in the Zabbix database. The node address of the active node will be used as the Zabbix server address.
To enable connections to multiple servers in a high availability setup, list addresses of the HA nodes in the Server parameter of the proxy, separated by a semicolon.
To enable connections to multiple servers in a high availability setup, list addresses of the HA nodes in the ServerActive parameter of the agent, separated by a semicolon.
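As an illustration of the semicolon-separated format, the following Python sketch builds the value for both parameters from a list of node addresses. The host names are hypothetical placeholders and the helper ha_server_value is not part of Zabbix.

```python
def ha_server_value(node_addresses):
    """Join HA node addresses with semicolons, as expected for one HA cluster."""
    return ";".join(node_addresses)


# Hypothetical addresses of two HA nodes.
nodes = ["zbx-node1.example.com", "zbx-node2.example.com"]

# Line for the proxy configuration file.
print("Server=" + ha_server_value(nodes))
# Line for the agent configuration file.
print("ServerActive=" + ha_server_value(nodes))
```

The printed lines can be compared against the proxy and agent configuration files when checking that every node of the cluster is listed.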
Zabbix will fail over to another node automatically if the active node stops. There must be at least one node in standby status for the failover to happen.
How fast will the failover be? All nodes update their last access time (and status, if it is changed) every 5 seconds. So:
If the active node shuts down and manages to report its status as "shut down", another node will take over within 5 seconds.
If the active node shuts down or becomes unavailable without being able to update its status, standby nodes will wait for the failover delay plus 5 seconds before taking over.
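The two cases can be written out as a small worked calculation. This Python sketch only restates the arithmetic from the text; the function name max_takeover_seconds is hypothetical and the 60-second delay is an example value.

```python
# All nodes update their last access time and status every 5 seconds.
STATUS_UPDATE_SECONDS = 5


def max_takeover_seconds(failover_delay, clean_shutdown):
    """Upper bound, in seconds, on how long a standby node waits before taking over."""
    if clean_shutdown:
        # The active node reported "shut down"; a standby node notices
        # on its next 5-second update.
        return STATUS_UPDATE_SECONDS
    # The active node disappeared silently; a standby node waits out
    # the failover delay plus one update interval.
    return failover_delay + STATUS_UPDATE_SECONDS


print(max_takeover_seconds(60, clean_shutdown=True))   # 5
print(max_takeover_seconds(60, clean_shutdown=False))  # 65
```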
The failover delay is configurable, with the supported range between 10 seconds and 15 minutes (one minute by default). To change the failover delay, you may run:
zabbix_server -R ha_set_failover_delay=5m
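Before running the command, a value can be checked against the supported range. The Python sketch below is a hypothetical helper, not part of Zabbix; it assumes a simplified format of a whole number with an optional s, m or h suffix, which may be narrower than what the server itself accepts.

```python
import re

# Assumed suffix multipliers for this sketch (no suffix means seconds).
SUFFIX_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600}

# Supported range from the text: 10 seconds to 15 minutes.
MIN_DELAY_SECONDS = 10
MAX_DELAY_SECONDS = 15 * 60


def parse_failover_delay(value):
    """Convert a delay such as '5m' to seconds, raising ValueError if out of range."""
    match = re.fullmatch(r"(\d+)([smh]?)", value.strip())
    if match is None:
        raise ValueError(f"unrecognized delay format: {value!r}")
    seconds = int(match.group(1)) * SUFFIX_SECONDS[match.group(2)]
    if not MIN_DELAY_SECONDS <= seconds <= MAX_DELAY_SECONDS:
        raise ValueError(f"delay of {seconds}s is outside the range 10s to 15m")
    return seconds


print(parse_failover_delay("5m"))  # 300
```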
The current status of the HA cluster can be managed using the dedicated runtime control options:
Node status can be monitored using the ha_status runtime control option of the server (see above).
The zabbix[cluster,discovery,nodes] internal item can be used for node discovery, as it returns a JSON with high availability node information.
To disable a high availability cluster: