Ceph

Ceph is a free-software storage platform that implements object storage on a single distributed computer cluster and provides interfaces for object-, block-, and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and free availability.

This template is for Zabbix version: 7.2

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/app/ceph_agent2?at=release/7.2

Ceph by Zabbix agent 2

Overview

This template monitors a Ceph cluster with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

The template Ceph by Zabbix agent 2 collects metrics by polling zabbix-agent2.

Requirements

Zabbix version: 7.2 and higher.

Tested versions

This template has been tested on:

  • Ceph 14.2

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

  1. Set up and configure zabbix-agent2 compiled with the Ceph monitoring plugin.
  2. Set the {$CEPH.CONNSTRING} macro to a connection string such as <protocol(host:port)> or to a named session.
  3. Set the user name and API key in the host macros ({$CEPH.USER}, {$CEPH.API.KEY}) if you want to override the parameters from the Zabbix agent configuration file.

Test availability: zabbix_get -s ceph-host -k ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
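
The availability check above can also be scripted. The following is a minimal sketch, assuming that zabbix_get is installed on the machine running the script and that literal values are passed in place of the {$CEPH.*} user macros (user macros are resolved by the Zabbix server, not by zabbix_get). The helper names build_item_key and ceph_ping are hypothetical and not part of the template.

```python
import subprocess


def build_item_key(name: str, *params: str) -> str:
    """Format a Zabbix item key such as ceph.ping["a","b","c"].

    Each parameter is wrapped in double quotes; embedded double quotes are
    backslash-escaped, as required for quoted item key parameters.
    """
    quoted = ",".join('"' + p.replace('"', '\\"') + '"' for p in params)
    return f"{name}[{quoted}]"


def ceph_ping(agent_host: str, connstring: str, user: str, api_key: str) -> str:
    """Run zabbix_get against the agent and return its raw output (hypothetical helper)."""
    key = build_item_key("ceph.ping", connstring, user, api_key)
    # An argument list (no shell) keeps [ ] " and $ from being interpreted by a shell.
    result = subprocess.run(
        ["zabbix_get", "-s", agent_host, "-k", key],
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return result.stdout.strip()
```

For example, ceph_ping("ceph-host", "https://localhost:8003", "zabbix", "zabbix_pass") would query the agent with the default macro values listed in this document.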

Macros used

| Name | Description | Default |
|------|-------------|---------|
| {$CEPH.USER} | | zabbix |
| {$CEPH.API.KEY} | | zabbix_pass |
| {$CEPH.CONNSTRING} | | https://localhost:8003 |

Items

| Name | Description | Type | Key and additional info |
|------|-------------|------|-------------------------|
| Get overall cluster status | | Zabbix agent | ceph.status["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
| Get OSD stats | | Zabbix agent | ceph.osd.stats["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
| Get OSD dump | | Zabbix agent | ceph.osd.dump["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
| Get df | | Zabbix agent | ceph.df.details["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
| Ping | | Zabbix agent | ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] Preprocessing: Discard unchanged with heartbeat: 30m |

Number of Monitors

The number of Monitors configured in a Ceph cluster.

Dependent item ceph.num_mon

Preprocessing

  • JSON Path: $.num_mon

  • Discard unchanged with heartbeat: 30m
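
The dependent items in this section each extract one field from a JSON document with a JSON Path step. A minimal sketch of that idea follows; the sample payload is hypothetical (only the field names come from the JSON paths in this document), and the extract helper handles just the simple dotted paths used here, not full JSONPath.

```python
import json
from functools import reduce

# Hypothetical sample of the JSON a master item might return.
SAMPLE = '{"num_mon": 3, "overall_status": 0, "pg_states": {"active": 128, "clean": 128}}'


def extract(payload: str, path: str):
    """Resolve a simple dotted JSON path such as $.pg_states.active."""
    if not path.startswith("$."):
        raise ValueError("only paths of the form $.a.b are supported in this sketch")
    keys = path[2:].split(".")
    # Walk the parsed document one key at a time.
    return reduce(lambda node, key: node[key], keys, json.loads(payload))
```

With this sketch, extract(SAMPLE, "$.num_mon") plays the role of the "Number of Monitors" item, and nested paths such as $.pg_states.active are resolved the same way.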

Overall cluster status

The overall Ceph cluster status, e.g., 0 - HEALTH_OK, 1 - HEALTH_WARN, or 2 - HEALTH_ERR.

Dependent item ceph.overall_status

Preprocessing

  • JSON Path: $.overall_status

  • Discard unchanged with heartbeat: 10m

Minimum Mon release version

The value of min_mon_release_name reported by the cluster (the minimum Monitor release name).

Dependent item ceph.min_mon_release_name

Preprocessing

  • JSON Path: $.min_mon_release_name

  • Discard unchanged with heartbeat: 1h

Ceph Read bandwidth

The global read bytes per second.

Dependent item ceph.rd_bytes.rate

Preprocessing

  • JSON Path: $.rd_bytes

  • Change per second

Ceph Write bandwidth

The global write bytes per second.

Dependent item ceph.wr_bytes.rate

Preprocessing

  • JSON Path: $.wr_bytes

  • Change per second

Ceph Read operations per sec

The global read operations per second.

Dependent item ceph.rd_ops.rate

Preprocessing

  • JSON Path: $.rd_ops

  • Change per second

Ceph Write operations per sec

The global write operations per second.

Dependent item ceph.wr_ops.rate

Preprocessing

  • JSON Path: $.wr_ops

  • Change per second

Total bytes available

The total bytes available in a Ceph cluster.

Dependent item ceph.total_avail_bytes

Preprocessing

  • JSON Path: $.total_avail_bytes

Total bytes

The total (RAW) capacity of a Ceph cluster in bytes.

Dependent item ceph.total_bytes

Preprocessing

  • JSON Path: $.total_bytes

Total bytes used

The total bytes used in a Ceph cluster.

Dependent item ceph.total_used_bytes

Preprocessing

  • JSON Path: $.total_used_bytes

Total number of objects

The total number of objects in a Ceph cluster.

Dependent item ceph.total_objects

Preprocessing

  • JSON Path: $.total_objects

Number of Placement Groups

The total number of Placement Groups in a Ceph cluster.

Dependent item ceph.num_pg

Preprocessing

  • JSON Path: $.num_pg

  • Discard unchanged with heartbeat: 10m

Number of Placement Groups in Temporary state

The total number of Placement Groups in a pg_temp state.

Dependent item ceph.num_pg_temp

Preprocessing

  • JSON Path: $.num_pg_temp

Number of Placement Groups in Active state

The total number of Placement Groups in an active state.

Dependent item ceph.pg_states.active

Preprocessing

  • JSON Path: $.pg_states.active

Number of Placement Groups in Clean state

The total number of Placement Groups in a clean state.

Dependent item ceph.pg_states.clean

Preprocessing

  • JSON Path: $.pg_states.clean

Number of Placement Groups in Peering state

The total number of Placement Groups in a peering state.

Dependent item ceph.pg_states.peering

Preprocessing

  • JSON Path: $.pg_states.peering

Number of Placement Groups in Scrubbing state

The total number of Placement Groups in a scrubbing state.

Dependent item ceph.pg_states.scrubbing

Preprocessing

  • JSON Path: $.pg_states.scrubbing

Number of Placement Groups in Undersized state

The total number of Placement Groups in an undersized state.

Dependent item ceph.pg_states.undersized

Preprocessing

  • JSON Path: $.pg_states.undersized

Number of Placement Groups in Backfilling state

The total number of Placement Groups in a backfill state.

Dependent item ceph.pg_states.backfilling

Preprocessing

  • JSON Path: $.pg_states.backfilling

Number of Placement Groups in degraded state

The total number of Placement Groups in a degraded state.

Dependent item ceph.pg_states.degraded

Preprocessing

  • JSON Path: $.pg_states.degraded

Number of Placement Groups in inconsistent state

The total number of Placement Groups in an inconsistent state.

Dependent item ceph.pg_states.inconsistent

Preprocessing

  • JSON Path: $.pg_states.inconsistent

Number of Placement Groups in Unknown state

The total number of Placement Groups in an unknown state.

Dependent item ceph.pg_states.unknown

Preprocessing

  • JSON Path: $.pg_states.unknown

Number of Placement Groups in remapped state

The total number of Placement Groups in a remapped state.

Dependent item ceph.pg_states.remapped

Preprocessing

  • JSON Path: $.pg_states.remapped

Number of Placement Groups in recovering state

The total number of Placement Groups in a recovering state.

Dependent item ceph.pg_states.recovering

Preprocessing

  • JSON Path: $.pg_states.recovering

Number of Placement Groups in backfill_toofull state

The total number of Placement Groups in a backfill_toofull state.

Dependent item ceph.pg_states.backfill_toofull

Preprocessing

  • JSON Path: $.pg_states.backfill_toofull

Number of Placement Groups in backfill_wait state

The total number of Placement Groups in a backfill_wait state.

Dependent item ceph.pg_states.backfill_wait

Preprocessing

  • JSON Path: $.pg_states.backfill_wait

Number of Placement Groups in recovery_wait state

The total number of Placement Groups in a recovery_wait state.

Dependent item ceph.pg_states.recovery_wait

Preprocessing

  • JSON Path: $.pg_states.recovery_wait

Number of Pools

The total number of pools in a Ceph cluster.

Dependent item ceph.num_pools

Preprocessing

  • JSON Path: $.num_pools

Number of OSDs

The number of the known storage daemons in a Ceph cluster.

Dependent item ceph.num_osd

Preprocessing

  • JSON Path: $.num_osd

  • Discard unchanged with heartbeat: 10m

Number of OSDs in state: UP

The total number of the online storage daemons in a Ceph cluster.

Dependent item ceph.num_osd_up

Preprocessing

  • JSON Path: $.num_osd_up

  • Discard unchanged with heartbeat: 10m

Number of OSDs in state: IN

The total number of the participating storage daemons in a Ceph cluster.

Dependent item ceph.num_osd_in

Preprocessing

  • JSON Path: $.num_osd_in

  • Discard unchanged with heartbeat: 10m

Ceph OSD avg fill

The average fill of OSDs.

Dependent item ceph.osd_fill.avg

Preprocessing

  • JSON Path: $.osd_fill.avg

Ceph OSD max fill

The fill percentage of the most filled OSD.

Dependent item ceph.osd_fill.max

Preprocessing

  • JSON Path: $.osd_fill.max

Ceph OSD min fill

The fill percentage of the least filled OSD.

Dependent item ceph.osd_fill.min

Preprocessing

  • JSON Path: $.osd_fill.min

Ceph OSD max PGs

The maximum number of Placement Groups on OSDs.

Dependent item ceph.osd_pgs.max

Preprocessing

  • JSON Path: $.osd_pgs.max

Ceph OSD min PGs

The minimum number of Placement Groups on OSDs.

Dependent item ceph.osd_pgs.min

Preprocessing

  • JSON Path: $.osd_pgs.min

Ceph OSD avg PGs

The average number of Placement Groups on OSDs.

Dependent item ceph.osd_pgs.avg

Preprocessing

  • JSON Path: $.osd_pgs.avg

Ceph OSD Apply latency Avg

The average apply latency of OSDs.

Dependent item ceph.osd_latency_apply.avg

Preprocessing

  • JSON Path: $.osd_latency_apply.avg

Ceph OSD Apply latency Max

The maximum apply latency of OSDs.

Dependent item ceph.osd_latency_apply.max

Preprocessing

  • JSON Path: $.osd_latency_apply.max

Ceph OSD Apply latency Min

The minimum apply latency of OSDs.

Dependent item ceph.osd_latency_apply.min

Preprocessing

  • JSON Path: $.osd_latency_apply.min

Ceph OSD Commit latency Avg

The average commit latency of OSDs.

Dependent item ceph.osd_latency_commit.avg

Preprocessing

  • JSON Path: $.osd_latency_commit.avg

Ceph OSD Commit latency Max

The maximum commit latency of OSDs.

Dependent item ceph.osd_latency_commit.max

Preprocessing

  • JSON Path: $.osd_latency_commit.max

Ceph OSD Commit latency Min

The minimum commit latency of OSDs.

Dependent item ceph.osd_latency_commit.min

Preprocessing

  • JSON Path: $.osd_latency_commit.min

Ceph backfill full ratio

The backfill full ratio setting of the Ceph cluster as configured on OSDMap.

Dependent item ceph.osd_backfillfull_ratio

Preprocessing

  • JSON Path: $.osd_backfillfull_ratio

  • Discard unchanged with heartbeat: 10m

Ceph full ratio

The full ratio setting of the Ceph cluster as configured on OSDMap.

Dependent item ceph.osd_full_ratio

Preprocessing

  • JSON Path: $.osd_full_ratio

  • Discard unchanged with heartbeat: 10m

Ceph nearfull ratio

The near full ratio setting of the Ceph cluster as configured on OSDMap.

Dependent item ceph.osd_nearfull_ratio

Preprocessing

  • JSON Path: $.osd_nearfull_ratio

  • Discard unchanged with heartbeat: 10m
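
Many items above use the "Discard unchanged with heartbeat" step, which stores a value only when it differs from the last stored one or when the heartbeat interval has passed. The sketch below illustrates that behavior under stated assumptions: timestamps are in seconds, and a value arriving exactly at the heartbeat boundary is kept. The function name throttle is hypothetical.

```python
from typing import Iterable, List, Tuple


def throttle(samples: Iterable[Tuple[float, object]], heartbeat: float) -> List[Tuple[float, object]]:
    """Keep a (timestamp, value) sample if the value changed or the heartbeat elapsed."""
    kept: List[Tuple[float, object]] = []
    last_value = None
    last_kept_at = None
    for ts, value in samples:
        first = last_kept_at is None
        changed = first or value != last_value
        # Assumption: the boundary case (exactly one heartbeat later) is kept.
        expired = (not first) and (ts - last_kept_at >= heartbeat)
        if changed or expired:
            kept.append((ts, value))
            last_value = value
            last_kept_at = ts
    return kept
```

With a 30-minute heartbeat (1800 seconds) and samples taken every 10 minutes, an unchanged value is stored once per half hour, while any change is stored immediately.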

Triggers

| Name | Description | Expression | Severity | Dependencies and additional info |
|------|-------------|------------|----------|----------------------------------|
| Ceph: Can not connect to cluster | The connection to the Ceph RESTful module is broken (reported on any error, including authentication and configuration issues). | last(/Ceph by Zabbix agent 2/ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"])=0 | Average | |
| Ceph: Cluster in ERROR state | | last(/Ceph by Zabbix agent 2/ceph.overall_status)=2 | Average | Manual close: Yes |
| Ceph: Cluster in WARNING state | | last(/Ceph by Zabbix agent 2/ceph.overall_status)=1 | Warning | Manual close: Yes. Depends on: Ceph: Cluster in ERROR state |
| Ceph: Minimum monitor release version has changed | A Ceph version has changed. Acknowledge to close the problem manually. | last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#1)<>last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#2) and length(last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name))>0 | Info | Manual close: Yes |
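
As a rough illustration of how the two cluster-state triggers relate to the ceph.overall_status value, the sketch below maps a status code to its health name and to the trigger that would fire. The codes, trigger names, and severities come from this document; the describe_status helper is hypothetical.

```python
# Status codes, trigger names, and severities as listed in this document.
HEALTH_NAMES = {0: "HEALTH_OK", 1: "HEALTH_WARN", 2: "HEALTH_ERR"}
STATUS_TRIGGERS = {
    2: ("Ceph: Cluster in ERROR state", "Average"),
    1: ("Ceph: Cluster in WARNING state", "Warning"),
}


def describe_status(overall_status: int):
    """Return (health name, (trigger name, severity) or None) for a status code."""
    health = HEALTH_NAMES.get(overall_status, "UNKNOWN")
    trigger = STATUS_TRIGGERS.get(overall_status)
    return health, trigger
```

A status of 0 yields no trigger, while 1 and 2 map to the WARNING and ERROR triggers respectively.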

LLD rule OSD

| Name | Description | Type | Key and additional info |
|------|-------------|------|-------------------------|
| OSD | | Zabbix agent | ceph.osd.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |

Item prototypes for OSD

Name Description Type Key and additional info
[osd.{#OSDNAME}] OSD in

Dependent item ceph.osd[{#OSDNAME},in]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.in

  • Discard unchanged with heartbeat: 10m

[osd.{#OSDNAME}] OSD up

Dependent item ceph.osd[{#OSDNAME},up]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.up

  • Discard unchanged with heartbeat: 10m

[osd.{#OSDNAME}] OSD PGs

Dependent item ceph.osd[{#OSDNAME},num_pgs]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.num_pgs

    ⛔️Custom on fail: Discard value

[osd.{#OSDNAME}] OSD fill

Dependent item ceph.osd[{#OSDNAME},fill]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.osd_fill

    ⛔️Custom on fail: Discard value

[osd.{#OSDNAME}] OSD latency apply

The time taken to flush an update to disks.

Dependent item ceph.osd[{#OSDNAME},latency_apply]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.osd_latency_apply

    ⛔️Custom on fail: Discard value

[osd.{#OSDNAME}] OSD latency commit

The time taken to commit an operation to the journal.

Dependent item ceph.osd[{#OSDNAME},latency_commit]

Preprocessing

  • JSON Path: $.osds.{#OSDNAME}.osd_latency_commit

    ⛔️Custom on fail: Discard value
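
Low-level discovery turns each item prototype above into one concrete item per discovered OSD by substituting the {#OSDNAME} macro. The following sketch shows that substitution; the discovery rows are hypothetical sample data, and expand is a hypothetical helper rather than anything Zabbix exposes.

```python
from typing import Dict, List, Tuple

# Prototype key and JSON path copied from the "OSD in" item prototype.
PROTOTYPE_KEY = "ceph.osd[{#OSDNAME},in]"
PROTOTYPE_PATH = "$.osds.{#OSDNAME}.in"

# Hypothetical discovery result: one row of LLD macros per OSD.
DISCOVERED: List[Dict[str, str]] = [{"{#OSDNAME}": "0"}, {"{#OSDNAME}": "1"}]


def expand(template: str, macros: Dict[str, str]) -> str:
    """Replace every LLD macro in the template with its discovered value."""
    for name, value in macros.items():
        template = template.replace(name, value)
    return template


# One (item key, JSON path) pair per discovered OSD.
ITEMS: List[Tuple[str, str]] = [
    (expand(PROTOTYPE_KEY, row), expand(PROTOTYPE_PATH, row)) for row in DISCOVERED
]
```

For the two sample rows, this produces the keys ceph.osd[0,in] and ceph.osd[1,in] with the matching JSON paths $.osds.0.in and $.osds.1.in.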

Trigger prototypes for OSD

| Name | Description | Expression | Severity | Dependencies and additional info |
|------|-------------|------------|----------|----------------------------------|
| Ceph: OSD osd.{#OSDNAME} is down | OSD osd.{#OSDNAME} is marked "down" in the osdmap. The OSD daemon may have been stopped, or peer OSDs may be unable to reach the OSD over the network. | last(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},up]) = 0 | Average | |
| Ceph: OSD osd.{#OSDNAME} is full | | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_full_ratio)*100 | Average | |
| Ceph: Ceph OSD osd.{#OSDNAME} is near full | | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_nearfull_ratio)*100 | Warning | Depends on: Ceph: OSD osd.{#OSDNAME} is full |
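
The "is full" and "is near full" expressions compare the lowest fill percentage seen over 15 minutes with a ratio multiplied by 100. The sketch below restates that logic, assuming that fill values are percentages and that the ratios are fractions between 0 and 1 (which is what the *100 in the expressions suggests). The function name and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def osd_fill_problems(
    osd_name: str,
    fill_last_15m: Sequence[float],
    full_ratio: float,
    nearfull_ratio: float,
) -> List[str]:
    """Return the trigger names that would be shown for one OSD."""
    lowest = min(fill_last_15m)  # min(...,15m): every sample must exceed the threshold
    is_full = lowest > full_ratio * 100
    is_near_full = lowest > nearfull_ratio * 100
    if is_full:
        # "near full" depends on "full", so only the more severe problem is reported.
        return [f"Ceph: OSD osd.{osd_name} is full"]
    if is_near_full:
        return [f"Ceph: Ceph OSD osd.{osd_name} is near full"]
    return []
```

Using min() over the window means a single high sample does not raise a problem; the OSD has to stay above the threshold for the whole 15 minutes.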

LLD rule Pool

| Name | Description | Type | Key and additional info |
|------|-------------|------|-------------------------|
| Pool | | Zabbix agent | ceph.pool.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |

Item prototypes for Pool

Name Description Type Key and additional info
[{#POOLNAME}] Pool Used

The total bytes used in a pool.

Dependent item ceph.pool["{#POOLNAME}",bytes_used]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].bytes_used

[{#POOLNAME}] Max available

The maximum available space in the given pool.

Dependent item ceph.pool["{#POOLNAME}",max_avail]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].max_avail

[{#POOLNAME}] Pool RAW Used

Bytes used in pool including the copies made.

Dependent item ceph.pool["{#POOLNAME}",stored_raw]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].stored_raw

[{#POOLNAME}] Pool Percent Used

The percentage of the storage used per pool.

Dependent item ceph.pool["{#POOLNAME}",percent_used]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].percent_used

[{#POOLNAME}] Pool objects

The number of objects in the pool.

Dependent item ceph.pool["{#POOLNAME}",objects]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].objects

[{#POOLNAME}] Pool Read bandwidth

The read rate per pool (bytes per second).

Dependent item ceph.pool["{#POOLNAME}",rd_bytes.rate]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].rd_bytes

  • Change per second

[{#POOLNAME}] Pool Write bandwidth

The write rate per pool (bytes per second).

Dependent item ceph.pool["{#POOLNAME}",wr_bytes.rate]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].wr_bytes

  • Change per second

[{#POOLNAME}] Pool Read operations

The read rate per pool (operations per second).

Dependent item ceph.pool["{#POOLNAME}",rd_ops.rate]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].rd_ops

  • Change per second

[{#POOLNAME}] Pool Write operations

The write rate per pool (operations per second).

Dependent item ceph.pool["{#POOLNAME}",wr_ops.rate]

Preprocessing

  • JSON Path: $.pools["{#POOLNAME}"].wr_ops

  • Change per second
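
The cluster-wide and per-pool bandwidth and operations items all apply the "Change per second" step to a cumulative counter such as rd_bytes or wr_ops. A minimal sketch of that calculation follows; it assumes samples are (timestamp in seconds, counter value) pairs and that a decreasing counter yields no value, which is how this step is generally described for Zabbix. The function name is hypothetical.

```python
from typing import Optional, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, cumulative counter value)


def change_per_second(prev: Sample, curr: Sample) -> Optional[float]:
    """Compute (value - prev_value) / (time - prev_time) for two counter samples."""
    prev_time, prev_value = prev
    time, value = curr
    if time <= prev_time:
        raise ValueError("samples must be in increasing time order")
    if value < prev_value:
        # Assumption: a counter that went backwards (e.g. after a reset) is skipped.
        return None
    return (value - prev_value) / (time - prev_time)
```

For instance, a counter that grows from 1000 to 61000 bytes over 60 seconds corresponds to a rate of 1000 bytes per second.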

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at the Zabbix forums.
