AWS ECS

Amazon Elastic Container Service (Amazon ECS) is a part of Amazon.com's cloud-computing platform, Amazon Web Services, that allows users to run, stop, and manage containers on a cluster.

Available solutions




This template is for Zabbix version: 7.0
Also available for: 6.4 6.0

Source: https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/cloud/AWS/aws_ecs_http?at=release/7.0

AWS ECS Cluster by HTTP

Overview

This template monitors an AWS ECS cluster over HTTP via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

NOTE: This template uses GetMetricData CloudWatch API calls to list and retrieve metrics. For more information, please refer to the CloudWatch pricing page.

Additional information about the metrics and used API methods:

Requirements

Zabbix version: 7.0 and higher.

Tested versions

This template has been tested on:

  • AWS ECS Cluster by HTTP

Configuration

Zabbix should be configured according to the instructions in the Templates out of the box section.

Setup

The template gets AWS ECS metrics and uses the script item to make HTTP requests to the CloudWatch API.
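
As a rough illustration of the kind of request body a GetMetricData call carries, the sketch below builds one in Python. The template's actual script item is written for Zabbix and is not shown here, so the helper names (`build_metric_queries`, `build_request`), the 10-minute window, the 300-second period, and the `Average` statistic are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def build_metric_queries(cluster_name, metric_names, period=300, stat="Average"):
    """Build a MetricDataQueries list for the AWS/ECS namespace (hypothetical helper)."""
    queries = []
    for index, metric_name in enumerate(metric_names):
        queries.append({
            "Id": f"m{index}",  # query IDs must start with a lowercase letter
            "Label": metric_name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ECS",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
                },
                "Period": period,  # assumed granularity in seconds
                "Stat": stat,      # assumed statistic
            },
        })
    return queries


def build_request(cluster_name, metric_names, now=None, window_minutes=10):
    """Assemble the request body; the time window is an assumption."""
    now = now or datetime.now(timezone.utc)
    return {
        "StartTime": (now - timedelta(minutes=window_minutes)).isoformat(),
        "EndTime": now.isoformat(),
        "MetricDataQueries": build_metric_queries(cluster_name, metric_names),
    }
```

Batching several metrics into one request is what lets a single master item feed many dependent items.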

Before using the template, you need to create an IAM policy for the Zabbix role in your AWS account with the necessary permissions.

Add the following required permissions to your Zabbix IAM policy in order to collect Amazon ECS metrics.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ecs:ListServices",
                "ecs:ListTasks"
            ],
            "Effect": "Allow",
            "Resource": "*"
        }
    ]
}
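
Before attaching a policy like the one above, it can help to check that every required action is present and spelled with the right service prefix (`ecs:`, `cloudwatch:`). The following is a minimal sketch, not part of the template; `missing_actions` is a hypothetical helper, and it only compares literal action names, so wildcards such as `ecs:*` are not expanded.

```python
import json

# Actions the template needs, with the ECS service prefix spelled "ecs:".
REQUIRED_ACTIONS = {
    "cloudwatch:DescribeAlarms",
    "cloudwatch:GetMetricData",
    "ecs:ListServices",
    "ecs:ListTasks",
}


def allowed_actions(policy):
    """Collect literal action names from all Allow statements."""
    actions = set()
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        action = statement.get("Action", [])
        if isinstance(action, str):  # a single action may be given as a string
            action = [action]
        actions.update(action)
    return actions


def missing_actions(policy_json):
    """Return the required actions that the policy does not allow, sorted."""
    policy = json.loads(policy_json)
    return sorted(REQUIRED_ACTIONS - allowed_actions(policy))
```

An empty result means all four actions are listed; a misspelled prefix shows up as a missing action.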

If you are using role-based authorization, set the appropriate permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "iam:PassRole",
            "Resource": "arn:aws:iam::<<--account-id-->>:role/<<--role_name-->>"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "cloudwatch:DescribeAlarms",
                "cloudwatch:GetMetricData",
                "ecs:ListServices",
                "ecs:ListTasks",
                "ec2:AssociateIamInstanceProfile",
                "ec2:ReplaceIamInstanceProfileAssociation"
            ],
            "Resource": "*"
        }
    ]
}

Set the following macros: "{$AWS.AUTH_TYPE}", "{$AWS.REGION}", and "{$AWS.ECS.CLUSTER.NAME}".

If you are using access key-based authorization, also set the following macros: "{$AWS.ACCESS.KEY.ID}" and "{$AWS.SECRET.ACCESS.KEY}".

For more information about managing access keys, see the official documentation.

Refer to the Macros section for a list of macros used for LLD filters.
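
The macro requirements above can be expressed as a small pre-flight check. This is only a sketch: `unset_macros` is a hypothetical helper that takes the host's macros as a plain dictionary, and it is not something Zabbix provides.

```python
# Macros that must be set regardless of the authorization method.
ALWAYS_REQUIRED = ("{$AWS.AUTH_TYPE}", "{$AWS.REGION}", "{$AWS.ECS.CLUSTER.NAME}")
# Macros that are additionally required for access key-based authorization.
ACCESS_KEY_REQUIRED = ("{$AWS.ACCESS.KEY.ID}", "{$AWS.SECRET.ACCESS.KEY}")


def unset_macros(macros):
    """Return the required macros that are missing or empty, in a stable order."""
    required = list(ALWAYS_REQUIRED)
    if macros.get("{$AWS.AUTH_TYPE}") == "access_key":
        required.extend(ACCESS_KEY_REQUIRED)
    return [name for name in required if not macros.get(name)]
```

With role-based authorization the two key macros are not checked, since the credentials come from the role.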

Additional information about the metrics and used API methods:

Macros used

Name | Description | Default
{$AWS.PROXY} | Sets HTTP proxy value. If this macro is empty then no proxy is used. |
{$AWS.ACCESS.KEY.ID} | Access key ID. |
{$AWS.SECRET.ACCESS.KEY} | Secret access key. |
{$AWS.REGION} | Amazon ECS Region code. | us-west-1
{$AWS.AUTH_TYPE} | Authorization method. Possible values: role_base, access_key. | access_key
{$AWS.ECS.CLUSTER.NAME} | ECS cluster name. |
{$AWS.ECS.LLD.FILTER.ALARM_NAME.MATCHES} | Filter of discoverable alarms by name. | .*
{$AWS.ECS.LLD.FILTER.ALARM_NAME.NOT_MATCHES} | Filter to exclude discovered alarms by name. | CHANGE_IF_NEEDED
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.MATCHES} | Filter of discoverable alarms by namespace. | .*
{$AWS.ECS.LLD.FILTER.ALARM_SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered alarms by namespace. | CHANGE_IF_NEEDED
{$AWS.ECS.LLD.FILTER.SERVICE.MATCHES} | Filter of discoverable services by name. | .*
{$AWS.ECS.LLD.FILTER.SERVICE.NOT_MATCHES} | Filter to exclude discovered services by name. | CHANGE_IF_NEEDED
{$AWS.ECS.CLUSTER.CPU.UTIL.WARN} | The warning threshold of the cluster CPU utilization expressed in %. | 70
{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN} | The warning threshold of the cluster memory utilization expressed in %. | 70
{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN} | The warning threshold of the cluster service CPU utilization expressed in %. | 80
{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN} | The warning threshold of the cluster service memory utilization expressed in %. | 80
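
The MATCHES and NOT_MATCHES filter macros work as a pair: an entity is discovered only if its name matches the first pattern and does not match the second. The sketch below mimics that with Python's `re` module. Zabbix evaluates these patterns with its own PCRE-based engine, so treat this as an approximation; `passes_filter` is a hypothetical name.

```python
import re


def passes_filter(value, matches=".*", not_matches="CHANGE_IF_NEEDED"):
    """Keep a value that matches `matches` and does not match `not_matches`.

    The defaults mirror the template defaults: ".*" accepts everything, and
    "CHANGE_IF_NEEDED" is a placeholder that real names are unlikely to contain.
    """
    included = re.search(matches, value) is not None
    excluded = re.search(not_matches, value) is not None
    return included and not excluded
```

For example, setting the NOT_MATCHES macro to `^test-` would exclude every service or alarm whose name starts with `test-`.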

Items

Name Description Type Key and additional info
Get cluster metrics

Get cluster metrics.

Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html

Script aws.ecs.get_metrics

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value

Get cluster services

Get cluster services.

Full metrics list related to ECS: https://docs.aws.amazon.com/AmazonECS/latest/userguide/metrics-dimensions.html

Script aws.ecs.get_cluster_services

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value

Get alarms data

Get alarms data.

DescribeAlarms API method: https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_DescribeAlarms.html

Script aws.ecs.get_alarms

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value

Get metrics check

Data collection check.

Dependent item aws.ecs.metrics.check

Preprocessing

  • JSON Path: $.error

    ⛔️Custom on fail: Set value to

  • Discard unchanged with heartbeat: 3h

Get alarms check

Data collection check.

Dependent item aws.ecs.alarms.check

Preprocessing

  • JSON Path: $.error

    ⛔️Custom on fail: Set value to

  • Discard unchanged with heartbeat: 3h

Container Instance Count

The number of EC2 instances running the Amazon ECS agent that are registered with a cluster.

Dependent item aws.ecs.container_instance_count

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

Task Count

The number of tasks running in the cluster.

Dependent item aws.ecs.task_count

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

Service Count

The number of services in the cluster.

Dependent item aws.ecs.service_count

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

CPU Reserved

The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined CPU reservation in their task definition.

Dependent item aws.ecs.cpu_reserved

Preprocessing

  • JSON Path: $.[?(@.Label == "CpuReserved")].Values.first().first()

    ⛔️Custom on fail: Discard value

CPU Utilization

Cluster CPU utilization

Dependent item aws.ecs.cpu_utilization

Preprocessing

  • JSON Path: $.CPUUtilization

    ⛔️Custom on fail: Discard value

Memory Utilization

The memory being used by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined memory reservation in their task definition.

Dependent item aws.ecs.memory_utilization

Preprocessing

  • JSON Path: $.MemoryUtilization

    ⛔️Custom on fail: Discard value

Network rx bytes

The number of bytes received by the resource that is specified by the dimensions that you're using.

This metric is only available for containers in tasks using the awsvpc or bridge network modes.

Dependent item aws.ecs.network.rx

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

Network tx bytes

The number of bytes transmitted by the resource that is specified by the dimensions that you're using.

This metric is only available for containers in tasks using the awsvpc or bridge network modes.

Dependent item aws.ecs.network.tx

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value
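
The "CPU Reserved" item above uses the JSONPath `$.[?(@.Label == "CpuReserved")].Values.first().first()`. As a plain-Python illustration of what that expression selects, the sketch below assumes the master item returns a list of objects with `Label` and `Values` fields; the sample data and the function name `first_value_for_label` are made up for the example.

```python
def first_value_for_label(results, label):
    """Return the first value of the first entry whose Label equals `label`.

    Mirrors the three steps of the JSONPath: filter by Label, take the first
    matching Values array, then take its first element. None stands in for
    the "Discard value" outcome when nothing matches.
    """
    matching = [
        entry["Values"]
        for entry in results
        if entry.get("Label") == label and "Values" in entry
    ]
    if not matching or not matching[0]:
        return None
    return matching[0][0]


# Hypothetical payload shaped like a list of metric results.
sample = [
    {"Label": "CpuReserved", "Values": [512.0, 256.0]},
    {"Label": "MemoryReserved", "Values": [1024.0]},
]
```

Here `first_value_for_label(sample, "CpuReserved")` picks `512.0`, the first element of the matching `Values` array.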

Triggers

Name | Description | Expression | Severity | Dependencies and additional info
Failed to get metrics data | | length(last(/AWS ECS Cluster by HTTP/aws.ecs.metrics.check))>0 | Warning |
Failed to get alarms data | | length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarms.check))>0 | Warning |
High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/AWS ECS Cluster by HTTP/aws.ecs.cpu_utilization,15m)>{$AWS.ECS.CLUSTER.CPU.UTIL.WARN} | Warning |
High memory utilization | The system is running out of free memory. | min(/AWS ECS Cluster by HTTP/aws.ecs.memory_utilization,15m)>{$AWS.ECS.CLUSTER.MEMORY.UTIL.WARN} | Warning |
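
The utilization triggers use `min(...,15m) > threshold`, which means the problem is raised only when every value collected in the last 15 minutes is above the threshold, so a single spike does not fire it. A minimal sketch of that logic, with timestamps in seconds and a hypothetical function name:

```python
def min_exceeds(samples, now, window_seconds, threshold):
    """Return True when the lowest value inside the window is above the threshold.

    `samples` is a list of (timestamp, value) pairs. Returning False for an
    empty window is a simplification made for this sketch.
    """
    recent = [value for ts, value in samples if now - window_seconds < ts <= now]
    if not recent:
        return False
    return min(recent) > threshold
```

With a threshold of 70, values of 90, 75, and 72 inside the window fire the trigger, while 90, 65, and 72 do not.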

LLD rule Cluster Alarms discovery

Name Description Type Key and additional info
Cluster Alarms discovery

Discovery of cluster alarms.

Dependent item aws.ecs.alarms.discovery

Preprocessing

  • JavaScript: The text is too long. Please see the template.

  • Discard unchanged with heartbeat: 3h
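
Several items and discovery rules use "Discard unchanged with heartbeat: 3h". The idea is that a value identical to the last stored one is dropped unless the heartbeat period has passed since the last stored value. The class below is a rough sketch of that behavior and not Zabbix code; timestamps are in seconds.

```python
class DiscardUnchangedWithHeartbeat:
    """Keep a value if it changed or if the heartbeat period has elapsed."""

    def __init__(self, heartbeat_seconds):
        self.heartbeat_seconds = heartbeat_seconds
        self._last_value = None
        self._last_kept_at = None

    def keep(self, value, timestamp):
        """Return True when the value should be stored, False when discarded."""
        seen_before = self._last_kept_at is not None
        unchanged = seen_before and value == self._last_value
        within_heartbeat = (
            seen_before and timestamp - self._last_kept_at < self.heartbeat_seconds
        )
        if unchanged and within_heartbeat:
            return False
        self._last_value = value
        self._last_kept_at = timestamp
        return True
```

For discovery rules this keeps the low-level discovery from being reprocessed on every poll when the list of alarms or services has not changed.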

Item prototypes for Cluster Alarms discovery

Name Description Type Key and additional info
[{#ALARM_NAME}]: Get metrics

Get alarm metrics about the state and its reason.

Dependent item aws.ecs.alarm.get_metrics["{#ALARM_NAME}"]

Preprocessing

  • JSON Path: $.[?(@.AlarmName == "{#ALARM_NAME}")].first()

    ⛔️Custom on fail: Discard value

[{#ALARM_NAME}]: State reason

An explanation for the alarm state, in text format.

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item aws.ecs.alarm.state_reason["{#ALARM_NAME}"]

Preprocessing

  • JSON Path: $.StateReason

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 3h

[{#ALARM_NAME}]: State

The state value for the alarm. Possible values: 0 (OK), 1 (INSUFFICIENT_DATA), 2 (ALARM).

Alarm description:

{#ALARM_DESCRIPTION}

Dependent item aws.ecs.alarm.state["{#ALARM_NAME}"]

Preprocessing

  • JSON Path: $.StateValue

    ⛔️Custom on fail: Set value to: 3

  • JavaScript: The text is too long. Please see the template.

Trigger prototypes for Cluster Alarms discovery

Name | Description | Expression | Severity | Dependencies and additional info
[{#ALARM_NAME}] has 'Alarm' state | Alarm "{#ALARM_NAME}" has 'Alarm' state. Reason: {ITEM.LASTVALUE2} | last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=2 and length(last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state_reason["{#ALARM_NAME}"]))>0 | Average |
[{#ALARM_NAME}] has 'Insufficient data' state | | last(/AWS ECS Cluster by HTTP/aws.ecs.alarm.state["{#ALARM_NAME}"])=1 | Info |
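
The alarm state item stores a number (0 for OK, 1 for INSUFFICIENT_DATA, 2 for ALARM), and the 'Alarm' trigger also requires a non-empty state reason. The sketch below shows one way this mapping and check could look. The template's own JavaScript step is not reproduced on this page, so treating 3 as a catch-all for unknown states is an assumption based on the "Set value to: 3" fallback.

```python
# Numeric codes documented for the State item.
STATE_CODES = {"OK": 0, "INSUFFICIENT_DATA": 1, "ALARM": 2}
UNKNOWN_STATE = 3  # assumption: fallback code for anything unrecognized


def state_code(state_value):
    """Map a CloudWatch StateValue string to the numeric code."""
    return STATE_CODES.get(state_value, UNKNOWN_STATE)


def alarm_trigger_fires(state_value, state_reason):
    """Mirror the 'Alarm' trigger: state equals 2 and the reason is not empty."""
    return state_code(state_value) == 2 and len(state_reason) > 0
```

Requiring a non-empty reason lets the problem description include `{ITEM.LASTVALUE2}` without showing a blank explanation.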

LLD rule Cluster Services discovery

Name Description Type Key and additional info
Cluster Services discovery

Discovery of {$AWS.ECS.CLUSTER.NAME} services.

Dependent item aws.ecs.services.discovery

Preprocessing

  • JSON Path: $.services

  • Discard unchanged with heartbeat: 3h

Item prototypes for Cluster Services discovery

Name Description Type Key and additional info
[{#AWS.ECS.SERVICE.NAME}]: Running Task

The number of tasks currently in the running state.

Dependent item aws.ecs.services.running.task["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 3h

[{#AWS.ECS.SERVICE.NAME}]: Pending Task

The number of tasks currently in the pending state.

Dependent item aws.ecs.services.pending.task["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 3h

[{#AWS.ECS.SERVICE.NAME}]: Desired Task

The desired number of tasks for the {#AWS.ECS.SERVICE.NAME} service.

Dependent item aws.ecs.services.desired.task["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 3h

[{#AWS.ECS.SERVICE.NAME}]: Task Set

The number of task sets in the {#AWS.ECS.SERVICE.NAME} service.

Dependent item aws.ecs.services.task.set["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

  • Discard unchanged with heartbeat: 3h

[{#AWS.ECS.SERVICE.NAME}]: CPU Reserved

The number of CPU units reserved by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined CPU reservation in their task definition.

Dependent item aws.ecs.services.cpu_reserved["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

[{#AWS.ECS.SERVICE.NAME}]: CPU Utilization

The number of CPU units used by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined CPU reservation in their task definition.

Dependent item aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

[{#AWS.ECS.SERVICE.NAME}]: Memory utilized

The memory being used by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined memory reservation in their task definition.

Dependent item aws.ecs.services.memory_utilized["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

  • Custom multiplier: 1048576

[{#AWS.ECS.SERVICE.NAME}]: Memory utilization

The memory being used by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined memory reservation in their task definition.

Dependent item aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

[{#AWS.ECS.SERVICE.NAME}]: Memory reserved

The memory that is reserved by tasks in the resource that is specified by the dimension set that you're using.

This metric is only collected for tasks that have a defined memory reservation in their task definition.

Dependent item aws.ecs.services.memory_reserved["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

  • Custom multiplier: 1048576

[{#AWS.ECS.SERVICE.NAME}]: Network rx bytes

The number of bytes received by the resource that is specified by the dimensions that you're using.

This metric is only available for containers in tasks using the awsvpc or bridge network modes.

Dependent item aws.ecs.services.network.rx["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

[{#AWS.ECS.SERVICE.NAME}]: Network tx bytes

The number of bytes transmitted by the resource that is specified by the dimensions that you're using.

This metric is only available for containers in tasks using the awsvpc or bridge network modes.

Dependent item aws.ecs.services.network.tx["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • JSON Path: The text is too long. Please see the template.

    ⛔️Custom on fail: Discard value

[{#AWS.ECS.SERVICE.NAME}]: Get metrics

Get metrics of ECS services.

Full metrics list related to ECS: https://docs.aws.amazon.com/ecs/index.html

Script aws.ecs.services.get_metrics["{#AWS.ECS.SERVICE.NAME}"]

Preprocessing

  • Check for not supported value: any error

    ⛔️Custom on fail: Discard value
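
The "Memory utilized" and "Memory reserved" item prototypes apply a custom multiplier of 1048576, which is 1024 × 1024. Assuming CloudWatch reports these metrics in mebibytes (an assumption worth checking against the AWS metric documentation), the multiplier converts them to bytes so that Zabbix can scale the unit for display. A one-line sketch with a hypothetical function name:

```python
MIB = 1048576  # 1024 * 1024, the custom multiplier used by the item prototypes


def mib_to_bytes(value_mib):
    """Convert a value reported in MiB to bytes, as the custom multiplier does."""
    return value_mib * MIB
```

A reading of 512 would therefore be stored as 536870912 bytes.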

Trigger prototypes for Cluster Services discovery

Name | Description | Expression | Severity | Dependencies and additional info
[{#AWS.ECS.SERVICE.NAME}]: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. | min(/AWS ECS Cluster by HTTP/aws.ecs.services.cpu.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.CPU.UTIL.WARN} | Warning |
[{#AWS.ECS.SERVICE.NAME}]: High memory utilization | The system is running out of free memory. | min(/AWS ECS Cluster by HTTP/aws.ecs.services.memory.utilization["{#AWS.ECS.SERVICE.NAME}"],15m)>{$AWS.ECS.CLUSTER.SERVICE.MEMORY.UTIL.WARN} | Warning |

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help on the Zabbix forums.
