Amazon OpenSearch Service Monitoring Integration
Amazon OpenSearch Service (previously Amazon Elasticsearch Service) makes it easy to deploy and operate OpenSearch for log analytics, data search, and more. By monitoring Amazon OpenSearch Service with Site24x7, you can oversee operational aspects such as performance optimization.
Table of contents
- Setup and configuration
- Policies and permissions
- Polling frequency
- Threshold configuration
- Supported metrics
- EBS volume metrics
- Dedicated master node metrics
- Instance metrics
- Ultra warm metrics
- Forecast
- OpenSearch monitoring interface
Setup and configuration
- If you haven't done it already, enable access to your AWS resources by creating Site24x7 as an IAM user or by creating a cross-account IAM role between your account and Site24x7's AWS account. Learn more.
- Next, in the Integrate AWS Account page, make sure the OpenSearch checkbox is selected in the Services to be discovered field. Learn more.
Policies and permissions
Make sure the following read-level actions are present in the IAM policy assigned to the Site24x7 entity. Learn more.
- "es:DescribeElasticsearchDomain",
- "es:ListDomainNames",
- "es:ListTags",
- "logs:DescribeLogStreams",
- "logs:GetLogEvents",
- "es:DescribePackages"
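As an illustration, the actions above can be assembled into an IAM policy document. This is a minimal sketch, not the policy Site24x7 generates: the `build_policy` helper is hypothetical, and the `"Resource": "*"` scope is an assumption you may want to narrow for your account.

```python
import json

# Read-level actions listed on this page, copied verbatim.
SITE24X7_OPENSEARCH_ACTIONS = [
    "es:DescribeElasticsearchDomain",
    "es:ListDomainNames",
    "es:ListTags",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
    "es:DescribePackages",
]


def build_policy(actions, resource="*"):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                # Assumption: not scoped to specific ARNs.
                "Resource": resource,
            }
        ],
    }


policy = build_policy(SITE24X7_OPENSEARCH_ACTIONS)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM policy editor and attached to the IAM user or role used by Site24x7.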
Polling frequency
Site24x7 queries the AWS service-level APIs and CloudWatch APIs at the configured poll frequency (1 minute to a day) to collect performance metrics. Learn more.
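To make the relationship between poll frequency and a CloudWatch query concrete, here is a hedged sketch of the request parameters for one poll interval. How Site24x7 queries CloudWatch internally is not documented here; the `build_metric_query` helper, the domain name, and the account ID are hypothetical. The `AWS/ES` namespace and the `DomainName` and `ClientId` dimensions are the ones AWS publishes OpenSearch Service metrics under.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(domain_name, account_id, metric_name, statistic,
                       poll_minutes, now=None):
    """Build keyword arguments for CloudWatch GetMetricStatistics.

    The query covers a single poll interval that ends at `now`.
    """
    if not 1 <= poll_minutes <= 24 * 60:
        raise ValueError("poll frequency must be between 1 minute and 1 day")
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ES",
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "StartTime": end - timedelta(minutes=poll_minutes),
        "EndTime": end,
        "Period": poll_minutes * 60,  # seconds, one datapoint per poll
        "Statistics": [statistic],
    }


# Placeholder domain and account ID, with a fixed end time for reproducibility.
params = build_metric_query(
    "my-domain", "123456789012", "CPUUtilization", "Average",
    poll_minutes=5, now=datetime(2024, 1, 1, 0, 5, tzinfo=timezone.utc),
)
print(params["Period"], params["StartTime"].isoformat())
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("cloudwatch").get_metric_statistics(**params)`.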
Threshold configuration
Go to Admin > Configuration Profiles > Threshold and Availability (+) > choose the monitor type. You can set threshold values for all the applicable metrics. Further, you can choose to mute inactive alerts in the threshold form for OpenSearch nodes.
Supported metrics
Attribute | Description | Unit | Statistic |
Cluster Status | Green: all index shards are allocated to nodes in the cluster. Yellow: the primary shards for all indices are allocated to nodes in the cluster, but the replica shards for at least one index are not. Red: the primary and replica shards of at least one index are not allocated to nodes in the cluster. | State | Minimum |
CPU Utilization | The percentage of CPU resources used for data nodes in the cluster. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Percentage | Average |
Storage | The free space and used space in GB, for nodes in the cluster. | GB | Sum, Maximum |
Nodes | The number of nodes in the Amazon OpenSearch cluster, including dedicated master nodes. | Count | Minimum |
Documents | Searchable documents: the total number of searchable documents across all indices in the cluster. Deleted documents: the total number of documents marked for deletion across all indices in the cluster; these do not appear in search results. | Count | Maximum |
Cluster Index Writes Blocked | Indicates whether the cluster is accepting or blocking incoming write requests. 0: the cluster is accepting requests. 1: the cluster is blocking requests. | State | Maximum |
JVM Memory Pressure | The percentage of the Java heap used for all data nodes in the cluster. | Percentage | Maximum |
Automated snapshot failure | The number of failed automated snapshots for the cluster. | Count | Maximum |
CPU Credit Balance | The remaining CPU credits available for data nodes in the cluster. | Count | Minimum |
OpenSearchDashboardsHealthyNodes (previously KibanaHealthyNodes) | A health check for OpenSearch Dashboards (previously Kibana). 1: normal behavior. 0: Dashboards is inaccessible. | State | Minimum |
KMS Key Error | The KMS customer master key used to encrypt data at rest has been disabled. | State | Maximum |
KMS Key Inaccessible | The KMS customer master key used to encrypt data at rest has been deleted, or its grants to Amazon OpenSearch Service have been revoked. | State | Maximum |
Invalid Host Header Requests | The number of HTTP requests made to the OpenSearch cluster that included an invalid (or missing) host header. | Count | Sum |
Elasticsearch Requests | The number of requests made to the OpenSearch cluster. | Count | Sum |
Request Count | The number of requests to a domain and the HTTP response code (2xx, 3xx, 4xx, 5xx) for each request. | Count | Sum |
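The Cluster Status colors in the table above follow a simple precedence, sketched below. This is an illustration of the definitions on this page, not code from Site24x7 or OpenSearch: the `cluster_status` helper and its input shape are hypothetical, and an index with unallocated primary shards is assumed to be Red regardless of its replicas.

```python
def cluster_status(indices):
    """Classify cluster health from per-index shard allocation.

    `indices` is an iterable of (primaries_allocated, replicas_allocated)
    boolean pairs, one pair per index.
    """
    indices = list(indices)
    # Red takes precedence: at least one index has unallocated primary shards.
    if any(not primaries for primaries, _ in indices):
        return "Red"
    # Yellow: all primaries are allocated, but some replicas are not.
    if any(not replicas for _, replicas in indices):
        return "Yellow"
    # Green: every shard of every index is allocated.
    return "Green"


print(cluster_status([(True, True), (True, False)]))
```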
EBS volume metrics
Attribute | Description | Unit | Statistic |
Read Latency | The latency, in seconds, for read operations on EBS volumes. | Seconds | Average |
Write Latency | The latency, in seconds, for write operations on EBS volumes. | Seconds | Average |
Read Throughput | The throughput, in bytes per second, for read operations on EBS volumes. | MB/sec | Average |
Write Throughput | The throughput, in bytes per second, for write operations on EBS volumes. | MB/sec | Average |
Disk Queue Depth | The number of pending input and output (I/O) requests for an EBS volume. | Count | Maximum |
Read IOPS | The number of input and output (I/O) operations per second for read operations on EBS volumes. | Count/sec | Average |
Write IOPS | The number of input and output (I/O) operations per second for write operations on EBS volumes. | Count/sec | Average |
Dedicated master node metrics
Attribute | Description | Unit | Statistic |
Master CPU Utilization | The maximum percentage of CPU resources used by the dedicated master nodes. | Percentage | Average |
Master Free Storage Space | The free storage space for the master node. Applicable as an OpenSearch node metric. | MB | Average |
Master JVM Memory Pressure | The maximum percentage of the Java heap used for all dedicated master nodes in the cluster. | Percentage | Maximum |
Master CPU Credit Balance | The CPU credits available for dedicated master nodes in the cluster. | Count | Minimum |
Master Reachable From Node | A health check for MasterNotDiscovered exceptions. A value of 1 indicates normal behavior. A value of 0 indicates that cluster health is failing. | Count | Sum |
Master Sys Memory Utilization | The percentage of the master node's memory that is in use. | Percentage | Maximum |
Instance metrics
Attribute | Description | Unit | Statistic |
Indexing Latency | The average time, in milliseconds, that it takes a shard to complete an indexing operation. Applicable as an OpenSearch node metric. | Milliseconds | Average |
Indexing Rate | The number of indexing operations per minute. A single call to the _bulk API that adds two documents and updates two others counts as four operations, which might be spread across one or more nodes. If that index has one or more replicas, other nodes in the cluster also record a total of four indexing operations. Document deletions do not count towards this metric. Applicable as an OpenSearch node metric. | Ops/min | Average |
Search Latency | The average time, in milliseconds, that it takes a shard on a data node to complete a search operation. Applicable as an OpenSearch node metric. | Milliseconds | Average |
Search Rate | The total number of search requests per minute for all shards on a data node. A single call to the _search API might return results from many different shards. If five of these shards are on one node, the node would report 5 for this metric, even though the client only made one request. Applicable as an OpenSearch node metric. | Ops/min | Average |
Sys Memory Utilization | The percentage of the instance's memory that is in use. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Percentage | Maximum |
JVMGC Young Collection Count | The number of times that "young generation" garbage collection has run. A large, ever-growing number of runs is a normal part of cluster operations. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
JVMGC Young Collection Time | The amount of time, in milliseconds, that the cluster has spent performing "young generation" garbage collection. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Milliseconds | Average |
JVMGC Old Collection Count | The number of times that "old generation" garbage collection has run. In a cluster with sufficient resources, this number should remain small and grow infrequently. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
JVMGC Old Collection Time | The amount of time, in milliseconds, that the cluster has spent performing "old generation" garbage collection. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Milliseconds | Average |
Threadpool Force_merge Queue | The number of queued tasks in the force merge thread pool. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Force_merge Rejected | The number of rejected tasks in the force merge thread pool. If this number continually grows, consider scaling your cluster. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Force_merge Threads | The size of the force merge thread pool. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Average |
Threadpool Index Queue | The number of queued tasks in the index thread pool. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Index Rejected | The number of rejected tasks in the index thread pool. If this number continually grows, consider scaling your cluster. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Index Threads | The size of the index thread pool. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Search Queue | The number of queued tasks in the search thread pool. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Search Rejected | The number of rejected tasks in the search thread pool. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Search Threads | The size of the search thread pool. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Average |
Threadpool Bulk Queue | The number of queued tasks in the bulk thread pool. If the queue size is consistently high, consider scaling your cluster. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Bulk Rejected | The number of rejected tasks in the bulk thread pool. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Bulk Threads | The size of the bulk thread pool. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Average |
Threadpool Write Threads | The size of the write thread pool. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Average |
Threadpool Write Rejected | The number of rejected tasks in the write thread pool. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
Threadpool Write Queue | The number of queued tasks in the write thread pool. Applicable as an OpenSearch node metric, with Maximum as the relevant statistic. | Count | Sum |
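The counting rules described for Indexing Rate and Search Rate can be worked through with a small sketch. Both helpers and the node names are hypothetical and exist only to reproduce the arithmetic in the descriptions above: a _bulk call is counted per added or updated document (deletions excluded), and a _search call is counted once per shard on each node it touches.

```python
from collections import Counter


def bulk_indexing_ops(added, updated, deleted=0):
    """Indexing operations counted for one _bulk call.

    Deletions are accepted as input but do not count towards the metric.
    """
    return added + updated


def search_ops_by_node(shard_nodes):
    """Search operations each node records for one _search call.

    `shard_nodes` holds one node name per shard the request touched.
    """
    return dict(Counter(shard_nodes))


# Two adds plus two updates count as four indexing operations.
print(bulk_indexing_ops(added=2, updated=2))
# One client request that hits five shards on node-1 and two on node-2.
print(search_ops_by_node(["node-1"] * 5 + ["node-2"] * 2))
```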
Ultra warm metrics
Attribute | Description | Unit | Statistic |
Warm CPU Utilization | The percentage of CPU usage for UltraWarm nodes in the cluster. | Percentage | Average |
Warm Free Storage Space | The amount of free warm storage space in MB. | MB | Average |
Warm JVM Memory Pressure | The maximum percentage of the Java heap used for the UltraWarm nodes. | Percentage | Maximum |
Warm Searchable Documents | The total number of searchable documents across all warm indices in the cluster. | Count | Sum |
Warm Search Latency | The average time, in milliseconds, that it takes a shard on an UltraWarm node to complete a search operation. | Milliseconds | Average |
Warm Search Rate | The total number of search requests per minute for all shards on an UltraWarm node. A single call to the _search API might return results from many different shards. | Ops/min | Average |
Warm Storage Space Utilization | The total amount of warm storage space that the cluster is using. | MB | Maximum |
Hot Storage Space Utilization | The total amount of hot storage space that the cluster is using. | MB | Maximum |
Warm Sys Memory Utilization | The percentage of the warm node's memory that is in use. | Percentage | Maximum |
Hot To Warm Migration Queue Size | The number of indices currently waiting to migrate from hot to warm storage. | Count | Maximum |
Warm To Hot Migration Queue Size | The number of indices currently waiting to migrate from warm to hot storage. | Count | Maximum |
Hot To Warm Migration Failure Count | The total number of failed hot to warm migrations. | Count | Sum |
Hot To Warm Migration Success Count | The total number of successful hot to warm migrations. | Count | Sum |
Forecast
Estimate future values of the following OpenSearch Domain performance metrics and make informed decisions about adding capacity or scaling your AWS infrastructure.
- Deleted Documents
- CPU Utilization
- Free Storage Usage
- Cluster Used Space
- CPU Credit Balance
- Elasticsearch Requests
- OpenSearch Requests
- Disk Queue Depth
- Read IOPS
- JVMGC Old Collection Time
- JVMGC Old Collection Count
- Sys Memory Utilization
Similarly, you can also view the forecast for the following metrics of OpenSearch Domain Node:
- CPU Utilization
- Free Storage Space
- Cluster Used Space
- Search Rate
- Sys Memory Utilization
- JVMGC Old Collection Time
- JVMGC Old Collection Count
OpenSearch monitoring interface
Summary
View the performance metrics of the OpenSearch service displayed as time series charts.
Volume details
Detailed graphs of EBS volume metrics such as read/write IOPS, read/write latency, and read/write throughput.