Amazon DocumentDB Monitoring Integration
Amazon DocumentDB is a document database service compatible with MongoDB workloads for managing JSON data at scale. With Site24x7's integration, you can monitor the health and performance of your Amazon DocumentDB clusters and instances.
- Setup
- Permissions
- Poll frequency
- Licensing
- Supported metrics
- Threshold configuration
- Site24x7's DocumentDB monitoring interface
Setup
- Give Site24x7 access to your AWS account by creating either an IAM User or an IAM Role.
- On the Integrate AWS Account page, make sure the DocumentDB checkbox is selected in the Services to be Discovered field.
Permissions
Make sure the following read-level actions are present in the IAM policy assigned to the IAM User or IAM Role created for Site24x7.
- "rds:DescribeDBClusters",
- "rds:DescribeDBInstances",
- "rds:ListTagsForResource",
- "rds:DescribeCertificates",
- "rds:DescribeEvents",
- "rds:DescribeGlobalClusters",
- "logs:DescribeLogStreams",
- "logs:GetLogEvents",
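The actions listed above can be assembled into an IAM policy document. The following is a minimal sketch, not an official Site24x7 policy: the statement ID and the wildcard `Resource` are assumptions made for illustration, and you may want to scope the resource more tightly in your own account.

```python
import json

# Read-level actions listed in the Permissions section.
SITE24X7_DOCDB_ACTIONS = [
    "rds:DescribeDBClusters",
    "rds:DescribeDBInstances",
    "rds:ListTagsForResource",
    "rds:DescribeCertificates",
    "rds:DescribeEvents",
    "rds:DescribeGlobalClusters",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
]


def build_policy(actions):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Site24x7DocumentDBReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": sorted(set(actions)),  # drop accidental duplicates
                "Resource": "*",  # assumption: not restricted to specific ARNs
            }
        ],
    }


policy_json = json.dumps(build_policy(SITE24X7_DOCDB_ACTIONS), indent=2)
print(policy_json)
```

The printed JSON can be pasted into the policy editor for the IAM User or IAM Role created for Site24x7.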
Poll Frequency
Aggregated DocumentDB metric data is collected at the configured poll frequency, which can range from 1 minute to 1 day.
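To illustrate what aggregation means here, the sketch below reduces a window of one-minute samples to a single value per statistic. It is an assumption-laden illustration of the Average, Sum, Maximum, and Minimum statistics, not a description of Site24x7's internal implementation; the `aggregate` function and the sample values are hypothetical.

```python
from statistics import mean

# Map each statistic name to the function that reduces a window of samples.
REDUCERS = {"Average": mean, "Sum": sum, "Maximum": max, "Minimum": min}


def aggregate(samples, statistic):
    """Reduce the samples collected in one poll window to a single value."""
    if not samples:
        raise ValueError("no samples in the poll window")
    return REDUCERS[statistic](samples)


# Hypothetical one-minute connection counts within a 5-minute poll window.
connections = [12, 15, 11, 18, 14]
summary = {name: aggregate(connections, name) for name in ("Average", "Sum", "Maximum")}
print(summary)
```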
Licensing
- Each DocumentDB monitor is considered a basic monitor.
Supported Metrics
DocumentDB Cluster and Instance Metrics
Attribute | Description | Statistics | Unit |
---|---|---|---|
Backup Retention Period Storage Used | The total amount of backup storage, in GiB, used to support the point-in-time restore feature within the Amazon DocumentDB retention window. | Maximum | GB, Bytes |
Change Stream Log Size | The amount of storage used by your cluster to store the change stream log in megabytes. | Average | MB |
CPU Utilization | The percentage of CPU used by a cluster. | Maximum | Percent |
Database Connections | The number of connections open on a cluster, sampled at a one-minute frequency. | Average, Sum, Maximum | Count |
Database Connections Max | The maximum number of open database connections on a cluster in a one-minute period. | Average, Sum, Maximum | Count |
Database Cursors | The number of cursors open on a cluster, sampled at a one-minute frequency. | Average, Sum, Maximum | Count |
Database Cursors Max | The maximum number of open cursors on a cluster in a one-minute period. | Average, Sum, Maximum | Count |
Database Cursors Timed Out | The number of cursors that timed out in a one-minute period. | Sum | Count |
Freeable Memory | The amount of available random access memory. | Average | Bytes |
Free Local Storage | This metric reports the amount of storage available to each instance for temporary tables and logs. | Average | MB |
Low Memory Throttle Queue Depth | The queue depth for requests that are throttled due to low available memory | Sum | Count |
Low Memory Throttle Max Queue Depth | The maximum queue depth for requests that are throttled due to low available memory | Sum | Count |
Low Memory Number Operations Throttled | The number of requests that are throttled due to low available memory | Sum | Count |
Snapshot Storage Used | The total amount of backup storage in GiB consumed by all snapshots for a given Amazon DocumentDB cluster outside its backup retention window | Average | GB, Bytes |
Total Backup Storage Billed | The total amount of backup storage in GiB for which you are billed for a given Amazon DocumentDB cluster | Maximum | GB, Bytes |
Transactions Open | The number of transactions open on an instance | Average, Sum, Maximum | Count |
Transactions Open Max | The maximum number of transactions open on an instance | Average, Sum, Maximum | Count |
Volume Bytes Used | The amount of storage used by your cluster in bytes | Average | MB |
DB Cluster Replica Lag Maximum | The maximum amount of lag, in milliseconds, between the primary instance and each Amazon DocumentDB instance in the cluster | Maximum | ms |
DB Cluster Replica Lag Minimum | The minimum amount of lag, in milliseconds, between the primary instance and each replica instance in the cluster. | Minimum | ms |
DB Instance Replica Lag | The amount of lag, in milliseconds, when replicating updates from the primary instance to a replica instance. | Average | ms |
Read Latency | The average amount of time, in milliseconds, taken per disk read I/O operation. | Average | ms |
Write Latency | The average amount of time, in milliseconds, taken per disk write I/O operation. | Average | ms |
Low Memory Number Operations Timed Out | Number of operations timed out due to low available memory | Sum | Count |
Documents Deleted | The number of deleted documents | Sum | Count |
Documents Inserted | The number of inserted documents | Sum | Count |
Documents Returned | The number of returned documents | Sum | Count |
Documents Updated | The number of updated documents | Sum | Count |
Opcounters Command | The number of commands | Sum | Count |
Opcounters Delete | The number of delete operations | Sum | Count |
Opcounters Getmore | The number of getmores | Sum | Count |
Opcounters Insert | The number of insert operations | Sum | Count |
Opcounters Query | The number of queries issued | Sum | Count |
Opcounters Update | The number of update operations issued | Sum | Count |
Transactions Started | The number of transactions started | Sum | Count |
Transactions Committed | The number of transactions committed | Sum | Count |
Transactions Aborted | The number of transactions aborted | Sum | Count |
TTL Deleted Documents | The number of documents deleted by the TTL monitor | Sum | Count |
Network Receive Throughput | The amount of network throughput, in bytes per second, received from clients by each instance in the cluster | Average | mb/sec |
Network Throughput | The amount of network throughput, in bytes per second, both received from and transmitted to clients by each instance in the Amazon DocumentDB cluster. | Average | mb/sec |
Network Transmit Throughput | The amount of network throughput, in bytes per second, sent to clients by each instance in the cluster. | Average | mb/sec |
Read IOPS | The average number of disk read I/O operations per second. | Average | Count |
Write IOPS | The average number of disk write I/O operations per second. | Average | Count |
Read Throughput | The average number of bytes read from disk per second. | Average | Bytes/sec |
Write Throughput | The average number of bytes written to disk per second. | Average | Bytes/sec |
Volume Read IOPs | The average number of billed read I/O operations from a cluster volume | Average | Count |
Volume Write IOPs | The average number of billed write I/O operations from a cluster volume | Average | Count |
Buffer Cache Hit Ratio | The percentage of requests that are served by the buffer cache. | Average | Percent |
Disk Queue Depth | The number of concurrent write requests to the distributed storage volume. | Sum | Count |
Engine Uptime | The amount of time, in seconds, that the instance has been running. | Average | Seconds |
Index Buffer Cache Hit Ratio | The percentage of index requests that are served by the buffer cache. | Average | Percent |
CPU Credit Usage | The number of CPU credits spent during the measurement period. | Average | Count |
CPU Credit Balance | The number of CPU credits that an instance has accrued. | Average | Count |
CPU Surplus Credit Balance | The number of surplus CPU credits spent to sustain CPU performance when the CPUCreditBalance value is zero. | Average | Count |
CPU Surplus Credits Charged | The number of surplus CPU credits exceeding the maximum number of CPU credits that can be earned in a 24-hour period, and thus attracting an additional charge. | Average | Count |
Swap Usage | The amount of swap space used on the instance. | Average | Bytes |
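The metrics in this table correspond to Amazon CloudWatch metrics in the `AWS/DocDB` namespace. If you want to cross-check a value outside Site24x7, the sketch below builds the parameters for a CloudWatch `GetMetricStatistics` request. The `build_metric_request` helper, the cluster name, and the time window are hypothetical, and the sketch assumes you query by the `DBClusterIdentifier` dimension.

```python
from datetime import datetime, timedelta, timezone


def build_metric_request(metric_name, statistic, cluster_id, end_time,
                         minutes=5, period_seconds=60):
    """Build keyword arguments for a CloudWatch GetMetricStatistics call."""
    if period_seconds <= 0 or period_seconds % 60 != 0:
        raise ValueError("period_seconds must be a positive multiple of 60")
    return {
        "Namespace": "AWS/DocDB",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "DBClusterIdentifier", "Value": cluster_id}],
        "StartTime": end_time - timedelta(minutes=minutes),
        "EndTime": end_time,
        "Period": period_seconds,
        "Statistics": [statistic],
    }


# Hypothetical cluster name and fixed end time, for illustration only.
request = build_metric_request(
    "CPUUtilization",
    "Maximum",
    "my-docdb-cluster",
    datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(request)

# With boto3 installed and AWS credentials configured, the request could be sent as:
#   import boto3
#   datapoints = boto3.client("cloudwatch").get_metric_statistics(**request)["Datapoints"]
```

Keeping the request builder separate from the network call makes it easy to verify the parameters before anything is sent to AWS.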
DocumentDB Global Cluster Metrics
Attribute | Description | Statistics | Unit |
---|---|---|---|
Global Cluster Replicated Write IO | The average number of billed write I/O operations replicated from the cluster volume in the primary AWS Region to the cluster volume in a secondary AWS Region | Average | Count |
Global Cluster Data Transfer Bytes | The amount of data transferred from the primary cluster’s AWS Region to a secondary cluster’s AWS Region | Average | MB |
Global Cluster Replication Lag | The amount of lag, in milliseconds, when replicating change events from the primary cluster’s AWS Region to a secondary cluster’s AWS Region | Average | ms |
To View Data
- Sign in to the Site24x7 console. Click on AWS. Choose the monitored AWS account.
- Choose DocumentDB from the menu dropdown.
- From the list of monitored resources, choose the DocumentDB resource whose metrics you want to view.
Threshold Configuration
Set thresholds for the various performance metrics related to DocumentDB and get alerts when they exceed the configured values.
- Go to Admin > Configuration Profiles > Threshold and Availability > (+). You can also navigate via Cloud > AWS > click on the AWS account > DocumentDB Cluster/DocumentDB Instance/DocumentDB Global Clusters > hover on the hamburger icon beside the display name > Edit > Threshold and Availability > click on the pencil icon.
- In the Add Threshold and Availability form, select DocumentDB Cluster, DocumentDB Global Clusters, or DocumentDB Instance.
- Set threshold values for the required metrics.
- Save your changes.
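Conceptually, a threshold profile compares each polled metric value against the limits you configured. The sketch below shows that comparison under stated assumptions: the `Threshold` class, the `evaluate` function, the two-level trouble/critical layout, and the "Up" status label are all hypothetical and are not Site24x7's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str      # metric name as shown in the Supported Metrics tables
    trouble: float   # values above this mark the monitor as Trouble
    critical: float  # values above this mark the monitor as Critical


def evaluate(thresholds, sample):
    """Return a {metric: status} dict for every threshold that has a polled value."""
    statuses = {}
    for threshold in thresholds:
        value = sample.get(threshold.metric)
        if value is None:
            continue  # metric was not collected in this poll
        if value > threshold.critical:
            statuses[threshold.metric] = "Critical"
        elif value > threshold.trouble:
            statuses[threshold.metric] = "Trouble"
        else:
            statuses[threshold.metric] = "Up"
    return statuses


# Hypothetical limits and polled values.
profile = [
    Threshold("CPU Utilization", trouble=80.0, critical=90.0),
    Threshold("Database Connections", trouble=500, critical=800),
]
statuses = evaluate(profile, {"CPU Utilization": 92.0, "Database Connections": 120})
print(statuses)
```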
Site24x7's DocumentDB Monitoring Interface
Summary
This section provides operational details such as CPU utilization, database connections, database connections max, database cursors, database cursors max, freeable memory, buffer cache hit ratio, number of operations timed out due to low memory, snapshot and backup storage, and many more metrics.
Configuration Details
Get details including the cluster ID, status, availability zone, region, backup retention period, engine name and its version, master username, port, subnet group details, and other configuration details.
Monitored Resources
This section shows the availability status of each monitored resource, with information on the associated DocumentDB clusters and instances, resource name, type, display name, status, and action. The Action column allows you to set alerts and add automations for when a monitored resource is marked as Down, Critical, or Trouble.
Audit Logs and Profiler Logs
View audit events and profiler events to monitor the execution time and details of operations performed on your cluster. These logs help you identify slow operations on the cluster and improve both individual query performance and overall cluster performance.
Cluster Events
View events related to your clusters, instances, snapshots, security groups, and cluster parameter groups. Get details including the date and time of the event, source name and source type of the event, and a message that is associated with the event. This tab is available only for DocumentDB Cluster and DocumentDB Instance monitors.
Outages
A history of your resources’ various states, like down, trouble, critical, or maintenance, is displayed in the Outages tab. Details on the start time and end time of an outage, duration, and comments (if any) are provided in this section. You can also edit or delete comments.
Log Report
Here you can view the audit log data for DocumentDB clusters and DocumentDB instances, along with details on the timestamp, status, CPU utilization, database connections sum, and database cursors sum.