AWS Database Migration Service Monitoring Integration
AWS Database Migration Service (DMS) is a service designed to migrate data from one database to another. It supports both homogeneous migrations, such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle or Microsoft SQL Server to Amazon Aurora.
With Site24x7's integration with AWS DMS, you can monitor database endpoints at source and target, and ensure a seamless data migration. We help you address database workload challenges during migration by keeping a close watch on your AWS DMS replication tasks and replication instances.
Setup and configuration
1. If you haven't already, enable access to your AWS resources between your AWS account and Site24x7's AWS account by either:
- Creating Site24x7 as an IAM user.
- Creating a cross-account IAM role.
2. On the Integrate AWS Account page, check the appropriate box for DMS Replication Task and DMS Replication Instance.
Policy and permissions
Site24x7 uses various AWS DMS APIs to collect information about your migration service. Assign the AWS managed policy ReadOnlyAccess to the Site24x7 entity (IAM user or IAM role) to help Site24x7 collect metrics and metadata. If you want to assign a custom policy, please make sure the following read-level actions are present in the policy JSON.
- "dms:DescribeAccountAttributes",
- "dms:DescribeReplicationInstances",
- "dms:DescribeReplicationTasks",
- "dms:DescribeTableStatistics",
- "dms:DescribeCertificates",
- "dms:DescribeConnections",
- "dms:DescribeEndpoints",
- "dms:ListTagsForResource",
- "dms:DescribeEvents",
- "logs:DescribeLogStreams",
- "logs:GetLogEvents"
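For a custom policy, the actions above can be assembled into a policy document. The following is a minimal sketch in Python: the `Version`/`Statement` layout is the standard IAM policy JSON shape, while the wildcard `Resource` and the function name `build_site24x7_dms_policy` are assumptions made for illustration, so scope the resource to your own requirements.

```python
import json

# Read-level actions listed in this section.
DMS_READ_ACTIONS = [
    "dms:DescribeAccountAttributes",
    "dms:DescribeReplicationInstances",
    "dms:DescribeReplicationTasks",
    "dms:DescribeTableStatistics",
    "dms:DescribeCertificates",
    "dms:DescribeConnections",
    "dms:DescribeEndpoints",
    "dms:ListTagsForResource",
    "dms:DescribeEvents",
    "logs:DescribeLogStreams",
    "logs:GetLogEvents",
]


def build_site24x7_dms_policy(resource="*"):
    """Return an IAM policy document (as a dict) allowing the listed actions.

    The wildcard resource default is an assumption for this sketch.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(DMS_READ_ACTIONS),
                "Resource": resource,
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy JSON so it can be pasted into the IAM console.
    print(json.dumps(build_site24x7_dms_policy(), indent=2))
```

The printed JSON can be attached to the IAM user or IAM role that Site24x7 uses.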
Polling Frequency
Site24x7 queries AWS to collect AWS DMS performance metrics according to the configured polling frequency. The polling interval is one hour by default.
IT Automations
You can add automations for the AWS services supported by Site24x7. Log in to Site24x7 and go to Admin > IT Automation Templates (+) > Add Automation Templates. Once automations are added, you can schedule them to be executed one after the other.
You can now start, stop, resume, and reload AWS DMS replication tasks automatically using the AWS Database Migration Service automations.
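This page does not describe how the automations call AWS. As a rough illustration only, the four actions correspond naturally to the AWS DMS `StartReplicationTask` and `StopReplicationTask` API operations. The mapping below, the function name `build_dms_request`, and the placeholder ARN are assumptions for this sketch, not Site24x7's implementation.

```python
# Assumed mapping from an automation action to the DMS API operation
# and the extra request parameters that operation would need.
_ACTIONS = {
    "start": ("StartReplicationTask", {"StartReplicationTaskType": "start-replication"}),
    "resume": ("StartReplicationTask", {"StartReplicationTaskType": "resume-processing"}),
    "reload": ("StartReplicationTask", {"StartReplicationTaskType": "reload-target"}),
    "stop": ("StopReplicationTask", {}),
}


def build_dms_request(action, task_arn):
    """Return (operation_name, params) for the given automation action."""
    if action not in _ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    operation, extra = _ACTIONS[action]
    params = {"ReplicationTaskArn": task_arn}
    params.update(extra)
    return operation, params


if __name__ == "__main__":
    # Placeholder ARN, for illustration only.
    arn = "arn:aws:dms:us-east-1:123456789012:task:EXAMPLE"
    for name in ("start", "stop", "resume", "reload"):
        print(name, build_dms_request(name, arn))
```

With boto3 installed, the returned parameters could be passed to `boto3.client("dms").start_replication_task(**params)` or `stop_replication_task(**params)`, depending on the operation name.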
Performance metrics for AWS DMS replication tasks
Attribute | Description | Statistic | Unit |
---|---|---|---|
Full Load Throughput Bandwidth Source | Incoming data received from a full load from the source, measured in kilobytes per second. | Average | KB/sec |
Full Load Throughput Bandwidth Target | Outgoing data transmitted from a full load for the target, measured in kilobytes per second. | Average | KB/sec |
Full Load Throughput Rows Source | Incoming changes from a full load from the source, measured in rows per second. | Average | Count/sec |
Full Load Throughput Rows Target | Outgoing changes from a full load for the target, measured in rows per second. | Average | Count/sec |
CDC Incoming Changes | The total number of change events at a point in time that are waiting to be applied to the target. Note that this is not the same as a measure of the transaction change rate of the source endpoint. A large number for this metric usually indicates AWS DMS is unable to apply captured changes in a timely manner, thus causing high target latency. | Sum | Count |
CDC Changes Memory Source | The number of rows accumulated in memory and waiting to be committed from the source. You can view this metric together with CDCChangesDiskSource. | Sum | Count |
CDC Changes Memory Target | The number of rows accumulated in memory and waiting to be committed to the target. You can view this metric together with CDCChangesDiskTarget. | Sum | Count |
CDC Changes Disk Source | The number of rows accumulated on the disk and waiting to be committed from the source. You can view this metric together with CDCChangesMemorySource. | Sum | Count |
CDC Changes Disk Target | The number of rows accumulated on the disk and waiting to be committed to the target. You can view this metric together with CDCChangesMemoryTarget. | Sum | Count |
CDC Throughput Bandwidth Source | Incoming data received for the source, measured in kilobytes per second. CDCThroughputBandwidth records incoming data received on sampling points. If no task network traffic is found, the value is zero. Because CDC does not issue long-running transactions, network traffic may not be recorded. | Average | KB/sec |
CDC Throughput Bandwidth Target | Outgoing data transmitted for the target, measured in kilobytes per second. CDCThroughputBandwidth records outgoing data transmitted on sampling points. If no task network traffic is found, the value is zero. Because CDC does not issue long-running transactions, network traffic may not be recorded. | Average | KB/sec |
CDC Throughput Rows Source | Incoming task changes from the source, measured in rows per second. | Average | Count/sec |
CDC Throughput Rows Target | Outgoing task changes for the target, measured in rows per second. | Average | Count/sec |
CDC Latency Source | The gap, in seconds, between the last event captured from the source endpoint and the current system timestamp of the AWS DMS instance. CDCLatencySource represents the latency between the source and the replication instance. A high CDCLatencySource means the process of capturing changes from the source is delayed. To identify latency in an ongoing replication, you can view this metric together with CDCLatencyTarget. If both CDCLatencySource and CDCLatencyTarget are high, investigate CDCLatencySource first. | Average | Seconds |
CDC Latency Target | CDC Latency Target represents the latency between replication instance and target. When CDC Latency Target is high, it indicates the process of applying change events to the target is delayed. | Average | Seconds |
CPU Utilization | The percentage of CPU being used by a task. | Average | Percent |
CPU Allocated | The maximum percentage of CPU allocated for the task (0 means no limit). | Average | Percent |
Memory Allocated | The maximum allocation of memory for the task (0 means no limit). | Average | MB |
Swap Usage | The amount of swap used by the task. | Average | Bytes |
Validation Succeeded Record Count | The number of rows that AWS DMS validated per minute. | Sum | Count |
Validation Attempted Record Count | The number of rows where validation was attempted per minute. | Sum | Count |
Validation Failed Overall Count | The number of rows where validation failed. | Sum | Count |
Validation Suspended Overall Count | The number of rows where validation was suspended. | Sum | Count |
Validation Pending Overall Count | The number of rows where validation is still pending. | Sum | Count |
Validation Bulk Query Source Latency | AWS DMS can do data validation in bulk, especially in certain scenarios during a full-load or ongoing replication when there are many changes. This metric indicates the latency required to read a bulk set of data from the source endpoint. | Average | Milliseconds |
Validation Bulk Query Target Latency | AWS DMS can do data validation in bulk, especially in certain scenarios during a full-load or ongoing replication when there are many changes. This metric indicates the latency required to read a bulk set of data on the target endpoint. | Average | Milliseconds |
Validation Item Query Source Latency | During ongoing replication, data validation can identify ongoing changes and validate them. This metric indicates the latency in reading those changes from the source. Validation can run more queries than required, based on the number of changes, if there are errors during validation. | Average | Milliseconds |
Validation Item Query Target Latency | During ongoing replication, data validation can identify ongoing changes and validate them row by row. This metric provides the latency in reading those changes from the target. Validation may run more queries than required, based on the number of changes, if there are errors during validation. | Average | Milliseconds |
Full Load Throughput Bandwidth Total | The total full load throughput bandwidth at Target and Source. | Average | KB/sec |
Full Load Throughput Rows Total | The total full load throughput rows at Target and Source. | Average | Count/sec |
CDC Changes Memory Total | The total number of CDC Changes in memory at Target and Source. | Sum | Count |
CDC Changes Disk Total | The total number of CDC Changes in disk at Target and Source. | Sum | Count |
CDC Throughput Bandwidth Total | The total CDC throughput bandwidth at Target and Source. | Average | KB/sec |
CDC Throughput Rows Total | The total CDC throughput rows at Target and Source. | Average | Count/sec |
CDC Latency Total | The total CDC latency at Target and Source. | Average | Seconds |
Validation Bulk Query Total Latency | The total latency of validation bulk query at Target and Source. | Average | Milliseconds |
Validation Item Query Total Latency | The total latency of validation item query at Target and Source. | Average | Milliseconds |
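The guidance for CDC Latency Source and CDC Latency Target in the table above can be expressed as a small triage routine. This is a minimal sketch: the function name `triage_cdc_latency`, the return labels, and the 60-second default threshold are hypothetical, so pick a threshold that matches your own threshold profile.

```python
def triage_cdc_latency(source_latency_s, target_latency_s, threshold_s=60.0):
    """Suggest where to look first, given the two CDC latency values in seconds.

    Returns one of "source", "target", or "ok".
    """
    source_high = source_latency_s > threshold_s
    target_high = target_latency_s > threshold_s

    if source_high:
        # Capturing changes from the source is delayed. When both values
        # are high, the table advises investigating the source first.
        return "source"
    if target_high:
        # Applying change events to the target is delayed.
        return "target"
    return "ok"


if __name__ == "__main__":
    print(triage_cdc_latency(120.0, 300.0))  # both high: prints "source"
    print(triage_cdc_latency(5.0, 300.0))    # only target high: prints "target"
    print(triage_cdc_latency(5.0, 10.0))     # both low: prints "ok"
```

Checking the source first reflects the table's advice: a delay in capturing changes at the source can be a cause of the delay seen at the target.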
Performance metrics for AWS DMS replication instances
Attribute | Description | Statistic | Unit |
---|---|---|---|
CPU Utilization | The amount of CPU used. | Average | Percent |
Free Storage Space | The amount of available storage space. | Average | Bytes |
Freeable Memory | The amount of available random access memory. | Average | Bytes |
Write IOPS | The average number of disk write I/O operations per second. | Average | Count/sec |
Read IOPS | The average number of disk read I/O operations per second. | Average | Count/sec |
Read Throughput | The average number of bytes read from disk per second. | Average | Bytes/sec |
Read Latency | The average amount of time taken per disk I/O (input) operation. | Average | Milliseconds |
Swap Usage | The amount of swap space used on the replication instance. | Average | Bytes |
Network Receive Throughput | The incoming (Receive) network traffic on the replication instance, including both customer database traffic and AWS DMS traffic used for monitoring and replication. | Average | Bytes/sec |
Forecast
Estimate future values of the following Database Migration Service Instance performance metrics and make informed decisions about adding capacity or scaling your AWS infrastructure.
- CPU Utilization
- Read IOPS
- Write IOPS
- Freeable Memory
- Swap Usage
- Disk Queue Depth
Similarly, you can also view the forecast for the following metrics of Database Migration Service tasks:
- CPU Utilization
- Memory Usage
Site24x7's AWS DMS monitoring interface
Summary
Gain an overview of the different events occurring within each replication task or replication instance with time series charts. This section provides you with operational details like CPU utilization, memory usage, full load bandwidth, full load throughput rows, change data capture (CDC) incoming changes, CDC changes in disk and memory, CDC latency, and many more metrics.
There is a separate Task Summary tab for replication instances, which displays task details and real-time statistics for individual tasks. For each task detail, you have the option to bulk edit the threshold profiles as well.
Monitored Resources
Various resource availability statuses are provided here, with information on resource name, type, display name, status, and action. The Action column allows you to set alerts and add automations for when a monitored resource is marked as Down, Critical, or Trouble.
Endpoint Details
The DMS Replication Task section provides you with the endpoint details of each task. This section has various details on connections, source endpoints, and target endpoints. The Connections section lets you configure thresholds, set alerts, and add automations for each endpoint when it is Down.
Outages
A history of your resources’ various states, like down, trouble, critical, or maintenance, is displayed in the Outages tab. Details on the start time and end time of an outage, duration, and comments (if any) are provided in this section. You can also edit or delete comments.
Log Report
Here you can view the audit log data for a replication instance or replication task, along with details on the timestamp, status, CPU utilization, free storage, and freeable memory.