Amazon Elastic MapReduce (EMR) Monitoring

Amazon EMR is a web service that lets you run big data frameworks to process large volumes of data. Site24x7 monitors EMR to help ensure uninterrupted data analysis and notifies you about status changes in the associated AWS services, such as the EC2 instances in the EMR cluster.

Setup and configuration

  • If you haven't done so already, enable access to your AWS resources either by creating an IAM user for Site24x7 or by creating a cross-account IAM role between your AWS account and Site24x7's AWS account.
  • Next, on the Integrate AWS Account page, make sure the EMR checkbox is selected in the Services to be discovered field.

Policies and permissions

Make sure the following read-level actions are included in the IAM policy assigned to the Site24x7 IAM user or role:

  • elasticmapreduce:ListSecurityConfigurations
  • elasticmapreduce:DescribeCluster
  • elasticmapreduce:ListClusters
  • elasticmapreduce:ListBootstrapActions
  • elasticmapreduce:ListSteps
  • elasticmapreduce:ListInstanceFleets
  • elasticmapreduce:ListInstanceGroups
  • elasticmapreduce:ListInstances
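As a minimal sketch, the Python snippet below builds an IAM policy document that grants the read-level actions listed above and checks an existing policy for actions it does not grant. The statement ID `Site24x7EmrReadAccess` and the use of `"Resource": "*"` are assumptions for illustration, not values Site24x7 prescribes. The checker is deliberately simplified: it honours `*` wildcards but ignores Deny statements, `NotAction`, and conditions, so treat its result as a quick sanity check rather than a full policy evaluation.

```python
import fnmatch
import json

# Read-level EMR actions that Site24x7 requires (from the list above).
EMR_READ_ACTIONS = [
    "elasticmapreduce:ListSecurityConfigurations",
    "elasticmapreduce:DescribeCluster",
    "elasticmapreduce:ListClusters",
    "elasticmapreduce:ListBootstrapActions",
    "elasticmapreduce:ListSteps",
    "elasticmapreduce:ListInstanceFleets",
    "elasticmapreduce:ListInstanceGroups",
    "elasticmapreduce:ListInstances",
]


def build_emr_read_policy(actions=EMR_READ_ACTIONS):
    """Return an IAM policy document (as a dict) allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Site24x7EmrReadAccess",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",  # assumption: not scoped to specific clusters
            }
        ],
    }


def missing_emr_actions(policy, required=EMR_READ_ACTIONS):
    """Return the required actions that no Allow statement grants.

    Wildcards such as "elasticmapreduce:List*" are matched case-insensitively.
    Deny statements, NotAction, and conditions are ignored in this sketch.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a plain object
        statements = [statements]
    granted = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        granted.extend(pattern.lower() for pattern in actions)
    return [
        action
        for action in required
        if not any(fnmatch.fnmatchcase(action.lower(), p) for p in granted)
    ]


if __name__ == "__main__":
    print(json.dumps(build_emr_read_policy(), indent=2))
```

For example, a policy whose only Allow statement grants `elasticmapreduce:List*` would be reported as missing `elasticmapreduce:DescribeCluster`.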

Polling frequency

To collect performance metrics, Site24x7 queries the service-level AWS APIs and the CloudWatch APIs at the configured poll frequency, which can range from 1 minute to 1 day.
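The poll frequency determines how often fresh data arrives and how often the AWS APIs are queried. The sketch below converts a poll frequency into polls per day and, given a hypothetical `requests_per_poll` value, a rough daily request count. Only the 1-minute and 1-day bounds come from this section; the number of API requests Site24x7 actually issues per poll is not documented here, so `requests_per_poll` is an assumption you would have to supply.

```python
MINUTES_PER_DAY = 24 * 60
MIN_POLL_MINUTES = 1                # shortest poll frequency stated above
MAX_POLL_MINUTES = MINUTES_PER_DAY  # longest poll frequency: one day


def polls_per_day(poll_minutes):
    """Return how many complete polls fit into one day at the given frequency."""
    if not MIN_POLL_MINUTES <= poll_minutes <= MAX_POLL_MINUTES:
        raise ValueError(
            f"poll frequency must be between {MIN_POLL_MINUTES} and "
            f"{MAX_POLL_MINUTES} minutes, got {poll_minutes}"
        )
    return MINUTES_PER_DAY // poll_minutes


def estimated_daily_requests(poll_minutes, requests_per_poll):
    """Estimate API requests per day; requests_per_poll is a hypothetical input."""
    return polls_per_day(poll_minutes) * requests_per_poll


if __name__ == "__main__":
    for minutes in (1, 5, 60, MINUTES_PER_DAY):
        print(f"every {minutes} min -> {polls_per_day(minutes)} polls/day")
```

Shorter intervals give finer-grained data at the cost of proportionally more queries; a 5-minute frequency, for instance, means 288 polls per day.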

Supported Metrics 

| Attribute | Description | Unit | Statistic |
| --- | --- | --- | --- |
| Core Nodes Pending | The number of core nodes waiting to be assigned. This metric is reported only if a core node exists. | Count | Maximum |
| Core Nodes Running | The number of core nodes working. This metric is reported only if a core node exists. | Count | Maximum |
| Task Nodes Pending | The number of task nodes waiting to be assigned. This metric is reported only if a task node exists. | Count | Maximum |
| Task Nodes Running | The number of task nodes working. This metric is reported only if a task node exists. | Count | Maximum |
| Capacity Remaining | The amount of remaining HDFS disk capacity. | GB | Minimum |
| Corrupt Blocks | The number of blocks that HDFS reports as corrupted. | Count | Maximum |
| DFS Pending Replication Blocks | The status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests. | Count | Maximum |
| HDFS Bytes Read | The number of bytes read from HDFS. | MB | Sum |
| HDFS Bytes Written | The number of bytes written to HDFS. | MB | Sum |
| HDFS Utilization | The percentage of HDFS storage currently used. | Percentage | Average |
| Cluster Idle Status | Indicates whether the cluster is idle: 1 when the cluster is idle, otherwise 0. | Count | Maximum |
| Live Data Nodes | The percentage of data nodes that are receiving work from Hadoop. | Percentage | Average |
| Missing Blocks | The number of blocks for which HDFS has no replicas. | Count | Maximum |
| Pending Deletion Blocks | The number of blocks marked for deletion. | Count | Maximum |
| S3 Bytes Read | The number of bytes read from Amazon S3. | MB | Sum |
| Live Task Trackers | The percentage of task trackers that are functional. | Percentage | Average |
| Map Slots Open | The unused map task capacity in Hadoop version 1. | Count | Maximum |
| Blacklisted Task Trackers | The number of task trackers that are blacklisted in Hadoop version 1. | Count | Maximum |
| Graylisted Task Trackers | The number of task trackers that are graylisted in Hadoop version 1. | Count | Maximum |
| Reduce Slots Open | The unused reduce task capacity in Hadoop version 1. | Count | Maximum |
| Remaining Map Tasks | The number of remaining map tasks for each job in Hadoop version 1. | Count | Maximum |
| Remaining Map Tasks per Slot | The ratio of the total map tasks remaining to the total map slots available in the cluster in Hadoop version 1. | Count | Maximum |
| Remaining Reduce Tasks | The number of remaining reduce tasks for each job in Hadoop version 1. | Count | Maximum |
| Running Map Tasks | The number of running map tasks for each job in Hadoop version 1. | Count | Maximum |
| Running Reduce Tasks | The number of running reduce tasks for each job in Hadoop version 1. | Count | Maximum |
| Apps Completed | The number of applications submitted to YARN that have completed in Hadoop version 2. | Count | Maximum |
| Apps Failed | The number of applications submitted to YARN that have failed to complete in Hadoop version 2. | Count | Maximum |
| Apps Killed | The number of applications submitted to YARN that have been killed in Hadoop version 2. | Count | Maximum |
| Apps Pending | The number of applications submitted to YARN that are in a pending state in Hadoop version 2. | Count | Maximum |
| Apps Running | The number of applications submitted to YARN that are running in Hadoop version 2. | Count | Maximum |
| Apps Submitted | The number of applications submitted to YARN in Hadoop version 2. | Count | Maximum |
| Container Allocated | The number of resource containers allocated by the ResourceManager in Hadoop version 2. | Count | Maximum |
| Container Pending | The number of containers in the queue that have not yet been allocated in Hadoop version 2. | Count | Maximum |
| Container Reserved | The number of containers reserved in Hadoop version 2. | Count | Maximum |
| Memory Reserved | The amount of memory reserved in Hadoop version 2. | MB | Maximum |
| Memory Allocated | The amount of memory allocated to the cluster in Hadoop version 2. | MB | Maximum |
| Memory Available | The amount of memory available to be allocated in Hadoop version 2. | MB | Minimum |
| Memory Total | The total amount of memory in the cluster in Hadoop version 2. | MB | Maximum |
| MR Active Nodes | The number of nodes presently running MapReduce tasks or jobs in Hadoop version 2. | Count | Minimum |
| MR Decommissioned Nodes | The number of nodes allocated to MapReduce applications that have been marked as DECOMMISSIONED in Hadoop version 2. | Count | Maximum |
| MR Lost Nodes | The number of nodes allocated to MapReduce that have been marked as LOST in Hadoop version 2. | Count | Maximum |
| MR Rebooted Nodes | The number of nodes available to MapReduce that have been rebooted and marked as REBOOTED in Hadoop version 2. | Count | Maximum |
| MR Total Nodes | The number of nodes presently available to MapReduce jobs in Hadoop version 2. | Count | Maximum |
| MR Unhealthy Nodes | The number of nodes available to MapReduce jobs that are marked as UNHEALTHY in Hadoop version 2. | Count | Maximum |
| Container Pending Ratio | The ratio of pending containers to allocated containers in Hadoop version 2. | Count | Maximum |
| YARN Memory Available | The percentage of remaining memory available to YARN in Hadoop version 2. | Percentage | Average |
| HBase Backup Failed | The status of the previous HBase backup: 1 if the backup attempt failed, otherwise 0. This metric is collected only if HBase is present. | Count | Maximum |
| Most Recent Backup | The amount of time the previous backup took to complete. This metric is collected only if HBase is present. | Minutes | Average |
| Time Since Last Successful Backup | The number of minutes elapsed since the last successful HBase backup started on your cluster. This metric is collected only if HBase is present. | Minutes | Average |
| Multimaster Instancegroup Nodes Running | The number of running master nodes. This metric is collected only for Hadoop version 2 clusters that have multiple master nodes. | Count | Maximum |
| Multimaster Instancegroup Nodes Running Percentage | The percentage of running master nodes relative to the requested master node instance count. This metric is collected only for Hadoop version 2 clusters that have multiple master nodes. | Percentage | Average |
| Multimaster Instancegroup Nodes Requested | The number of requested master nodes. This metric is collected only for Hadoop version 2 clusters that have multiple master nodes. | Count | Maximum |
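Two of the metrics above are ratios of other metrics in the table. The sketch below shows how they relate, assuming Container Pending Ratio = Container Pending / Container Allocated and YARN Memory Available = Memory Available / Memory Total × 100, as the descriptions suggest. Site24x7 reads these values from CloudWatch rather than computing them this way. The zero-denominator rule for the container ratio mirrors the convention Amazon EMR documents for this metric (verify against the current AWS documentation); returning `None` for a zero memory total is a choice made for this sketch.

```python
def container_pending_ratio(containers_pending, containers_allocated):
    """Container Pending Ratio: pending containers / allocated containers.

    Assumption: when nothing is allocated, the ratio equals the pending count.
    """
    if containers_allocated == 0:
        return containers_pending
    return containers_pending / containers_allocated


def yarn_memory_available_pct(memory_available_mb, memory_total_mb):
    """YARN Memory Available: available memory as a percentage of total memory."""
    if memory_total_mb == 0:
        return None  # undefined when the cluster reports no YARN memory
    return 100.0 * memory_available_mb / memory_total_mb


if __name__ == "__main__":
    print(container_pending_ratio(5, 20))         # 0.25
    print(yarn_memory_available_pct(2048, 8192))  # 25.0
```

A rising pending ratio alongside a falling YARN Memory Available percentage is one sign that the cluster may need more capacity.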

Forecast

Estimate future values of the following performance metrics and make informed decisions about adding capacity or scaling your AWS infrastructure.

  • Capacity Remaining
  • HDFS Bytes Read
  • HDFS Bytes Written
  • HDFS Utilization
  • S3 Bytes Read
  • S3 Bytes Written
  • Total Load
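This document does not describe the forecasting model Site24x7 uses. As a rough, illustrative stand-in, the sketch below fits an ordinary least-squares linear trend to historical samples of a metric such as Capacity Remaining, extrapolates it to a future time, and estimates how long until the trend reaches zero. The sample values are hypothetical, and a straight-line trend is only a simple approximation of real workload growth.

```python
def fit_linear_trend(times, values):
    """Ordinary least-squares fit of values = intercept + slope * time."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("timestamps must not all be identical")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    return slope, intercept


def forecast(times, values, future_time):
    """Extrapolate the fitted trend to future_time."""
    slope, intercept = fit_linear_trend(times, values)
    return intercept + slope * future_time


def time_until_exhausted(times, values):
    """Time from the last (latest) sample until the trend reaches zero.

    Returns None if the trend is flat or rising. Assumes times are sorted.
    """
    slope, intercept = fit_linear_trend(times, values)
    if slope >= 0:
        return None
    zero_time = -intercept / slope
    return zero_time - times[-1]


if __name__ == "__main__":
    days = [0, 1, 2, 3]
    capacity_gb = [100, 90, 80, 70]  # hypothetical Capacity Remaining samples
    print(forecast(days, capacity_gb, 5))            # 50.0
    print(time_until_exhausted(days, capacity_gb))   # 7.0
```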

Site24x7's EMR Monitoring Interface

Summary

Get an overview of your key EMR metrics, including HDFS, YARN, node, and memory metrics, displayed as time-series charts.

Monitored Resources

If you monitor your EC2 instances or S3 buckets with Site24x7, the status of each of these resources is listed in the Monitored Resources tab. Click any resource to view its detailed metrics. You can also click the pencil icon under Action to set thresholds and get notified when any of these resources fail.

Configurations

This tab displays additional configuration classifications for each instance group in a cluster. If the configurations for an instance group are modified, the new configurations will be reflected here.
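Amazon EMR expresses configurations as a list of classifications, each with a `Classification` name, a `Properties` map, and optionally nested `Configurations`. The sketch below shows one way to spot what changed between two versions of an instance group's configuration. The classification names `spark-defaults` and `yarn-site` are real EMR classifications, but the property values are illustrative only, and the helper functions are hypothetical; nested `Configurations` entries are ignored to keep the example short.

```python
import json


def flatten_properties(configurations):
    """Map (classification, property) -> value for a list of classifications.

    Nested "Configurations" entries are ignored in this simplified sketch.
    """
    flat = {}
    for entry in configurations:
        for name, value in entry.get("Properties", {}).items():
            flat[(entry["Classification"], name)] = value
    return flat


def changed_properties(old_configurations, new_configurations):
    """Return {(classification, property): (old_value, new_value)} for changes."""
    before = flatten_properties(old_configurations)
    after = flatten_properties(new_configurations)
    return {
        key: (before.get(key), after.get(key))
        for key in before.keys() | after.keys()
        if before.get(key) != after.get(key)
    }


if __name__ == "__main__":
    # Illustrative values only.
    old = [{"Classification": "spark-defaults",
            "Properties": {"spark.executor.memory": "4g"}}]
    new = [{"Classification": "spark-defaults",
            "Properties": {"spark.executor.memory": "8g"}},
           {"Classification": "yarn-site",
            "Properties": {"yarn.nodemanager.vmem-check-enabled": "false"}}]
    print(json.dumps(new, indent=2))
    for (classification, prop), (was, now) in sorted(changed_properties(old, new).items()):
        print(f"{classification}/{prop}: {was} -> {now}")
```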

Steps

This tab lists the steps, which are the units of work that the cluster executes.
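Each EMR step has a name, an action to take if the step fails, and a JAR to run. The sketch below builds a step definition in the shape the Amazon EMR API uses (`StepConfig` with a `HadoopJarStep`, here running a command through `command-runner.jar`) and tallies step states from a response shaped like the output of the `elasticmapreduce:ListSteps` action. The step name, S3 path, and sample response are hypothetical, and `build_step` and `count_steps_by_state` are illustrative helpers, not part of any AWS SDK.

```python
from collections import Counter

# ActionOnFailure values accepted by the Amazon EMR StepConfig API.
VALID_ACTIONS_ON_FAILURE = {
    "TERMINATE_JOB_FLOW",
    "TERMINATE_CLUSTER",
    "CANCEL_AND_WAIT",
    "CONTINUE",
}


def build_step(name, args, action_on_failure="CONTINUE"):
    """Build a StepConfig-shaped dict that runs a command via command-runner.jar."""
    if action_on_failure not in VALID_ACTIONS_ON_FAILURE:
        raise ValueError(f"unsupported ActionOnFailure: {action_on_failure}")
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": "command-runner.jar", "Args": list(args)},
    }


def count_steps_by_state(list_steps_response):
    """Count steps per state (for example PENDING, RUNNING, COMPLETED, FAILED)."""
    return Counter(
        step["Status"]["State"] for step in list_steps_response.get("Steps", [])
    )


if __name__ == "__main__":
    step = build_step("example-spark-job",  # hypothetical step name
                      ["spark-submit", "s3://example-bucket/app.py"])  # hypothetical path
    print(step)
    sample = {"Steps": [{"Status": {"State": "COMPLETED"}},
                        {"Status": {"State": "FAILED"}},
                        {"Status": {"State": "COMPLETED"}}]}
    print(count_steps_by_state(sample))
```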

Bootstrap Actions

Bootstrap actions can be used to install additional software or customize the configuration of cluster instances. The custom bootstrap actions are listed under this tab.
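A custom bootstrap action is essentially a script location plus optional arguments. The sketch below builds a definition in the shape of the Amazon EMR `BootstrapActionConfig` (`Name` and `ScriptBootstrapAction` with `Path` and `Args`) and renders a response shaped like the output of the `elasticmapreduce:ListBootstrapActions` action as readable lines. The action name, script path, and arguments are hypothetical, and both helper functions are illustrative only.

```python
def build_bootstrap_action(name, script_path, args=()):
    """Build a BootstrapActionConfig-shaped dict for a custom script."""
    if not script_path:
        raise ValueError("script_path is required")
    return {
        "Name": name,
        "ScriptBootstrapAction": {"Path": script_path, "Args": list(args)},
    }


def describe_bootstrap_actions(list_response):
    """Render a ListBootstrapActions-style response as 'name: path args' lines."""
    lines = []
    for action in list_response.get("BootstrapActions", []):
        args = " ".join(action.get("Args", []))
        lines.append(f"{action['Name']}: {action['ScriptPath']} {args}".rstrip())
    return lines


if __name__ == "__main__":
    # Hypothetical action name, script location, and argument.
    print(build_bootstrap_action("install-deps", "s3://example-bucket/install.sh",
                                 ["--verbose"]))
```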

Security Configuration

A security configuration defines data encryption, Kerberos authentication, and Amazon S3 authorization settings for EMR File System (EMRFS). The security configuration applied to the cluster is displayed in JSON format in this tab.
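To make the JSON easier to interpret, the sketch below shows an illustrative security configuration and a helper that reports which features it enables. The key names follow the Amazon EMR security configuration JSON format as commonly documented (`EncryptionConfiguration`, `AuthenticationConfiguration`, `AuthorizationConfiguration`), but the values are examples only and should be checked against the AWS documentation before reuse; `summarize_security_configuration` is a hypothetical helper, not a Site24x7 or AWS API.

```python
import json

# Illustrative security configuration; values are examples only.
EXAMPLE_SECURITY_CONFIGURATION = {
    "EncryptionConfiguration": {
        "EnableInTransitEncryption": False,
        "EnableAtRestEncryption": True,
        "AtRestEncryptionConfiguration": {
            "S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"}
        },
    },
    "AuthenticationConfiguration": {
        "KerberosConfiguration": {
            "Provider": "ClusterDedicatedKdc",
            "ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24},
        }
    },
}


def summarize_security_configuration(config):
    """Report which security features a configuration enables."""
    encryption = config.get("EncryptionConfiguration", {})
    authentication = config.get("AuthenticationConfiguration", {})
    authorization = config.get("AuthorizationConfiguration", {})
    return {
        "in_transit_encryption": bool(encryption.get("EnableInTransitEncryption")),
        "at_rest_encryption": bool(encryption.get("EnableAtRestEncryption")),
        "kerberos": "KerberosConfiguration" in authentication,
        "emrfs_role_mappings": bool(
            authorization.get("EmrFsConfiguration", {}).get("RoleMappings")
        ),
    }


if __name__ == "__main__":
    print(json.dumps(EXAMPLE_SECURITY_CONFIGURATION, indent=2))
    print(summarize_security_configuration(EXAMPLE_SECURITY_CONFIGURATION))
```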

Cluster Summary

This tab displays the inventory details of the EMR cluster, including the cluster status, the associated applications, the deployed EC2 instances, the subnet ID, and similar details.

| Attribute | Description |
| --- | --- |
| Release Label | The Amazon EMR release version. |
| Availability Zone | The Availability Zone in which the cluster's EC2 instances run. |
| Instance Group Type | The type of instance group (master, core, or task) with which the EC2 instances are associated. |
| Auto-termination | The state of auto-termination: true or false. |
| Applications | The open-source applications that Amazon EMR installed when the cluster was created. |
| Master Public DNS | The public DNS name of the master node. |
| Cluster Status | The state of the cluster: active or terminated. |
| State Change Message | The message describing the reason for the cluster's most recent state change. |
| Log URI | The Amazon S3 path where the cluster's logs are stored. |
| Creation Time | The time when the cluster was created. |
| Elapsed Time | The total run time of the cluster. |
| Cluster Ready Time | The time when the cluster became ready to run steps. |
| Visible to all Users | Indicates whether the cluster is visible to all IAM users in the AWS account. |
| Key Name | The name of the Amazon EC2 key pair used to access the cluster's EC2 instances. |
| Subnet ID | The ID of the VPC subnet in which the cluster was launched. |
| Security Group for Master | The managed security group that Amazon EMR assigns to the master node when the cluster is created. |
| Additional Security Group for Master | The extra security group added by the user for the master node. |
| Security Group for Core and Task | The managed security group for the core and task nodes. |
| EC2 Instance Profile | The name of the EC2 instance profile assigned to the cluster's EC2 instances. |
| EMR Role | The IAM role that Amazon EMR assumes to access AWS resources on your behalf. |
| Requested Subnet ID | The subnets that the user specified for launching cluster instances. |
| Autoscaling Role | The IAM role used by automatic scaling policies. |
| Scaledown Behavior | How instances are terminated when the cluster scales down: at the instance-hour boundary or at task completion. |
| EBS Rootvolume Size | The size, in GiB, of the EBS root device volume of each EC2 instance. |
| Additional Security Group for Core and Task | The extra security group added by the user for the core and task nodes. |
| Requested Availability Zone | The Availability Zones that the user specified for the cluster. |
| Security Configuration | The name of the security configuration applied to the cluster. |
| Realm | The Kerberos realm name. |
| Custom AMI ID | The ID of the custom Amazon Linux AMI created by the user. |
| Running AMI Version | The AMI version currently running on the cluster. |
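The Elapsed Time attribute is a derived value. The sketch below assumes it is measured from Creation Time to the cluster's end time, or to the current time for a cluster that is still running; whether Site24x7 measures from creation or from Cluster Ready Time is not stated here. The `H hours, M minutes` display format is hypothetical.

```python
from datetime import datetime, timezone


def elapsed_time(creation_time, end_time=None, now=None):
    """Return cluster run time: end_time (or now, if still running) minus creation_time."""
    if end_time is None:
        end_time = now if now is not None else datetime.now(timezone.utc)
    if end_time < creation_time:
        raise ValueError("end time precedes creation time")
    return end_time - creation_time


def format_elapsed(delta):
    """Format a timedelta as 'H hours, M minutes' (hypothetical display format)."""
    total_minutes = int(delta.total_seconds() // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours, {minutes} minutes"


if __name__ == "__main__":
    created = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)
    ended = datetime(2024, 1, 1, 10, 30, tzinfo=timezone.utc)
    print(format_elapsed(elapsed_time(created, ended)))  # 2 hours, 30 minutes
```

Passing timezone-aware datetimes avoids comparing naive and aware values, which Python rejects with a `TypeError`.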
