Performance Metrics for Hadoop Monitoring

From a single console, view the top N DataNodes based on failed volumes and cached blocks; the disk utilization and load average of NameNodes; the number of live, stale, and dead DataNodes; and more. Set thresholds and be notified when a metric exceeds the configured value.

Once the Linux monitoring agent is successfully installed, the entire Hadoop cluster will be auto-discovered and added for monitoring under Server > Hadoop > cluster name. If you are monitoring multiple clusters, you can find them listed under Hadoop > Hadoop Clusters.

Health Dashboard
Metrics for NameNodes
Metrics for DataNodes
Metrics for YARN
Add a threshold and availability profile

Health Dashboard

The Health Dashboard provides the current status of the entire Hadoop cluster. Other metrics shown in the dashboard include the top N DataNodes based on failed volumes and cached blocks, volume failures, heap memory statistics, and file statistics, among others. The dashboard is auto-refreshed every minute; to refresh it immediately, click the refresh icon beside the page title. You can share this report as a PDF file or create a permalink to share the dashboard publicly.
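To illustrate what a "top N" ranking means, here is a minimal sketch that orders DataNodes by failed volumes and by cached blocks. The host names, field names, and values are hypothetical sample data, and this is not how the dashboard itself is implemented.

```python
import heapq

# Hypothetical per-DataNode readings; hosts and numbers are made up.
datanodes = [
    {"host": "dn1", "failed_volumes": 0, "cached_blocks": 120},
    {"host": "dn2", "failed_volumes": 2, "cached_blocks": 45},
    {"host": "dn3", "failed_volumes": 1, "cached_blocks": 300},
    {"host": "dn4", "failed_volumes": 0, "cached_blocks": 10},
]


def top_n(nodes, metric, n=3):
    """Return the n nodes with the highest value for the given metric."""
    return heapq.nlargest(n, nodes, key=lambda node: node[metric])


top_failed = [node["host"] for node in top_n(datanodes, "failed_volumes", n=2)]
top_cached = [node["host"] for node in top_n(datanodes, "cached_blocks", n=2)]
print(top_failed)
print(top_cached)
```

A ranking like this puts the nodes that most need attention at the top, which is the purpose of the top N widgets.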

Performance Metrics for DataNodes, NameNodes, and YARN

You can view performance metrics for every DataNode, NameNode, and YARN monitor that has been added. Go to Server > Hadoop > click on the Hadoop cluster > NameNodes/DataNodes/YARN > click on the monitor.

Ensure the Site24x7 Linux Monitoring agent is installed in every DataNode, NameNode, and YARN to view the following performance metrics. If you haven't installed the agent extension yet, go to Server > Hadoop > click on the cluster > NameNodes/DataNodes/YARN > click on the monitor > Server Monitoring Extension > Get Started Now > select the Monitors > click Submit.

Metrics for NameNodes:

Once the Linux monitoring agent is installed in each NameNode, you can see the following metrics for every NameNode monitor under the Summary tab (Server > Hadoop > click on the Hadoop cluster > NameNodes):

Parameter	Description
DFS Capacity Utilization	The used and free space in the DFS cluster
File Statistics	Total number of files tracked by the NameNode
Heap Memory Statistics	Current heap memory used and committed in GB
Non Heap Memory Statistics	Current non heap memory used and committed in GB
Total Load	The number of concurrent file access connections across DataNodes
DFS Replication	The number of under-replicated blocks, blocks pending replication, and blocks scheduled for replication
Log Statistics	The number of fatal, error, and warning logs
Thread Statistics	The number of new, running, blocked, waiting, and terminated threads
Block Statistics	The total number of allocated blocks, missing blocks, and blocks with corrupted replicas

Nodes: lists all the nodes associated with this cluster.

Parameter	Description
CPU (%)	The CPU utilization of the NameNode
Memory (%)	The memory utilization of the NameNode
Disk Used (%)	The disk utilization of the NameNode
Status	Availability of the NameNode - Up or Down
Install Agent	Install the Linux monitoring agent extension in the nodes that do not have the extension yet.
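
For readers who want to cross-check figures such as DFS Capacity Utilization or DFS Replication against Hadoop itself, the NameNode publishes comparable values through its JMX JSON endpoint (`/jmx`) under the FSNamesystem bean. The sketch below parses a sample payload of that shape. The numbers are made up, the output key names are hypothetical, the attribute names should be verified against your Hadoop version, and nothing here describes how the Site24x7 agent collects its data.

```python
import json

# Sample payload shaped like the NameNode JMX JSON output; all values are made up.
SAMPLE_JMX = json.dumps({
    "beans": [{
        "name": "Hadoop:service=NameNode,name=FSNamesystem",
        "CapacityTotal": 1000,
        "CapacityUsed": 250,
        "CapacityRemaining": 700,
        "FilesTotal": 4200,
        "TotalLoad": 12,
        "UnderReplicatedBlocks": 3,
        "PendingReplicationBlocks": 1,
        "ScheduledReplicationBlocks": 0,
        "MissingBlocks": 0,
        "CorruptBlocks": 0,
    }]
})


def summarize_namenode(payload):
    """Extract a few headline figures from an FSNamesystem bean."""
    beans = json.loads(payload)["beans"]
    fs = next(b for b in beans if b.get("name", "").endswith("name=FSNamesystem"))
    return {
        "dfs_used_percent": round(100.0 * fs["CapacityUsed"] / fs["CapacityTotal"], 2),
        "files_total": fs["FilesTotal"],
        "total_load": fs["TotalLoad"],
        "under_replicated": fs["UnderReplicatedBlocks"],
        "missing_blocks": fs["MissingBlocks"],
    }


summary = summarize_namenode(SAMPLE_JMX)
print(summary)
```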

Metrics for DataNodes:

Once the Linux monitoring agent is installed in each DataNode, you can see the following metrics for every DataNode monitor under the Summary tab (Server > Hadoop > click on the Hadoop cluster > DataNodes):

Parameter	Description
DFS Used	DFS space used by the DataNode
Cache Used	The number of blocks cached
Heap Memory Statistics	Current heap memory used and committed in GB
Non Heap Memory Statistics	Current non heap memory used and committed in GB
Failed Cache Blocks	The number of blocks that failed to cache
Failed Uncache Blocks	The number of blocks that could not be removed from the cache
Log Statistics	The number of fatal, error, and warning logs
Thread Statistics	The number of new, running, blocked, waiting, and terminated threads
Failed Volume	The number of failed volumes. Although a failed volume does not bring the Hadoop cluster to a halt, it is important to know why such failures occur.

Metrics for YARN:

Once the Linux monitoring agent is installed in each YARN, you can see the following metrics for every YARN monitor under the Summary tab (Server > Hadoop > click on the Hadoop cluster > YARN):

Parameter	Description
Apps Submitted/Completed	Number of submitted and completed applications
Apps Running/Pending	Number of running and pending applications
Apps Failed/Killed	Number of failed and killed applications
Node Details	Number of unhealthy, lost, active, decommissioned, and rebooted node managers
Memory Stats	Total amount of reserved, allocated, and available memory
Virtual Cores	Number of reserved and allocated virtual cores
Container Stats	Number of allocated and reserved containers
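
As a rough illustration of how the YARN figures above relate to one another, the sketch below derives a few ratios from a sample payload shaped like the ResourceManager's cluster metrics REST response (`/ws/v1/cluster/metrics`). The values are made up, the output key names are hypothetical, and the field names should be verified against your Hadoop version.

```python
import json

# Sample payload shaped like the ResourceManager cluster metrics response;
# all values are made up.
SAMPLE_YARN = json.dumps({
    "clusterMetrics": {
        "appsSubmitted": 40, "appsCompleted": 30, "appsPending": 2,
        "appsRunning": 5, "appsFailed": 2, "appsKilled": 1,
        "totalMB": 8192, "allocatedMB": 2048, "reservedMB": 0,
        "availableMB": 6144,
        "unhealthyNodes": 1, "lostNodes": 0, "activeNodes": 3,
    }
})


def summarize_yarn(payload):
    """Derive a few ratios from the raw YARN cluster counters."""
    m = json.loads(payload)["clusterMetrics"]
    finished = m["appsCompleted"] + m["appsFailed"] + m["appsKilled"]
    bad = m["appsFailed"] + m["appsKilled"]
    return {
        "apps_in_flight": m["appsRunning"] + m["appsPending"],
        "failed_or_killed_percent": round(100.0 * bad / finished, 2) if finished else 0.0,
        "memory_allocated_percent": round(100.0 * m["allocatedMB"] / m["totalMB"], 2),
        "nodes_needing_attention": m["unhealthyNodes"] + m["lostNodes"],
    }


yarn_summary = summarize_yarn(SAMPLE_YARN)
print(yarn_summary)
```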

If you have ZooKeeper running in your Hadoop cluster, install the Linux agent for your ZooKeeper application to have it monitored as well.

Add a Threshold and Availability Profile

Once the NameNode, DataNode, and YARN monitors are successfully added to your Site24x7 account, you can define threshold values for each of the above metrics and get notified when there is a breach. Follow the steps below to add or edit a threshold profile:

1. Log in to Site24x7 and go to Server > Hadoop.
2. Click on the Hadoop cluster > NameNodes/DataNodes/YARN > click on the monitor.
3. Hover over the hamburger icon beside the display name and click Edit.
4. In the Edit Hadoop Monitor page, under Configuration Profiles, go to the Threshold and Availability field and click the pencil icon to edit the default threshold profile, or the (+) icon to add a new profile.
5. Define the values for the required metrics and save your changes.
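
Conceptually, a threshold profile pairs each metric with a limit, and a breach occurs when a reading crosses that limit. The sketch below shows the idea only. The profile structure, metric names, and limits are hypothetical and do not reflect Site24x7's actual profile format or API.

```python
import operator

# Hypothetical threshold profile: metric name -> (comparison, limit).
PROFILE = {
    "dfs_used_percent": (operator.gt, 80.0),
    "missing_blocks": (operator.gt, 0),
    "failed_volumes": (operator.ge, 1),
}


def breaches(profile, reading):
    """Return the metrics in the reading that cross their configured limit."""
    breached = []
    for metric, (compare, limit) in profile.items():
        value = reading.get(metric)
        # Metrics missing from the reading are skipped, not treated as breaches.
        if value is not None and compare(value, limit):
            breached.append(metric)
    return breached


# Made-up reading for a single monitor.
reading = {"dfs_used_percent": 91.5, "missing_blocks": 0, "failed_volumes": 1}
alerts = breaches(PROFILE, reading)
print(alerts)
```

Keeping the comparison operator alongside the limit lets one profile hold both "greater than" and "at least" style conditions.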

Add a Hadoop Monitor
Add a Linux monitor | Performance metrics for Linux monitoring
Add a Docker | Add a SMART Disk | Add a plugin
Server monitoring architecture
Other OS platforms supported: Windows | FreeBSD | OS X
