
Performance Metrics for Hadoop Monitoring

From a single console, view the top N DataNodes by failed volumes and cached blocks; the disk utilization and load average of NameNodes; the number of live, stale, and dead DataNodes; and more. Set thresholds to be notified when a metric exceeds the configured value.

Once the Linux monitoring agent is successfully installed, the entire Hadoop cluster will be auto-discovered and added for monitoring under Server > Hadoop > cluster name. If you are monitoring multiple clusters, you can find them listed under Hadoop > Hadoop Clusters.

Health Dashboard

The Health Dashboard shows the current status of the entire Hadoop cluster. It also includes the top N DataNodes by failed volumes and cached blocks, volume failures, heap memory statistics, and file statistics, among other metrics. The dashboard refreshes automatically every minute; to refresh it immediately, click the refresh icon beside the page title. You can share this report as a PDF file or create a permalink to share the dashboard publicly.
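A "top N" view like the Top N DataNodes widget comes down to ranking per-DataNode counters and keeping the first N. The Python sketch below is only an illustration, not how Site24x7 builds the widget. It assumes you already have per-node values for failed volumes and cached blocks; the `DataNodeStats` class, the `top_n` helper, the host names, and the numbers are all hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass
class DataNodeStats:
    """Per-DataNode counters (hypothetical structure for illustration)."""
    host: str
    failed_volumes: int
    blocks_cached: int


def top_n(nodes, n, key):
    """Return the n DataNodes with the largest value for `key`, highest first."""
    return heapq.nlargest(n, nodes, key=key)


# Hypothetical sample data; host names and values are made up.
nodes = [
    DataNodeStats("dn-01", failed_volumes=0, blocks_cached=120),
    DataNodeStats("dn-02", failed_volumes=2, blocks_cached=40),
    DataNodeStats("dn-03", failed_volumes=1, blocks_cached=300),
    DataNodeStats("dn-04", failed_volumes=0, blocks_cached=75),
]

by_failed = top_n(nodes, 2, key=lambda d: d.failed_volumes)
by_cached = top_n(nodes, 2, key=lambda d: d.blocks_cached)

print([d.host for d in by_failed])  # ['dn-02', 'dn-03']
print([d.host for d in by_cached])  # ['dn-03', 'dn-01']
```

`heapq.nlargest` avoids sorting the whole list, which helps only when the cluster has many DataNodes and N is small; for a handful of nodes, `sorted(..., reverse=True)[:n]` is equally fine.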

Performance Metrics for DataNodes, NameNodes, and YARN

You can view performance metrics for every DataNode, NameNode, and YARN monitor added to your account. Go to Server > Hadoop > click on the Hadoop cluster > NameNodes/DataNodes/YARN > click on the monitor.

Make sure the Site24x7 Linux monitoring agent is installed on every DataNode, NameNode, and YARN host to view the following performance metrics. If you haven't installed the agent extension yet, go to Server > Hadoop > click on the cluster > NameNodes/DataNodes/YARN > click on the monitor > Server Monitoring Extension > Get Started Now > select the monitors > click Submit.

Metrics for NameNodes:

Once the Linux monitoring agent is installed on each NameNode, you can see the following metrics for every NameNode monitor under the Summary tab (Server > Hadoop > click on the Hadoop cluster > NameNodes):

Parameter | Description
DFS Capacity Utilization | The used and free space in the DFS cluster
File Statistics | The total number of files tracked by the NameNode
Heap Memory Statistics | The current heap memory used and committed, in GB
Non-Heap Memory Statistics | The current non-heap memory used and committed, in GB
Total Load | The number of concurrent file access connections across DataNodes
DFS Replication | The number of under-replicated blocks, blocks pending replication, and blocks scheduled for replication
Log Statistics | The number of fatal, error, and warning log entries
Thread Statistics | The number of new, running, blocked, waiting, and terminated threads
Block Statistics | The total number of allocated blocks, missing blocks, and blocks with corrupted replicas

Nodes: Lists all the nodes associated with this cluster, with the following columns:

Column | Description
CPU (%) | The CPU utilization of the node
Memory (%) | The memory utilization of the node
Disk Used (%) | The disk utilization of the node
Status | Availability of the node: Up or Down
Install Agent | Lets you install the Linux monitoring agent extension on nodes that do not have it yet
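Most of these NameNode metrics have direct counterparts among the counters Hadoop itself publishes through the NameNode's JMX servlet (the `/jmx` endpoint of the NameNode web UI). This article does not say how the Site24x7 agent collects its data, so treat the following as an independent sketch: it parses a trimmed, made-up `/jmx` response and derives DFS capacity utilization, replication counts, and heap usage in GB. The attribute names (`CapacityUsed`, `FilesTotal`, `UnderReplicatedBlocks`, `MemHeapUsedM`, and so on) are standard `FSNamesystem` and `JvmMetrics` attributes, though availability can vary by Hadoop version; the helper names and sample values are hypothetical.

```python
import json

# Trimmed, made-up example of a NameNode /jmx response (values are illustrative).
SAMPLE_JMX = """
{"beans": [
  {"name": "Hadoop:service=NameNode,name=FSNamesystem",
   "CapacityTotal": 1000000000000, "CapacityUsed": 250000000000,
   "CapacityRemaining": 700000000000, "FilesTotal": 48210, "TotalLoad": 36,
   "BlocksTotal": 51200, "MissingBlocks": 0, "CorruptBlocks": 2,
   "UnderReplicatedBlocks": 14, "PendingReplicationBlocks": 3,
   "ScheduledReplicationBlocks": 5},
  {"name": "Hadoop:service=NameNode,name=JvmMetrics",
   "MemHeapUsedM": 1536.0, "MemHeapCommittedM": 2048.0}
]}
"""


def beans_by_name(jmx_text):
    """Index the JMX beans by their 'name' attribute."""
    return {bean["name"]: bean for bean in json.loads(jmx_text)["beans"]}


def namenode_summary(jmx_text):
    """Derive a few NameNode summary values from a /jmx JSON document."""
    beans = beans_by_name(jmx_text)
    fs = beans["Hadoop:service=NameNode,name=FSNamesystem"]
    jvm = beans["Hadoop:service=NameNode,name=JvmMetrics"]
    used, total = fs["CapacityUsed"], fs["CapacityTotal"]
    return {
        # DFS capacity utilization as a percentage of configured capacity.
        "dfs_used_pct": round(100.0 * used / total, 2) if total else 0.0,
        "files_total": fs["FilesTotal"],
        "total_load": fs["TotalLoad"],
        "under_replicated": fs["UnderReplicatedBlocks"],
        "pending_replication": fs["PendingReplicationBlocks"],
        "scheduled_replication": fs["ScheduledReplicationBlocks"],
        "missing_blocks": fs["MissingBlocks"],
        "corrupt_blocks": fs["CorruptBlocks"],
        # JvmMetrics reports heap sizes in MB; convert to GB.
        "heap_used_gb": round(jvm["MemHeapUsedM"] / 1024, 2),
        "heap_committed_gb": round(jvm["MemHeapCommittedM"] / 1024, 2),
    }


print(namenode_summary(SAMPLE_JMX))
```

On a live cluster the JSON would come from a URL such as `http://<namenode-host>:9870/jmx` (9870 is the Hadoop 3 default web port; Hadoop 2 used 50070), optionally filtered with `?qry=Hadoop:service=NameNode,name=FSNamesystem`. Fetching it is left out so the sketch runs offline.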

Metrics for DataNodes:

Once the Linux monitoring agent is installed on each DataNode, you can see the following metrics for every DataNode monitor under the Summary tab (Server > Hadoop > click on the Hadoop cluster > DataNodes):

Parameter | Description
DFS Used | The DFS space used by the DataNode
Cache Used | The number of blocks cached
Heap Memory Statistics | The current heap memory used and committed, in GB
Non-Heap Memory Statistics | The current non-heap memory used and committed, in GB
Failed Cache Blocks | The number of blocks that failed to be cached
Failed Uncache Blocks | The number of blocks that failed to be removed from the cache
Log Statistics | The number of fatal, error, and warning log entries
Thread Statistics | The number of new, running, blocked, waiting, and terminated threads
Failed Volume | The number of failed volumes. A failed volume does not halt the Hadoop cluster, but it is important to find out why such failures occur.
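For comparison, a DataNode exposes similar counters through its own `/jmx` endpoint in the `FSDatasetState` bean: `DfsUsed`, `CacheUsed` (in bytes), `NumBlocksCached` (a block count), `NumBlocksFailedToCache`, `NumBlocksFailedToUncache`, and `NumFailedVolumes`. The sketch below is not the Site24x7 agent's method; it parses a made-up response and flags a DataNode with any failed volume. Because some Hadoop versions append an identifier to the bean name, it matches on the `FSDatasetState` prefix. Helper names and values are hypothetical.

```python
import json

# Trimmed, made-up DataNode /jmx response; values are illustrative only.
SAMPLE_JMX = """
{"beans": [
  {"name": "Hadoop:service=DataNode,name=FSDatasetState",
   "DfsUsed": 536870912000, "CacheUsed": 1073741824, "NumBlocksCached": 8,
   "NumBlocksFailedToCache": 1, "NumBlocksFailedToUncache": 0,
   "NumFailedVolumes": 1}
]}
"""

PREFIX = "Hadoop:service=DataNode,name=FSDatasetState"


def dataset_bean(jmx_text):
    """Return the first bean whose name starts with the FSDatasetState prefix."""
    for bean in json.loads(jmx_text)["beans"]:
        if bean["name"].startswith(PREFIX):
            return bean
    raise KeyError("FSDatasetState bean not found")


def datanode_summary(jmx_text):
    """Summarize DataNode storage and cache counters from a /jmx JSON document."""
    b = dataset_bean(jmx_text)
    return {
        "dfs_used_gb": round(b["DfsUsed"] / 1024**3, 2),
        "cache_used_bytes": b["CacheUsed"],
        "blocks_cached": b["NumBlocksCached"],
        "failed_to_cache": b["NumBlocksFailedToCache"],
        "failed_to_uncache": b["NumBlocksFailedToUncache"],
        "failed_volumes": b["NumFailedVolumes"],
        # Any failed volume deserves a look, even if the cluster keeps running.
        "needs_attention": b["NumFailedVolumes"] > 0,
    }


print(datanode_summary(SAMPLE_JMX))
```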

Metrics for YARN:

Once the Linux monitoring agent is installed on each YARN host, you can see the following metrics for every YARN monitor under the Summary tab (Server > Hadoop > click on the Hadoop cluster > YARN):

Parameter | Description
Apps Submitted/Completed | The number of submitted and completed applications
Apps Running/Pending | The number of running and pending applications
Apps Failed/Killed | The number of failed and killed applications
Node Details | The number of unhealthy, lost, active, decommissioned, and rebooted NodeManagers
Memory Stats | The total amount of reserved, allocated, and available memory
Virtual Cores | The number of reserved and allocated virtual cores
Container Stats | The number of allocated and reserved containers
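Outside Site24x7, the YARN ResourceManager reports the same kinds of values through its REST API at `/ws/v1/cluster/metrics` (by default on port 8088), in fields such as `appsSubmitted`, `appsFailed`, `allocatedMB`, `allocatedVirtualCores`, `containersReserved`, and `unhealthyNodes`. The sketch below is an independent illustration, not the agent's collection logic. It parses a made-up response and computes the share of cluster memory allocated plus a count of lost or unhealthy NodeManagers; the function name and sample numbers are hypothetical.

```python
import json

# Made-up, trimmed example of a ResourceManager /ws/v1/cluster/metrics response.
SAMPLE = """
{"clusterMetrics": {
  "appsSubmitted": 120, "appsCompleted": 100, "appsRunning": 10, "appsPending": 4,
  "appsFailed": 5, "appsKilled": 1,
  "reservedMB": 0, "availableMB": 24576, "allocatedMB": 8192, "totalMB": 32768,
  "reservedVirtualCores": 0, "allocatedVirtualCores": 12,
  "containersAllocated": 12, "containersReserved": 0,
  "activeNodes": 4, "lostNodes": 0, "unhealthyNodes": 1,
  "decommissionedNodes": 0, "rebootedNodes": 0
}}
"""

APP_FIELDS = ("appsSubmitted", "appsCompleted", "appsRunning",
              "appsPending", "appsFailed", "appsKilled")


def yarn_summary(text):
    """Summarize application, memory, vcore, container, and node counts."""
    m = json.loads(text)["clusterMetrics"]
    total_mb = m["totalMB"]
    return {
        "apps": {field: m[field] for field in APP_FIELDS},
        # Share of cluster memory currently allocated to containers.
        "memory_allocated_pct": round(100.0 * m["allocatedMB"] / total_mb, 1) if total_mb else 0.0,
        "reserved_mb": m["reservedMB"],
        "vcores_allocated": m["allocatedVirtualCores"],
        "vcores_reserved": m["reservedVirtualCores"],
        "containers_allocated": m["containersAllocated"],
        "containers_reserved": m["containersReserved"],
        # NodeManagers that are lost or unhealthy usually need attention.
        "nodes_needing_attention": m["lostNodes"] + m["unhealthyNodes"],
    }


print(yarn_summary(SAMPLE))
```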
If ZooKeeper is running in your Hadoop cluster, install the Linux agent on your ZooKeeper servers to monitor them as well.

Add a Threshold and Availability Profile

Once the NameNode, DataNode, and YARN monitors are successfully added to your Site24x7 account, you can define threshold values for each of the above metrics and get notified when there is a breach. Follow the steps below to add or edit a threshold profile:

  1. Log in to Site24x7 and go to Server > Hadoop.
  2. Click on the Hadoop cluster > NameNodes/DataNodes/YARN > click on the monitor.
  3. Hover over the hamburger icon beside the display name and click Edit.
  4. On the Edit Hadoop Monitor page, under Configuration Profiles, locate the Threshold and Availability field. Click the pencil icon beside it to edit the default threshold profile, or the (+) icon to add a new profile.
  5. Define the values for the required metrics and click Save.
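Conceptually, a threshold profile is a set of rules of the form "metric, comparison, limit", and an alert fires when a current reading satisfies a rule. The Python sketch below illustrates that idea only; it is not Site24x7's data model or API. The `Rule` class, the `breaches` function, the metric names, and the limits are all hypothetical.

```python
import operator
from dataclasses import dataclass

# Comparison operators a rule may use (illustrative subset).
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}


@dataclass(frozen=True)
class Rule:
    """One threshold rule: breach when `value <op> limit` is true (hypothetical format)."""
    metric: str
    op: str
    limit: float


def breaches(profile, metrics):
    """Return (metric, value) pairs for every rule the current readings violate.

    Metrics missing from `metrics` are skipped rather than treated as breaches.
    """
    hits = []
    for rule in profile:
        value = metrics.get(rule.metric)
        if value is not None and OPS[rule.op](value, rule.limit):
            hits.append((rule.metric, value))
    return hits


# Hypothetical profile and readings; metric names are made up for illustration.
profile = [
    Rule("failed_volumes", ">", 0),
    Rule("dfs_used_pct", ">=", 80),
    Rule("missing_blocks", ">", 0),
]
readings = {"failed_volumes": 1, "dfs_used_pct": 72.5, "missing_blocks": 0}

print(breaches(profile, readings))  # [('failed_volumes', 1)]
```

Skipping absent metrics is a design choice for this sketch; a real monitoring product may instead treat missing data as an availability problem.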
