Performance Metrics for Hadoop Monitoring
View the top N DataNodes by failed volumes and cached blocks; the disk utilization and load average of NameNodes; the number of live, stale, and dead DataNodes; and more, all from a single console. Set thresholds and get notified when a metric exceeds the configured value.
Once the Linux monitoring agent is successfully installed, the entire Hadoop cluster will be auto-discovered and added for monitoring under Server > Hadoop > cluster name. If you are monitoring multiple clusters, you can find them listed under Hadoop > Hadoop Clusters.
- Health dashboard
- Metrics for NameNodes
- Metrics for DataNodes
- Metrics for YARN
- Add a threshold and availability profile
Health Dashboard
The Health Dashboard provides the current status of the entire Hadoop cluster. Other metrics shown in the dashboard include the top N DataNodes based on failed volumes and cached blocks, volume failures, heap memory statistics, and file statistics, among others. The dashboard is auto-refreshed every minute; to refresh it immediately, click the refresh icon beside the page title. Share this report as a PDF file, or create a permalink to share the dashboard publicly.
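The top N ranking shown on the dashboard can be pictured as a simple sort over per-node values. The following is a minimal sketch, assuming each DataNode is represented by a hypothetical record with `failed_volumes` and `cached_blocks` fields; the record type, field names, and sample values are illustrative and are not Site24x7's data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataNodeStats:
    # Hypothetical record; the field names are illustrative only.
    name: str
    failed_volumes: int
    cached_blocks: int


def top_n(nodes: List[DataNodeStats], key: str, n: int = 5) -> List[DataNodeStats]:
    """Return the n DataNodes with the highest value for the given field."""
    return sorted(nodes, key=lambda node: getattr(node, key), reverse=True)[:n]


# Made-up sample data for three DataNodes.
nodes = [
    DataNodeStats("dn1", failed_volumes=0, cached_blocks=120),
    DataNodeStats("dn2", failed_volumes=2, cached_blocks=40),
    DataNodeStats("dn3", failed_volumes=1, cached_blocks=300),
]

worst_by_failed_volumes = [node.name for node in top_n(nodes, "failed_volumes", n=2)]
most_cached_blocks = [node.name for node in top_n(nodes, "cached_blocks", n=2)]
```

Sorting in descending order and slicing keeps the sketch short; for very large clusters, `heapq.nlargest` would avoid sorting the full list.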
Performance Metrics for DataNodes, NameNodes, and YARN
You can view performance metrics for every NameNode, DataNode, and YARN monitor added to your account. Go to Server > Hadoop > click on the Hadoop cluster > NameNodes/DataNodes/YARN > click on the monitor.
Metrics for NameNodes
Once the Linux monitoring agent is installed on each NameNode, you can see the following metrics for every NameNode monitor under the Summary tab (Server > Hadoop > click on the Hadoop cluster > NameNodes):
Parameter | Description |
DFS Capacity Utilization | The used and free space in the DFS cluster |
File Statistics | Total number of files tracked by the NameNode |
Heap Memory Statistics | Current heap memory used and committed in GB |
Non Heap Memory Statistics | Current non heap memory used and committed in GB |
Total Load | The number of concurrent file access connections across DataNodes |
DFS Replication | The number of under-replicated blocks, blocks pending replication, and blocks scheduled for replication |
Log Statistics | The number of fatal, error, and warning logs |
Thread Statistics | The number of new, running, blocked, waiting, and terminated threads |
Block Statistics | The total number of allocated blocks, missing blocks, and blocks with corrupted replicas |
Nodes - Lists all the nodes associated with this cluster. | |
CPU (%) | The CPU utilization of the NameNode |
Memory (%) | The memory utilization of the NameNode |
Disk Used (%) | The disk utilization of the NameNode |
Status | Availability of the NameNode - Up or Down |
Install Agent | Install the Linux monitoring agent extension in the nodes that do not have the extension yet. |
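For readers curious where values like these can originate, Hadoop NameNodes expose a JMX JSON servlet at `/jmx` on the NameNode web port. The sketch below parses a payload of that shape and derives a few of the metrics in the table. It is a sketch under stated assumptions: the sample payload is made up, the `summarize_namenode` helper is hypothetical, and whether the Site24x7 agent reads these exact beans is not confirmed by this article.

```python
from typing import Any, Dict, List

# Bean names as published by the Hadoop NameNode JMX servlet.
FSNAMESYSTEM = "Hadoop:service=NameNode,name=FSNamesystem"
JVM_METRICS = "Hadoop:service=NameNode,name=JvmMetrics"


def find_bean(beans: List[Dict[str, Any]], name: str) -> Dict[str, Any]:
    """Return the bean with the given name, or raise KeyError if absent."""
    for bean in beans:
        if bean.get("name") == name:
            return bean
    raise KeyError(name)


def summarize_namenode(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: derive table-style metrics from a /jmx payload."""
    fs = find_bean(payload["beans"], FSNAMESYSTEM)
    jvm = find_bean(payload["beans"], JVM_METRICS)
    total = fs["CapacityTotal"]
    return {
        "dfs_used_percent": round(100.0 * fs["CapacityUsed"] / total, 2) if total else 0.0,
        "files_total": fs["FilesTotal"],
        "under_replicated_blocks": fs["UnderReplicatedBlocks"],
        "missing_blocks": fs["MissingBlocks"],
        # JvmMetrics reports memory in MB; the table above shows GB.
        "heap_used_gb": round(jvm["MemHeapUsedM"] / 1024, 2),
        "heap_committed_gb": round(jvm["MemHeapCommittedM"] / 1024, 2),
    }


# Made-up sample payload in the shape returned by the /jmx servlet.
sample = {
    "beans": [
        {"name": FSNAMESYSTEM, "CapacityTotal": 1000, "CapacityUsed": 250,
         "FilesTotal": 42, "UnderReplicatedBlocks": 3, "MissingBlocks": 0},
        {"name": JVM_METRICS, "MemHeapUsedM": 512.0, "MemHeapCommittedM": 1024.0},
    ]
}

summary = summarize_namenode(sample)
```

In practice the payload would come from an HTTP GET against the NameNode; the parsing is kept separate from the fetch here so the example runs without a cluster.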
Metrics for DataNodes
Once the Linux monitoring agent is installed on each DataNode, you can see the following metrics for every DataNode monitor under the Summary tab (Server > Hadoop > click on the Hadoop cluster > DataNodes):
Parameter | Description |
DFS Used | DFS space used by the DataNode |
Cache Used | The number of blocks cached |
Heap Memory Statistics | Current heap memory used and committed in GB |
Non Heap Memory Statistics | Current non heap memory used and committed in GB |
Failed Cache Blocks | The number of blocks that failed to cache |
Failed Uncache Blocks | The number of blocks that failed to be removed from the cache |
Log Statistics | The number of fatal, error, and warning logs |
Thread Statistics | The number of new, running, blocked, waiting, and terminated threads |
Failed Volume | Number of failed volumes. Although a failed volume will not halt the Hadoop cluster, it is important to know why such failures occur. |
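As an illustration of how the failure-related metrics above might be used together, the sketch below flags a DataNode for follow-up when it reports failed volumes or cache failures. It assumes a dictionary shaped like the DataNode `FSDatasetState` JMX bean (attributes such as `NumFailedVolumes` and `NumBlocksFailedToCache`); the `needs_attention` rule and the sample values are hypothetical and do not describe how Site24x7 evaluates DataNodes.

```python
from typing import Any, Dict


def summarize_datanode(bean: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: pick failure-related values from an FSDatasetState-style bean."""
    failed_volumes = bean.get("NumFailedVolumes", 0)
    failed_cache = bean.get("NumBlocksFailedToCache", 0)
    failed_uncache = bean.get("NumBlocksFailedToUncache", 0)
    return {
        "dfs_used_bytes": bean.get("DfsUsed", 0),
        "blocks_cached": bean.get("NumBlocksCached", 0),
        "failed_volumes": failed_volumes,
        "failed_cache_blocks": failed_cache,
        "failed_uncache_blocks": failed_uncache,
        # Illustrative rule: any failure counter above zero deserves a look.
        "needs_attention": any(v > 0 for v in (failed_volumes, failed_cache, failed_uncache)),
    }


# Made-up sample beans for two DataNodes.
healthy = summarize_datanode({"DfsUsed": 1024, "NumBlocksCached": 10})
degraded = summarize_datanode({"DfsUsed": 2048, "NumFailedVolumes": 1,
                               "NumBlocksFailedToCache": 2})
```

A failed volume does not stop the cluster, so the flag is meant as a prompt to investigate rather than as an outage signal.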
Metrics for YARN
Once the Linux monitoring agent is installed on each YARN node, you can see the following metrics for every YARN monitor under the Summary tab (Server > Hadoop > click on the Hadoop cluster > YARN):
Parameter | Description |
Apps Submitted/Completed | Number of submitted and completed applications |
Apps Running/Pending | Number of running and pending applications |
Apps Failed/Killed | Number of failed and killed applications |
Node Details | Number of unhealthy, lost, active, decommissioned, and rebooted node managers |
Memory Stats | Total amount of reserved, allocated, and available memory |
Virtual Cores | Number of reserved and allocated virtual cores |
Container Stats | Number of allocated and reserved containers |
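The YARN values in this table resemble the fields returned by the ResourceManager REST endpoint `/ws/v1/cluster/metrics`, which wraps them in a `clusterMetrics` object. The sketch below groups a payload of that shape into the same rows as the table. The sample payload is made up, the `summarize_yarn` helper is hypothetical, and the mapping to Site24x7's rows is an assumption made for illustration.

```python
from typing import Any, Dict


def summarize_yarn(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: group clusterMetrics fields by table row."""
    m = payload["clusterMetrics"]
    total_mb = m.get("totalMB", 0)
    return {
        "apps": {k: m[k] for k in ("appsSubmitted", "appsCompleted", "appsRunning",
                                   "appsPending", "appsFailed", "appsKilled")},
        "nodes": {k: m[k] for k in ("activeNodes", "lostNodes", "unhealthyNodes",
                                    "decommissionedNodes", "rebootedNodes")},
        "memory_mb": {k: m[k] for k in ("reservedMB", "allocatedMB", "availableMB")},
        "containers": {k: m[k] for k in ("containersAllocated", "containersReserved")},
        # Derived value, not part of the REST response.
        "memory_allocated_percent": round(100.0 * m["allocatedMB"] / total_mb, 2)
        if total_mb else 0.0,
    }


# Made-up sample payload in the shape of the cluster metrics response.
sample = {
    "clusterMetrics": {
        "appsSubmitted": 20, "appsCompleted": 15, "appsRunning": 3,
        "appsPending": 1, "appsFailed": 1, "appsKilled": 0,
        "activeNodes": 4, "lostNodes": 0, "unhealthyNodes": 1,
        "decommissionedNodes": 0, "rebootedNodes": 0,
        "reservedMB": 0, "allocatedMB": 2048, "availableMB": 6144, "totalMB": 8192,
        "containersAllocated": 6, "containersReserved": 0,
    }
}

yarn_summary = summarize_yarn(sample)
```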
Add a Threshold and Availability Profile
Once the NameNodes, DataNodes, and YARN monitors are successfully added to your Site24x7 account, you can define threshold values for each of the above metrics and get notified when there is a breach. Follow the steps below to add/edit a threshold profile:
- Log in to Site24x7 and go to Server > Hadoop.
- Click on the Hadoop cluster > NameNodes/DataNodes/YARN > click on the monitor.
- Hover over the hamburger icon beside the display name and click Edit.
- In the Edit Hadoop Monitor page, under Configuration Profiles, click on the pencil icon to edit the default threshold profile or the (+) icon to add a new profile beside the field Threshold and Availability.
- Define the values for the required metrics and Save your changes.
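Conceptually, a threshold profile pairs each metric with a limit and reports the metrics whose current value exceeds it. The following is a minimal sketch of that comparison, assuming hypothetical metric names and limits; it is not Site24x7's profile format or alerting logic, both of which are configured in the web console as described in the steps above.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Threshold:
    # Hypothetical profile entry: a metric name and its upper limit.
    metric: str
    limit: float


def find_breaches(values: Dict[str, float],
                  thresholds: List[Threshold]) -> List[Tuple[str, float, float]]:
    """Return (metric, value, limit) for every metric above its limit."""
    breaches = []
    for threshold in thresholds:
        value = values.get(threshold.metric)
        # Metrics with no current value are skipped rather than treated as breaches.
        if value is not None and value > threshold.limit:
            breaches.append((threshold.metric, value, threshold.limit))
    return breaches


# Made-up profile and current values.
profile = [Threshold("disk_used_percent", 90.0), Threshold("failed_volumes", 0.0)]
current = {"disk_used_percent": 91.5, "failed_volumes": 0.0}

breaches = find_breaches(current, profile)
```

The strict `>` comparison mirrors the wording "exceeds the configured value": a metric that equals its limit is not reported.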
Related Articles
- Add a Hadoop Monitor
- Add a Linux monitor | Performance metrics for Linux monitoring
- Add a Docker | Add a SMART Disk | Add a plugin
- Server monitoring architecture
- Other OS platforms supported: Windows | FreeBSD | OS X