Threshold and Availability for a Server Monitor
Once a server monitor is successfully added to your Site24x7 account, you can add a threshold and availability profile to help the alarms engine decide if a specific resource has to be declared critical or down. Configure downtime rules to reduce false alerts.
While setting up a threshold profile, you can also map automations to the desired attributes. Once a threshold is breached, the corrective automation is executed, so the issue can be fixed without manual intervention. You can map any number of automations per server monitor and up to five corrective automations per attribute.
- Add a threshold profile
- List of metrics supported
- Edit a threshold and availability profile
- Delete a threshold and availability profile
Add a Threshold and Availability Profile
- Log in to Site24x7.
- Go to Admin > Configuration Profiles > Threshold and Availability > Add Threshold Profile. You can also navigate via Server > Server Monitor > Servers > click on the server monitor > hover on the hamburger icon beside the display name > Edit > Configuration Profiles > pencil icon beside Threshold and Availability.
- Specify the following details:
- Monitor Type: Select Server Monitor from the drop-down list.
- Display Name: Provide a label for identification purposes.
- Threshold Type: You can choose between Static and AI-based thresholds. Refer to the List of Metrics Supported section below for the entire list of metrics for which thresholds can be set.
- Static Thresholds: From the drop-down menu, choose the desired metrics for which thresholds need to be configured. Enter a value specific to the unit, and set the threshold conditions (<, <=, =, >, or >=) and the monitor state (to be notified as) for each metric. You'll receive alerts when these threshold conditions are violated.
- AI-based Thresholds: AI-based thresholds track abnormal spikes using anomaly detection and offer a dynamic threshold that is updated accordingly. If you choose an AI-based threshold, select the associated anomaly severity and the status accordingly.
- Advanced Threshold Settings (Strategy):
Poll count serves as the default strategy to validate a threshold breach. You can validate a threshold breach by applying a condition (>, <, =, >=, <=) to your specified threshold strategy. The monitor’s status changes to Trouble or Critical when the condition applied to any of the following threshold strategies holds true:
- Threshold condition validated during the poll count (number of polls): The monitor’s status changes to Trouble or Critical when the condition applied to the threshold value is continuously validated for the specified “Poll count”.
- Average value during poll count (number of polls): The monitor’s status changes to Trouble or Critical when the average of the attribute values, over the number of polls configured, satisfies the condition applied to the threshold value.
- Condition validated during time duration (in minutes): The monitor’s status changes to Trouble or Critical when the condition applied to the threshold value is continuously validated, for all the polls, during the time duration configured.
- Average value during time duration (in minutes): The monitor’s status changes to Trouble or Critical when the average of the attribute values, over the time duration configured, satisfies the condition applied to the threshold value.
The multiple poll check strategy is not applied by default. When no strategy can be applied, the threshold breach is validated for a single poll alone.
For “Strategy-3: Time duration” and “Strategy-4: Average value during time duration” to detect threshold breaches as intended, specify a time duration that is at least twice the check frequency applied to that monitor.
- Click Save. The threshold and availability profile created for the server monitor will be automatically listed in the Threshold and Availability screen along with the others already created.
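The four threshold strategies described in this section can be sketched in a few lines of Python. This is only an illustration of the logic as worded above, not Site24x7's implementation: the `Sample` type, the function names, and the way the time window is cut are all assumptions made for the example.

```python
from dataclasses import dataclass
from operator import lt, le, eq, gt, ge
from statistics import mean

# The comparison conditions listed for thresholds: <, <=, =, >, >=
CONDITIONS = {"<": lt, "<=": le, "=": eq, ">": gt, ">=": ge}


@dataclass
class Sample:
    minute: float  # hypothetical: minutes since monitoring started
    value: float   # collected attribute value, e.g. CPU utilization in %


def breached(value, condition, threshold):
    """True when a single value satisfies the threshold condition."""
    return CONDITIONS[condition](value, threshold)


def poll_count(samples, condition, threshold, polls):
    """Strategy 1: the condition holds for each of the last `polls` polls."""
    recent = samples[-polls:]
    return len(recent) == polls and all(
        breached(s.value, condition, threshold) for s in recent
    )


def average_over_polls(samples, condition, threshold, polls):
    """Strategy 2: the average of the last `polls` values satisfies the condition."""
    recent = samples[-polls:]
    return len(recent) == polls and breached(
        mean([s.value for s in recent]), condition, threshold
    )


def _window(samples, minutes):
    """Samples collected within the last `minutes` (assumed window definition)."""
    if not samples:
        return []
    latest = samples[-1].minute
    return [s for s in samples if s.minute > latest - minutes]


def condition_over_duration(samples, condition, threshold, minutes):
    """Strategy 3: the condition holds for every poll in the time window."""
    window = _window(samples, minutes)
    return bool(window) and all(
        breached(s.value, condition, threshold) for s in window
    )


def average_over_duration(samples, condition, threshold, minutes):
    """Strategy 4: the average over the time window satisfies the condition."""
    window = _window(samples, minutes)
    return bool(window) and breached(
        mean([s.value for s in window]), condition, threshold
    )


def duration_is_valid(minutes, check_frequency):
    """The duration should be at least twice the monitor's check frequency."""
    return minutes >= 2 * check_frequency
```

With a 5-minute check frequency, a 10-minute duration covers two polls under this window definition, which is one way to read the "at least twice the check frequency" rule: a shorter duration would leave only a single poll to evaluate.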
List of Metrics Supported
- General Thresholds:
- Notify when process/service is Down: Enable the toggle button to get notified when a process/service is Down. You can choose to be notified with a Trouble, Down, or Critical alert. Automations can also be mapped to this attribute.
- Alert when a resource check fails: Enable the toggle button to trigger an alert (Trouble/Down/Critical) when a resource check fails.
- Notify when disk partition is down: Toggle Yes if you want to receive an alert (Down) when a partition is removed from the server.
- Network Adapter is Down: Enable the toggle button to trigger a trouble alert when a network adapter is down.
- CPU Utilization threshold: Get notified with status Trouble or Critical when the CPU utilization crosses the configured threshold.
- CPU Utility %: Get notified with Trouble or Critical status when the CPU utility percentage crosses the configured threshold.
- CPU Utilization Threshold by Cores: Get notified with Trouble or Critical status when the CPU utilization threshold by core crosses the configured threshold.
- Memory Utilization threshold: Get notified with status Trouble or Critical when the Memory utilization crosses the configured threshold.
- Disk Utilization threshold: Get notified with status Trouble or Critical when the Disk utilization crosses the configured threshold.
- Partition disk utilization threshold: Get notified with status Trouble or Critical when the disk utilization of a partition crosses the configured threshold. This can be configured in bytes, KB, MB, GB, TB, or as a percentage.
- Process CPU Utilization threshold: Get notified with status Trouble or Critical when the overall process CPU utilization crosses the configured threshold.
- Process Memory Utilization threshold: Get notified with status Trouble or Critical when the overall process memory utilization crosses the configured threshold.
- Process Thread Count threshold: Get notified with status Trouble or Critical when the overall process thread count crosses the configured count.
- Process Handle Count threshold: Get notified with status Trouble or Critical when the overall process handle count crosses the configured count.
- Network Bandwidth Exceeds: Get notified when the bandwidth utilization exceeds the configured value.
- Network Error Packet threshold: Get notified with status Trouble or Critical when the network error packet crosses the configured threshold.
- Swap Memory Utilization Threshold: Get notified when the swap memory utilized exceeds the configured value.
- Disk Reads: Get notified with status Trouble or Critical when the amount of data read from the disk, in bytes, exceeds the configured value.
- Disk Writes: Get notified with status Trouble or Critical when the amount of data written to the disk, in bytes, exceeds the configured value.
- Disk I/O: Get notified with status Trouble or Critical when the total number of read/write (input/output) operations exceeds the configured value.
- Disk Idle Percentage: Get notified with status Trouble or Critical when the percentage of time the disk spends in the idle state exceeds the configured value.
- Disk Busy Percentage: Get notified with status Trouble or Critical when the percentage of time the disk spends in the busy state exceeds the configured value.
- Server Uptime: Get notified with status Trouble or Critical when the uptime of the server exceeds the configured value. This can be configured in ms, sec, min, hour, and day.
- IP Address: This is a string attribute. Get notified with status Trouble or Critical when the server's IP addresses match or do not match the configured value. For the conditions contains and doesn't contain, you can give multiple addresses separated by commas. For the condition On Change, no value can be configured; additionally, you can select Send Alert, in which case only an alert is sent and the monitor status is not changed.
- RAM Size: This is a string attribute. Get notified with status Trouble or Critical when the RAM size matches/does not match the configured value.
- System Idle Percentage: Get notified with status Trouble or Critical when the percentage of time the server spends in the idle state exceeds the configured value.
- Running Process Count Threshold: Get notified with status Trouble or Critical when the total number of processes running on the server exceeds the configured threshold.
- Windows Specific Thresholds: (supported from version 15.3.1)
- Total services count: Get notified with status Trouble or Critical when the total number of services running on the server crosses the configured threshold.
- Processor queue length exceeds: Get notified with status Trouble or Critical when the number of threads waiting for CPU resources exceeds the configured threshold.
- Linux Specific Thresholds: (supported from version 14.7.0)
- Sys Load (1 min avg) Threshold: Get notified with status Trouble or Critical when the 1 minute average of the system load exceeds the configured value.
- Sys Load (5 min avg) Threshold: Get notified with status Trouble or Critical when the 5 minute average of the system load exceeds the configured value.
- Sys Load (15 min avg) Threshold: Get notified with status Trouble or Critical when the 15 minute average of the system load exceeds the configured value.
- Total Process Count Threshold: Get notified with status Trouble or Critical when the total number of processes exceeds the configured value.
- Blocked Process Count Threshold: Get notified with status Trouble or Critical when the number of processes waiting for a resource exceeds the configured threshold.
- CPU User Space Threshold: Get notified with status Trouble or Critical when the CPU percentage spent on user processes exceeds the configured value.
- CPU Hardware Interrupt Threshold: Get notified with status Trouble or Critical when the CPU percentage spent servicing hardware interrupts exceeds the configured value.
- CPU Idle Threshold: Get notified with status Trouble or Critical when the CPU percentage spent in the idle state exceeds the configured value.
- CPU Software Interrupt Threshold: Get notified with status Trouble or Critical when the CPU percentage spent servicing software interrupts exceeds the configured value.
- CPU Nice Threshold: Get notified with status Trouble or Critical when the CPU percentage spent on low priority processes exceeds the configured value.
- CPU Wait Threshold: Get notified with status Trouble or Critical when the CPU percentage spent waiting on I/O operations exceeds the configured value.
- CPU Steal Threshold: Get notified with status Trouble or Critical when the CPU time stolen by the hypervisor host for use by other virtual machines exceeds the configured value.
- CPU System Threshold: Get notified with status Trouble or Critical when the CPU percentage spent on system processes exceeds the configured value.
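To make the Linux CPU thresholds above more concrete, the following sketch shows one common way such a breakdown can be derived on Linux: by taking two snapshots of the aggregate `cpu` line in `/proc/stat` and converting the counter differences into percentages. It is an assumption that this mirrors how the monitoring agent collects these metrics; the function names are hypothetical and chosen only for the example.

```python
# Column order of the aggregate "cpu" line in /proc/stat (first eight counters).
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def read_cpu_line(path="/proc/stat"):
    """Return the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line into a dict of cumulative tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(p) for p in parts[1:1 + len(CPU_FIELDS)]]
    # Older kernels report fewer columns; treat the missing ones as zero.
    values += [0] * (len(CPU_FIELDS) - len(values))
    return dict(zip(CPU_FIELDS, values))


def cpu_percentages(before, after):
    """Convert two counter snapshots into a percentage breakdown per CPU state."""
    deltas = {name: after[name] - before[name] for name in CPU_FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in CPU_FIELDS}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}
```

In this sketch, `user`, `nice`, `system`, `idle`, `iowait`, `irq`, `softirq`, and `steal` correspond to the CPU User Space, Nice, System, Idle, Wait, Hardware Interrupt, Software Interrupt, and Steal thresholds respectively. A breach check would then compare, for example, `cpu_percentages(before, after)["steal"]` against the configured value.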
Edit a Threshold and Availability Profile
- Click the profile that you want to edit.
- Edit the parameters that need to be changed in the Add Threshold Profile window.
- Click Save.
Delete a Threshold and Availability Profile
- In the Threshold and Availability screen, click the profile that needs to be deleted.
- This opens the Edit Threshold Profile window.
- Click Delete.
Related Articles
- Threshold profile for other monitors
- Add a server monitor: Windows | Linux | FreeBSD | OS X
- Performance metrics: Windows | Linux | FreeBSD | OS X
- Server monitoring architecture
- Get started with IT automation
- Server monitoring settings
- Service and process monitoring
- 50+ plugin integrations
- Microsoft applications supported