AWS Batch Monitoring Integration

AWS Batch is a fully managed batch processing service that helps you build and run batch computing workloads on the AWS Cloud. Batch processing is a cost-effective way to run multiple software programs, called jobs, quickly and efficiently.

Site24x7's integration with AWS Batch enables you to monitor and analyze your batch processing, including submitted, pending, succeeded, and failed jobs.

Use case

Suppose an AWS Batch monitor integrated with Site24x7 shows batch jobs that have stayed in the pending or running state for a long time and are consuming your AWS resources. Because your account is integrated with Site24x7, you can select multiple jobs at once and terminate or cancel them using IT automation. You can also receive alerts when a configured threshold is breached for the integrated monitor.
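The selection logic in this use case can be sketched in Python. This is an illustrative sketch only, not how Site24x7's IT automation is implemented: the function names `plan_cleanup` and `apply_cleanup` and the `max_age` parameter are hypothetical, and the job dictionaries are assumed to follow the shape of the `jobSummaryList` entries returned by the AWS Batch ListJobs API (`jobId`, `status`, and `createdAt` in epoch milliseconds). Jobs that have not started yet are cancelled, while jobs that are already starting or running are terminated.

```python
from datetime import datetime, timedelta, timezone

# Statuses in which a job has not started yet; these can be cancelled.
CANCELLABLE = {"SUBMITTED", "PENDING", "RUNNABLE"}
# Statuses in which a job is already being placed or run; these need terminating.
TERMINABLE = {"STARTING", "RUNNING"}


def plan_cleanup(job_summaries, max_age, now=None):
    """Return (job_id, action) pairs for jobs older than max_age.

    job_summaries: dicts assumed to look like ListJobs jobSummaryList entries.
    max_age: a timedelta; younger jobs are left alone (hypothetical policy).
    """
    now = now or datetime.now(timezone.utc)
    plan = []
    for job in job_summaries:
        created = datetime.fromtimestamp(job["createdAt"] / 1000, tz=timezone.utc)
        if now - created < max_age:
            continue  # still within the allowed window
        if job["status"] in CANCELLABLE:
            plan.append((job["jobId"], "cancel"))
        elif job["status"] in TERMINABLE:
            plan.append((job["jobId"], "terminate"))
        # Jobs that already succeeded or failed are ignored.
    return plan


def apply_cleanup(batch_client, plan, reason="Exceeded allowed run time"):
    """Run the plan with a boto3-style Batch client supplied by the caller."""
    for job_id, action in plan:
        if action == "cancel":
            batch_client.cancel_job(jobId=job_id, reason=reason)
        else:
            batch_client.terminate_job(jobId=job_id, reason=reason)
```

Keeping the planning step separate from the API calls makes it possible to review the list of affected jobs before anything is cancelled or terminated.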

Benefits of the integration between Site24x7 and AWS Batch

By integrating Site24x7 with AWS Batch, you can:

  • Set thresholds for metrics and receive alerts for threshold breaches so that you can identify and troubleshoot issues in your AWS Batch workloads.
  • Schedule IT automation to cancel or terminate your job at any time.
  • Obtain a detailed overview of the job definition.
  • View CloudWatch logs to find specific error codes or patterns for failed jobs.

Setup and configuration

  • If you haven't done so already, enable access to your AWS resources by creating a cross-account IAM role between your account and Site24x7's AWS account. Learn more.
  • On the Integrate AWS Account page, ensure that AWS Batch is selected in the Services to be discovered field.

Permissions

Ensure that Site24x7 receives the following permissions to monitor the batch jobs of your AWS resources:

  • "batch:DescribeJobDefinitions"
  • "batch:DescribeJobQueues"
  • "batch:DescribeJobs"
  • "batch:ListJobs"
  • "batch:TerminateJob"
  • "batch:CancelJob"
  • "batch:DescribeComputeEnvironments"
  • "batch:ListTagsForResource"
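As an illustration, the permissions above could be collected into an IAM policy document like the one this Python sketch prints. The action names are written in IAM's `batch:`-prefixed form. The `Sid` value and the wildcard `Resource` are assumptions made for the example, and the helper name `build_policy` is hypothetical; adapt the output to the policy attached to your cross-account role.

```python
import json

# Actions taken from the permissions list, in IAM's "service:Action" form.
BATCH_ACTIONS = [
    "batch:DescribeJobDefinitions",
    "batch:DescribeJobQueues",
    "batch:DescribeJobs",
    "batch:ListJobs",
    "batch:TerminateJob",
    "batch:CancelJob",
    "batch:DescribeComputeEnvironments",
    "batch:ListTagsForResource",
]


def build_policy(actions):
    """Build a single-statement IAM policy document allowing the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Site24x7BatchMonitoring",  # assumed identifier
                "Effect": "Allow",
                "Action": sorted(set(actions)),  # de-duplicate and order
                "Resource": "*",  # assumption: not scoped to specific resources
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_policy(BATCH_ACTIONS), indent=2))
```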

Polling frequency

Site24x7 queries AWS service-level APIs at the configured polling frequency (from one minute to one day) to collect metrics from AWS Batch.

Supported metrics for compute environment

| Metric name | Description | Statistic | Unit |
| --- | --- | --- | --- |
| Total Submitted Jobs | The total number of submitted jobs in the queues attached to the compute environment. | Average | Count |
| Total Pending Jobs | The total number of pending jobs in the queues attached to the compute environment. | Average | Count |
| Total Runnable Jobs | The total number of runnable jobs in the queues attached to the compute environment. | Average | Count |
| Total Starting Jobs | The total number of starting jobs in the queues attached to the compute environment. | Average | Count |
| Total Running Jobs | The total number of running jobs in the queues attached to the compute environment. | Average | Count |
| Total Succeeded Jobs | The total number of succeeded jobs in the queues attached to the compute environment. | Average | Count |
| Total Failed Jobs | The total number of failed jobs in the queues attached to the compute environment. | Average | Count |
| Total Queue Count | The total number of queues attached to the compute environment. | Average | Count |
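To make the relationship between queue-level and compute-environment-level figures concrete, the following Python sketch sums per-queue job counts into the totals listed above. It assumes the totals are plain sums over the attached queues, which this page implies but does not spell out; the function name `compute_environment_totals` and the input shape are hypothetical.

```python
from collections import Counter

# AWS Batch job statuses, in lifecycle order.
STATUSES = ["SUBMITTED", "PENDING", "RUNNABLE", "STARTING",
            "RUNNING", "SUCCEEDED", "FAILED"]


def compute_environment_totals(queue_counts):
    """Sum per-queue status counts into compute-environment totals.

    queue_counts: {queue_name: {status: job_count}} for every queue
    attached to one compute environment (assumed input shape).
    """
    totals = Counter()
    for counts in queue_counts.values():
        totals.update(counts)  # adds each status count to the running total
    result = {
        f"Total {status.capitalize()} Jobs": totals.get(status, 0)
        for status in STATUSES
    }
    result["Total Queue Count"] = len(queue_counts)
    return result
```

For example, two queues with 3 and 2 running jobs would yield a Total Running Jobs value of 5 and a Total Queue Count of 2.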

Supported metrics for job queue

A job queue stores your submitted jobs until the AWS Batch Scheduler runs the job on a resource in your compute environment.

| Metric name | Description | Statistic | Unit |
| --- | --- | --- | --- |
| Submitted Jobs | The number of submitted jobs in the queue. | Average | Count |
| Pending Jobs | The number of pending jobs in the queue. | Average | Count |
| Runnable Jobs | The number of runnable jobs in the queue. | Average | Count |
| Starting Jobs | The number of starting jobs in the queue. | Average | Count |
| Running Jobs | The number of running jobs in the queue. | Average | Count |
| Succeeded Jobs | The number of succeeded jobs in the queue. | Average | Count |
| Failed Jobs | The number of failed jobs in the queue. | Average | Count |
| Total Compute Environment Attached | The total number of compute environments attached to the queue. | Average | Count |
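As a rough illustration of the per-queue job counts, the Python sketch below tallies jobs by status for a single queue. The input is assumed to be a list of job summaries shaped like the `jobSummaryList` entries returned by the AWS Batch ListJobs API, and the function name `queue_status_counts` is hypothetical; it is not a description of how Site24x7 collects these metrics.

```python
from collections import Counter

# AWS Batch job statuses, in lifecycle order.
STATUSES = ["SUBMITTED", "PENDING", "RUNNABLE", "STARTING",
            "RUNNING", "SUCCEEDED", "FAILED"]


def queue_status_counts(job_summaries):
    """Count the jobs of one queue by status, reporting 0 for unused statuses."""
    counts = Counter(job["status"] for job in job_summaries)
    return {status: counts.get(status, 0) for status in STATUSES}
```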

Licensing

Every AWS Batch monitor is considered a basic monitor.

IT Automation

You can add automations to perform AWS Batch actions. Go to Admin > IT Automation Templates (+) > Add Automation Templates. Once automations are added, you can schedule them to be executed one after the other.

Viewing AWS Batch

To view batch jobs of your AWS resources, log in to your Site24x7 account and navigate to Cloud > AWS > AWS Batch.

Site24x7's integration with AWS Batch also includes the AWS Batch Queue monitor. An AWS Batch compute environment can have multiple queues attached, and the AWS Batch Queue monitor provides the job details of each queue.

AWS Batch data

You can view the AWS Batch monitor data in the following tabs:

Summary

The Summary tab provides an overview of the AWS Batch metrics in the form of charts. These enable you to view details such as Total Submitted Jobs, Total Pending Jobs, and Total Running Jobs.

Batch Job Details

The Batch Job Details tab displays the job details related to the queues. You can filter and view the jobs based on the job status.

Monitored Resource

The Monitored Resource tab shows all the resources associated with the AWS Batch monitor that are also monitored by Site24x7. You can also view the resource status, resource type, resource ID, and configuration details.

Configuration

The Configuration tab provides configuration details of the monitored resource, such as Region, Job Name, and Queue Status.

Outages

The Outages tab displays the status history of your resource, including Down, Trouble, Critical, and Under Maintenance periods. For each outage, you can also view the start time, end time, duration, and comments (if any).
