
AWS Glue monitoring

AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources.

Overview

Site24x7 provides real-time monitoring that lets you track the performance and health of your AWS Glue resources. The integration provides the following two monitors:

Glue Job: Monitors your Extract, Transform, and Load (ETL) jobs and performance.

Glue Crawler: Provides critical insights about your crawler runs and crawler data source.

Use case

Suppose you run AWS Glue jobs that are scheduled, triggered manually, or started as part of workflows. A job run might fail, time out, encounter an error, or remain in a waiting status, causing unexpected issues.

By integrating AWS Glue with Site24x7, you receive notifications for waiting, failed, timed-out, or errored job runs, so you can address issues and prevent recurring failures.

Benefits of the Site24x7-AWS Glue integration

Integrate your AWS Glue with Site24x7 to:

  • Obtain critical insights about your ETL jobs and performance.
  • Track table additions, deletions, and updates.
  • Monitor your job runs, crawler runs, and crawler data source.
  • Set thresholds for metrics and receive alerts (Down, Critical, Trouble) for threshold breaches.
  • Schedule IT automation to start a job run or crawler run at any time.

Setup and configuration

Permissions

Grant Site24x7 the following permissions to monitor Glue Job and Glue Crawler:

  • glue:ListJobs
  • glue:ListCrawlers
  • glue:GetTriggers
  • glue:GetJobRuns
  • glue:ListCrawls
  • glue:GetJobRun
  • glue:GetCrawler
  • glue:GetJob
  • glue:GetTags
  • glue:GetClassifier
  • glue:GetConnection
  • glue:GetCrawlerMetrics
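
As an illustration, the permissions above could be collected into an IAM policy document. The following is a minimal sketch, not an official Site24x7 policy: the statement ID `Site24x7GlueReadOnly`, the helper name `build_policy`, and the unscoped `"Resource": "*"` are assumptions made for the example.

```python
import json

# The Glue actions listed in the Permissions section above.
GLUE_ACTIONS = [
    "glue:ListJobs",
    "glue:ListCrawlers",
    "glue:GetTriggers",
    "glue:GetJobRuns",
    "glue:ListCrawls",
    "glue:GetJobRun",
    "glue:GetCrawler",
    "glue:GetJob",
    "glue:GetTags",
    "glue:GetClassifier",
    "glue:GetConnection",
    "glue:GetCrawlerMetrics",
]


def build_policy(actions):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Site24x7GlueReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",  # assumption: not scoped to specific resources
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON so it can be pasted into the IAM console.
    print(json.dumps(build_policy(GLUE_ACTIONS), indent=2))
```

You may prefer to narrow `Resource` to specific job or crawler ARNs if your security policy requires it.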

Polling frequency

Site24x7 queries AWS service-level APIs at the configured polling frequency (from once a minute to once a day) to collect metrics for the Glue Job and Glue Crawler monitors.

Supported metrics

The supported metrics for each AWS Glue monitor are given below.

Glue Job

The supported metrics for the Glue Job monitor are provided below:

| Metric name | Description | Statistics | Unit |
| --- | --- | --- | --- |
| Total Runs | The total number of job runs. | Sum | Count |
| Completed Runs | The number of completed job runs. | Sum | Count |
| Failed Runs | The number of failed job runs. | Sum | Count |
| Canceled Runs | The number of canceled job runs. | Sum | Count |
| Error Runs | The number of error job runs. | Sum | Count |
| Timeout Runs | The number of timeout job runs. | Sum | Count |
| Waiting Runs | The number of job runs in waiting status. | Sum | Count |
| Completed Runs Percentage | The percentage of completed job runs. | Average | Percentage |
| Failed Runs Percentage | The percentage of failed job runs. | Average | Percentage |
| Canceled Runs Percentage | The percentage of canceled job runs. | Average | Percentage |
| Error Runs Percentage | The percentage of error job runs. | Average | Percentage |
| Timeout Runs Percentage | The percentage of timeout job runs. | Average | Percentage |
| Waiting Runs Percentage | The percentage of job runs in waiting status. | Average | Percentage |
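
To make the relationship between the run counts and the percentage metrics concrete, here is a minimal sketch that derives both from a list of job run states. The state names are the AWS Glue `JobRunState` values, but the mapping from those states to the metric names above (for example, `STOPPED` to Canceled Runs) and the helper name `summarize_runs` are assumptions for illustration, not Site24x7's documented logic.

```python
from collections import Counter

# Assumed mapping from AWS Glue JobRunState values to the metric names above.
STATE_TO_METRIC = {
    "SUCCEEDED": "Completed Runs",
    "FAILED": "Failed Runs",
    "STOPPED": "Canceled Runs",
    "ERROR": "Error Runs",
    "TIMEOUT": "Timeout Runs",
    "WAITING": "Waiting Runs",
}


def summarize_runs(states):
    """Return run counts and percentages for a list of job run states."""
    counts = Counter(STATE_TO_METRIC[s] for s in states if s in STATE_TO_METRIC)
    total = len(states)  # assumption: every run counts toward Total Runs
    summary = {"Total Runs": total}
    for metric in STATE_TO_METRIC.values():
        count = counts.get(metric, 0)
        summary[metric] = count
        # Guard against division by zero when there are no runs.
        pct = round(100.0 * count / total, 2) if total else 0.0
        summary[f"{metric} Percentage"] = pct
    return summary


if __name__ == "__main__":
    sample = ["SUCCEEDED", "SUCCEEDED", "FAILED", "TIMEOUT"]
    for name, value in summarize_runs(sample).items():
        print(f"{name}: {value}")
```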

The metrics below apply to the Spark, Spark Streaming, and Python Shell job types. They are monitored only if the Advanced properties > Job metrics option in the Job details tab is enabled in the AWS Glue console.

| Metric name | Description | Statistics | Unit |
| --- | --- | --- | --- |
| CPU Load Usage | The fraction of CPU system load used by the driver. | Average | Percentage |
| Bytes Read | The number of bytes read from all data sources by all completed Spark tasks running in all executors. | Sum | Bytes |
| Records Read | The number of records read from all data sources. | Sum | Count |
| Read From S3 | The number of bytes read from Amazon S3 by all executors. | Sum | Bytes |
| Write into S3 | The number of bytes written to Amazon S3 by all executors. | Sum | Bytes |
| Shuffle Bytes Read | The number of bytes read by all executors to shuffle data between them. | Sum | Bytes |
| Shuffle Bytes Written | The number of bytes written by all executors to shuffle data between them. | Sum | Bytes |
| Memory Used by Driver | The number of memory bytes used by the Java Virtual Machine (JVM) heap for the driver. | Average | Bytes |
| Memory Used by Executor | The number of memory bytes used by the JVM heap for all executors. | Average | Bytes |
| Disk Used | The number of megabytes of disk space used across all executors. | Average | Megabytes |
| ETL Elapsed Time | The ETL elapsed time in milliseconds (does not include the job bootstrap times). | Sum | Milliseconds |
| Completed Tasks | The number of completed tasks in the job. | Sum | Count |
| Failed Tasks | The number of failed tasks in the job. | Sum | Count |
| Killed Tasks | The number of tasks killed in the job. | Sum | Count |
| Completed Stages | The number of completed stages in the job. | Sum | Count |

The metrics below apply to the Spark Streaming job type only. They are monitored only if the Advanced properties > Job metrics option in the Job details tab is enabled in the AWS Glue console.

| Metric name | Description | Statistics | Unit |
| --- | --- | --- | --- |
| Records Received in Streaming Batch | The number of records received in a micro-batch. | Sum | Count |
| Batch Processing Time | The time taken to process a batch. This metric helps you determine whether the cluster is underprovisioned or overprovisioned. | Sum | Milliseconds |
| Input Records Per Second | The rate at which input records are received. | Sum | Count |
| Processing Records Per Second | The rate at which records are being processed. | Sum | Count |
| All Executors | The number of actively running job executors. | Average | Count |
| Maximum Executors | The maximum number of (actively running and pending) job executors needed to satisfy the current load. | Average | Count |

The Glue Observability metrics below apply to the Spark, Spark Streaming, and Python Shell job types. They are monitored only if the Advanced properties > Job observability metrics option in the Job details tab is enabled in the AWS Glue console.

| Metric name | Description | Statistics | Unit |
| --- | --- | --- | --- |
| Job Skewness | The weighted average of the job stage skewness. | Average | Count |
| Worker Utilization | The percentage of the allocated workers that are actually used. | Average | Percentage |
| Driver Disk Used | The percentage of disk space used by the driver. | Average | Percentage |
| Executor Disk Used | The percentage of disk space used by the executor. | Average | Percentage |

Glue Crawler

The supported metrics for the Glue Crawler monitor are provided below:

| Metric name | Description | Statistics | Unit |
| --- | --- | --- | --- |
| Tables Created | The number of tables created. | Average | Count |
| Tables Updated | The number of tables updated. | Average | Count |
| Tables Deleted | The number of tables deleted. | Average | Count |
| Time Left Seconds | The estimated time left to complete a running crawl. | Sum | Seconds |
| Last Runtime Seconds | The duration of the most recent crawler run. | Sum | Seconds |
| Median Runtime Seconds | The median runtime duration of all crawler runs. | Sum | Seconds |
| Total Runs | The total number of crawler runs. | Sum | Count |
| Completed Runs | The number of completed crawler runs. | Sum | Count |
| Failed Runs | The number of failed crawler runs. | Sum | Count |
| Canceled Runs | The number of canceled crawler runs. | Sum | Count |
| Completed Runs Percentage | The percentage of completed crawler runs. | Average | Percentage |
| Failed Runs Percentage | The percentage of failed crawler runs. | Average | Percentage |
| Canceled Runs Percentage | The percentage of canceled crawler runs. | Average | Percentage |
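
Several of the crawler metrics above correspond to fields returned by the AWS Glue `GetCrawlerMetrics` API. The sketch below shows one way those fields might be read; it assumes a boto3-style Glue client is passed in, ignores pagination, and uses the hypothetical helper name `fetch_crawler_metrics`. It is not a description of how Site24x7 collects the data.

```python
def fetch_crawler_metrics(glue_client, crawler_names):
    """Return {crawler name: metrics} using the Glue GetCrawlerMetrics API.

    glue_client is expected to behave like boto3.client("glue").
    Pagination (NextToken) is not handled in this sketch.
    """
    response = glue_client.get_crawler_metrics(CrawlerNameList=crawler_names)
    result = {}
    for item in response.get("CrawlerMetricsList", []):
        result[item["CrawlerName"]] = {
            "Tables Created": item.get("TablesCreated", 0),
            "Tables Updated": item.get("TablesUpdated", 0),
            "Tables Deleted": item.get("TablesDeleted", 0),
            "Time Left Seconds": item.get("TimeLeftSeconds", 0.0),
            "Last Runtime Seconds": item.get("LastRuntimeSeconds", 0.0),
            "Median Runtime Seconds": item.get("MedianRuntimeSeconds", 0.0),
        }
    return result
```

With boto3 installed and credentials configured, a call such as `fetch_crawler_metrics(boto3.client("glue"), ["my-crawler"])` would query a crawler named `my-crawler` (a placeholder name).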

Threshold configuration

To configure thresholds for AWS Glue monitors:

  1. Log in to your Site24x7 account and navigate to Admin > Configuration Profiles > Threshold and Availability.
  2. Click Add Threshold Profile.
  3. Select the applicable monitor type from the Monitor Type drop-down menu and provide an appropriate name in the Display Name field. The applicable monitor types are Glue Job and Glue Crawler.
  4. The supported metrics are displayed in the Threshold Configuration section. You can set threshold values for all the metrics mentioned above.
  5. Click Save.
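
Conceptually, a threshold profile compares each collected metric value against the limits you set and raises a status when a limit is breached. The following is a simplified sketch of that idea only: the function name `classify`, the two-level "greater than" comparison, and the `"OK"` status are assumptions for illustration and do not reflect Site24x7's actual evaluation logic.

```python
def classify(value, trouble_above=None, critical_above=None):
    """Return a status for a metric value given optional upper limits.

    Assumption: a breach means the value is strictly greater than the limit,
    and the Critical limit takes precedence over the Trouble limit.
    """
    if critical_above is not None and value > critical_above:
        return "Critical"
    if trouble_above is not None and value > trouble_above:
        return "Trouble"
    return "OK"  # hypothetical label for "no threshold breached"


if __name__ == "__main__":
    # Example: Failed Runs Percentage with limits of 10% and 25%.
    for pct in (5.0, 12.0, 30.0):
        print(pct, classify(pct, trouble_above=10.0, critical_above=25.0))
```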

Licensing

  • Glue Job: Each Glue Job monitor is considered a basic monitor.
  • Glue Crawler: Each Glue Crawler monitor is considered a basic monitor.

IT Automation

You can add automations to perform Glue Job and Glue Crawler runs. Go to Admin > IT Automation Templates (+) > Add Automation Templates. Once automations are added, you can schedule them to be executed one after the other.
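
For context, starting a job run or a crawler run maps to the AWS Glue `StartJobRun` and `StartCrawler` API operations. The sketch below shows those calls through a boto3-style client; the helper names are hypothetical, and this is not a description of how Site24x7 implements its automations. Note that these operations require the `glue:StartJobRun` and `glue:StartCrawler` IAM actions, which are not part of the read-only list in the Permissions section.

```python
def start_glue_job(glue_client, job_name):
    """Start a run of the named Glue job and return the new run ID.

    glue_client is expected to behave like boto3.client("glue").
    """
    response = glue_client.start_job_run(JobName=job_name)
    return response["JobRunId"]


def start_glue_crawler(glue_client, crawler_name):
    """Start the named Glue crawler; the API returns no run identifier."""
    glue_client.start_crawler(Name=crawler_name)
```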

Viewing AWS Glue data

To view AWS Glue Job data:

  • From the Site24x7 console, navigate to Cloud > AWS > Glue Job.

To view AWS Glue Crawler data:

  • From the Site24x7 console, navigate to Cloud > AWS > Glue Crawler.

AWS Glue monitor data

The monitor data for each AWS Glue monitor is given below.

Glue Job monitor data

You can view your Glue Job data on the following tabs.

Summary

The Summary tab provides a comprehensive overview of the events timeline and metrics in the form of charts. You can view the Glue Job performance details as well as the last job run details in this tab.

Note

The Last Run Details section is hidden when there are no job runs.

Job Runs

The Job Runs tab provides details of all the job runs.

You can filter the job run details by Run ID, Status, Start Time, End Time, and Log Group Name. Click the applicable Run ID hyperlink to obtain a detailed overview of the job run from the Job Run Details page.

Connections

View all connections for a job in the Connections tab. You can filter the connection details with options such as Connection Name, Type, and Creation Time.

Triggers

The Triggers tab provides details of the Glue triggers that started the job, such as Trigger Name, Type, and Creation Time.

Configuration

The Configuration tab summarizes information regarding your Glue Job monitor, including its name, type, creation date, and script location. The Source Details section provides details of the remote source where a job is stored, such as the Repository Provider, Repository Name, and Branch Name.

Outages

The Outages tab provides details on an outage's start time, end time, duration, and comments (if any).

Inventory

Obtain details like Job Name, Region, and Monitor Licensing Category from the Inventory tab. Set and view the Threshold and Availability Profile and the Notification Profile according to the user in this tab.

Log Report

This tab provides a consolidated report of the Glue Job monitor's log status, which can be downloaded as a CSV file.

Glue Crawler monitor data

You can view your Glue Crawler data on the following tabs.

Summary

The Summary tab displays a comprehensive overview of the events timeline and metrics in the form of charts, including Glue Crawler performance and last crawler run details.

Note

The Last Crawler Run Data section is hidden when there are no crawler runs.

Crawler Runs

The Crawler Runs tab provides details of all the crawler runs. You can filter the crawler run details by Run ID, Status, Start Time, End Time, and DPU Hours.

Click the applicable Run ID hyperlink to obtain a detailed overview of the crawler run from the Crawler Run Details page.

Classifiers

The Classifiers tab provides details of all classifiers specified for a crawler. You can filter the classifier details with options such as Classifier Name, Type, and Creation Time.

Crawler Source

The Crawler Source tab displays all the sources from which a crawler extracts data. You can view the Source Name, Connection Name, Path, and Tables Count in this tab.

Configuration

The Configuration tab summarizes information regarding your Glue Crawler monitor, including its name, status, version, and creation date.

Outages

The Outages tab provides details on an outage's start time, end time, duration, and comments (if any).

Inventory

Obtain details like Crawler Name, Region, and Monitor Licensing Category from the Inventory tab. Set and view the Threshold and Availability Profile and the Notification Profile according to the user in this tab.

Log Report

This tab provides a consolidated report of the Glue Crawler monitor's log status, which can be downloaded as a CSV file.
