Google Cloud Dataflow monitoring integration

Site24x7's Google Cloud Dataflow monitor tracks key metrics that help you fine-tune your fully managed, serverless data processing service. With these metrics in view, it is easier to build and run pipelines for large-scale data processing and analytics tasks.

Setup and configuration

  • Adding Google Cloud Dataflow while configuring a new Google Cloud monitor

    If you have not configured a Google Cloud monitor yet, add one by following the steps below:

    1. Log in to your Site24x7 account.
    2. Go to Cloud > GCP > Add GCP Monitor or Admin > Cloud Monitoring > Google Cloud Platform (GCP).
    3. Provide a unique display name for identification purposes.
    4. Upload the JSON file that contains the private key of the service account to authenticate Site24x7 for performing resource discovery.
    5. Select Cloud Dataflow from the Select the Resources for Monitoring list.
    6. Select existing Notification Profiles, User Alert Groups, Tags, and IT Automation Templates, or add new ones. You can also integrate Site24x7's alarms with your preferred third-party service.
    7. Click Start GCP Monitoring.
  • Adding Google Cloud Dataflow to an existing Google Cloud monitor

    If you already have a Google Cloud monitor configured for the service account, you can add Google Cloud Dataflow by following the steps below:

    1. Log in to your Site24x7 account.
    2. Go to Cloud > GCP and select your GCP monitor.
    3. Click the hamburger icon next to Service View and select Edit, which brings you to the Edit GCP Monitor page.
    4. On the Edit GCP Monitor page, select Cloud Dataflow from the Select the Resources for Monitoring list and click Save.
    5. After successful configuration, go to Cloud > GCP > Cloud Dataflow. Now you can view the discovered Google Cloud Dataflow resources.
It will take approximately five minutes to discover new GCP resources.
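
Before uploading the key file, it can help to confirm that the JSON really is a service account key. The following is a minimal sketch, assuming the standard field names found in a Google Cloud service account key file. The function name `check_service_account_key` is hypothetical and is not part of Site24x7 or Google Cloud tooling.

```python
import json

# Fields present in a standard Google Cloud service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems found in the key file text (empty if none)."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]

    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if not key.get(name)]

    # A service account key declares its type explicitly.
    key_type = key.get("type")
    if key_type and key_type != "service_account":
        problems.append(f"unexpected type: {key_type!r}")
    return problems
```

An empty list means the file has the expected shape. It does not prove that the service account has the permissions needed for resource discovery.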

Polling frequency

Site24x7's Google Cloud Dataflow monitor collects metric data at one-minute granularity and reports the status of your Google Cloud Dataflow resources every five minutes.
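
To picture how the two intervals relate, the sketch below groups one-minute samples into five-minute windows. It is illustrative only: the function name `group_by_poll` and the `(epoch_seconds, value)` sample shape are assumptions, and how Site24x7 aggregates data internally is not described here.

```python
from collections import defaultdict

POLL_INTERVAL_SECONDS = 300  # five-minute status interval from the text


def group_by_poll(samples):
    """Group (epoch_seconds, value) one-minute samples by five-minute window start."""
    windows = defaultdict(list)
    for timestamp, value in samples:
        # Round the timestamp down to the start of its five-minute window.
        window_start = timestamp - (timestamp % POLL_INTERVAL_SECONDS)
        windows[window_start].append(value)
    return dict(windows)
```

With samples taken every 60 seconds, each window holds up to five data points.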

Supported metrics

Metric name | Description | Statistic | Unit
Active Size | The average size of active data in the job at the time of data collection | Average | Bytes
Throughput | The average size of data consumed by the job at the time of data collection | Average | Bytes
Current Processing Key-range Availability | The share of streaming processing keys that are assigned to workers and are available to perform work | Total | Percentage
Target Worker Instances | The total desired number of worker instances | Total | Count
Billable Shuffle Data Processed | The total billable shuffle data processed by the Dataflow job | Total | Bytes
Current Number Of VCPUs In Use | The total number of vCPUs being used by the Dataflow job | Total | Count
Current Shuffle Slots In Use | The total number of shuffle slots used by the Dataflow job | Total | Count
Data Watermark Lag | The age (time since the event timestamp) of the most recent item of data that the pipeline has fully processed | Average | Seconds
Elapsed Time | The time the active pipeline has been running | Average | Seconds
Element Count | The total number of elements added to the PCollection | Total | Count
Estimated Byte Count | The estimated number of bytes added to the PCollection | Average | Count
Failed | Indicates whether the job has failed | Status | Boolean
Status | Indicates the current status of the pipeline. Possible values include: Running, Done, Cancelled, and Failed. | Status | Text
System Lag | The maximum amount of time data waits to be processed | Average | Seconds
Total PD Usage Time | The total gigabyte-seconds of all persistent disks used by all workers related to the active pipeline | Total | Seconds
Total Memory Usage Time | The total gigabyte-seconds of memory allocated to the Dataflow job | Total | Seconds
Total Shuffle Data Processed | The total size of shuffle data processed by the Dataflow job | Total | Bytes
Total Streaming Data Processed | The total size of streaming data processed by the Dataflow job | Total | Bytes
Total vCPU Time | The total vCPU time used by the Dataflow job | Total | Seconds
User Counter | The user-defined counter metric | Status | Boolean
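
Total PD Usage Time and Total Memory Usage Time are described in gigabyte-seconds, which is capacity multiplied by the time it was allocated. The sketch below works through the arithmetic under a simplifying assumption: every worker holds the same allocation for the whole period. The function name `gigabyte_seconds` is hypothetical.

```python
def gigabyte_seconds(gigabytes_per_worker: float, workers: int, seconds: float) -> float:
    """Capacity-time product, assuming a constant allocation on every worker."""
    return gigabytes_per_worker * workers * seconds


# Example: 3 workers, each with a 250 GB persistent disk, running for one hour.
pd_usage = gigabyte_seconds(250, 3, 3600)  # 250 * 3 * 3600 = 2,700,000
```

In a real job the worker count can change while the pipeline runs, so the reported totals accumulate over time rather than following this single product.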

Threshold configuration

  • Global configuration
    1. In the Site24x7 web client, go to the Admin section on the left navigation pane.
    2. Select Configuration Profiles from the left pane and select Threshold and Availability from the drop-down menu.
    3. Click Add Threshold Profile in the top-right corner.
    4. For Monitor Type, select Cloud Dataflow.
    5. Now you can set the threshold values for the metrics listed above.
  • Monitor-level configuration
    1. In the Site24x7 web client, go to Cloud > GCP > Cloud Dataflow.
    2. Select a resource you would like to set a threshold for, then click the hamburger icon.
    3. Select Edit, which directs you to the Edit Cloud Dataflow Monitor page.
    4. You can set the threshold values for the metrics with the Threshold and Availability option.
    5. You can also configure IT Automation at the attribute level.

IT Automation

Site24x7 offers a set of exclusive IT Automation tools that automatically resolve performance degradation issues. These tools react to events proactively rather than waiting for manual intervention. The IT Automation tools automate repetitive tasks and automatically remediate threshold breaches. The alarm engine continually evaluates system events for which thresholds are set and executes the mapped automation when there is a breach.
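
The evaluate-then-remediate flow described above can be pictured with a small conceptual sketch. This is not Site24x7's alarm engine: the `Threshold` class, the `evaluate` function, and the idea of an automation as a plain Python callable are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Threshold:
    metric: str                               # metric name, e.g. "System Lag"
    limit: float                              # breach when the observed value exceeds this
    automation: Callable[[str, float], str]   # hypothetical remediation action


def evaluate(observed: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Run the mapped automation for every metric whose value exceeds its limit."""
    results = []
    for threshold in thresholds:
        value = observed.get(threshold.metric)
        # Metrics with no data point in this cycle are skipped, not treated as breaches.
        if value is not None and value > threshold.limit:
            results.append(threshold.automation(threshold.metric, value))
    return results
```

For example, a threshold of 60 seconds on System Lag, mapped to an action that records the breach, fires only when an observed value is above 60.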

How to configure IT Automation for a monitor

Configuration Rules

Editing multiple monitors to associate different monitor groups or add a different tag can be a tedious process. With Site24x7's Configuration Rules, you can automate the configuration settings of your monitoring resources. Also, Site24x7 allows you to create custom rules to track configuration changes continuously and achieve the ideal configuration settings.

How to add Configuration Rules

Summary

The Summary tab will give you the performance data organized by time for the metrics listed above. To view the summary:

  1. Go to Cloud > GCP > Cloud Dataflow.
  2. Select a resource.
  3. Click the Summary tab.

Configuration Details

The Configuration Details tab provides the configuration details of the selected Google Cloud Dataflow resource. To get the configuration details:

  1. Go to Cloud > GCP > Cloud Dataflow.
  2. Select a resource.
  3. Click the Configuration Details tab.

Reports

Use the reports to get in-depth data about the various parameters of your monitored resources and to improve your service performance.

To view reports for a Google Cloud Dataflow resource:

  1. Go to the Reports section on the left navigation pane.
  2. Select Cloud Dataflow from the menu on the left.
  3. For a single selected monitor, you can view the Availability Summary Report, Performance Report, and Inventory Report. For all Google Cloud Dataflow monitors, you can view the Summary Report, Availability Summary Report, Health Trend Report, and Performance Report.

You can also get reports from the Summary tab of the Google Cloud Dataflow monitor:

  1. Click the Summary tab.
  2. Get the Availability Summary Report of the monitor by clicking Availability.
  3. You can also find the Performance Report of the monitor by clicking any chart title.
