Alibaba Cloud Kafka Monitoring Integration
Site24x7 provides comprehensive out-of-the-box monitoring support for Alibaba Cloud Kafka. By observing metrics such as message input/output, accumulation, latency, network utilization, and disk usage, you can gain real-time insights into Kafka cluster performance, client behavior, and broker efficiency. Once you integrate your Alibaba Cloud account with Site24x7, all Kafka instances are auto-discovered and monitored.
Use cases
- Track producer/consumer throughput: Monitor input/output rates at the instance, topic, and group levels to analyze data flow.
- Detect message backlog: Get alerted when message accumulation grows abnormally across clusters or specific topics.
- Monitor latency and throttling: Identify slowdowns in request processing or broker-side throttling.
- Ensure resource utilization health: Analyze disk usage, batch sizes, and connection load to avoid system saturation.
- Spot networking bottlenecks: Track network I/O rates and utilization by node for smoother data transport.
Setup and configuration
- Log in to your Site24x7 account and navigate to Cloud > Alibaba Cloud > Add Monitor.
- On the Edit Alibaba Cloud Monitor page, select Kafka from the Service Types list.
- Once added, go to Cloud > Alibaba > Kafka to view dashboards and performance metrics.
Supported metrics
Message Input and Output
| Metric name | Description | Unit |
|---|---|---|
| Instance Message Input (v3) | The number of messages produced to the instance. | Count/second |
| Instance Message Output (v3) | The number of messages consumed from the instance. | Count/second |
| Instance Message Input Ratio (v3) | The message input ratio for the instance. | Percentage |
| Instance Message Output Ratio (v3) | The message output ratio for the instance. | Percentage |
| Cluster Message Input (v3) | The total message input across the Kafka cluster. | Count/second |
| Group Message Output Count (v3) | The number of messages consumed by a specific group. | Count/second |
| Topic Message Input Count (v3) | The number of messages produced to a topic. | Count/second |
| Topic Message Output Count (v3) | The number of messages consumed from a topic. | Count/second |
Message Accumulation
| Metric name | Description | Unit |
|---|---|---|
| Message Accumulation (v3) | The number of unconsumed messages in the cluster. | Count |
| Message Accumulation | Total backlog of messages awaiting consumption. | Count |
| Message Accumulation (Single Topic) | Message backlog for a single topic. | Count |
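
To illustrate what "abnormal growth" in message accumulation can look like, the following minimal Python sketch flags a backlog that rises on every poll and gains at least a chosen number of messages over the window. It is not how Site24x7 evaluates thresholds internally; the function name `backlog_is_growing`, the `min_increase` parameter, and the sample values are all hypothetical.

```python
from typing import Sequence


def backlog_is_growing(samples: Sequence[int], min_increase: int) -> bool:
    """Return True if the backlog rose on every consecutive poll and the
    total rise across the window is at least min_increase messages."""
    if len(samples) < 2:
        return False  # a single reading cannot show a trend
    steadily_rising = all(later > earlier
                          for earlier, later in zip(samples, samples[1:]))
    return steadily_rising and (samples[-1] - samples[0]) >= min_increase


# Hypothetical Message Accumulation readings from four consecutive polls.
readings = [100, 250, 600, 1400]
growing = backlog_is_growing(readings, min_increase=1000)
```

Requiring a steady rise as well as a minimum increase avoids flagging a single short spike that consumers clear by the next poll.
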
Request and Processing
| Metric name | Description | Unit |
|---|---|---|
| Instance Requests Input (v3) | Number of incoming requests to the instance. | Count/second |
| Instance Requests Output (v3) | Number of outgoing responses from the instance. | Count/second |
| Topic Requests Input (v3) | Number of incoming requests targeting a specific topic. | Count/second |
| Topic Requests Output (v3) | Number of outgoing responses from a specific topic. | Count/second |
Latency and Throttling
| Metric name | Description | Unit |
|---|---|---|
| Instance Throttle Time P99 (Input, v3) | 99th percentile of input throttling time. | Milliseconds |
| Instance Throttle Time P99 (Output, v3) | 99th percentile of output throttling time. | Milliseconds |
| Instance Fetch Throttle Queue Size (v2) | Fetch throttle queue size on the instance. | Count |
| Instance Produce Throttle Queue Size (v2) | Produce throttle queue size on the instance. | Count |
| Instance Batch Size (TP50, v2) | Median batch size of producer messages. | Bytes |
| Instance Batch Size (TP999, v2) | 99.9th percentile batch size of producer messages. | Bytes |
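
The P99, TP50, and TP999 metrics above are percentiles: the value below which 99%, 50%, or 99.9% of observations fall. As a rough illustration, the sketch below computes a percentile with the nearest-rank method. This is one common definition, assumed here for clarity; the exact calculation Alibaba Cloud uses is not stated in this document, and the function name and sample values are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], p: float) -> float:
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    # Nearest rank: the smallest 1-based rank covering p percent of the samples.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical throttle times in milliseconds: 1, 2, ..., 100.
throttle_ms = list(range(1, 101))
p99 = percentile_nearest_rank(throttle_ms, 99)
median = percentile_nearest_rank(throttle_ms, 50)
```

A high P99 alongside a low median indicates that most requests are unaffected while a small share experiences long throttling.
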
Network
| Metric name | Description | Unit |
|---|---|---|
| Instance Internet Receive Rate (v3) | Rate of incoming network traffic to the instance. | Bytes/second |
| Instance Internet Transmit Rate (v3) | Rate of outgoing network traffic from the instance. | Bytes/second |
| Instance Internet Receive Utilization (By Node) | Network receive utilization by node. | Percentage |
| Instance Internet Transmit Utilization (By Node) | Network transmit utilization by node. | Percentage |
Disk and Storage
| Metric name | Description | Unit |
|---|---|---|
| Instance Disk Capacity | The total disk capacity allocated to the instance. | GB |
| Instance Disk Log Size (v3) | The size of Kafka log files on disk. | GB |
Connections
| Metric name | Description | Unit |
|---|---|---|
| Instance Maximum Connection Count (v3) | The maximum number of concurrent connections allowed. | Count |
| Instance Total Connection Count (v3) | The total number of active client connections. | Count |
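
The Disk and Storage and Connections tables each pair a usage figure with a limit, so a utilization percentage can be derived from either pair. The sketch below shows the arithmetic; the helper name `utilization_percent` and the sample readings are hypothetical, and it assumes both values in a pair share the same unit.

```python
def utilization_percent(used: float, limit: float) -> float:
    """Return used/limit as a percentage; limit must be positive."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * used / limit


# Hypothetical sample readings, not real data.
# Instance Disk Log Size vs. Instance Disk Capacity, both in GB.
disk_pct = utilization_percent(used=450.0, limit=900.0)
# Instance Total Connection Count vs. Instance Maximum Connection Count.
conn_pct = utilization_percent(used=1200, limit=2000)
```

A derived percentage like this is easier to compare across instances of different sizes than the raw values.
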
Threshold configuration
- Go to Admin > Configuration Profiles > Threshold and Availability.
- Create or edit a threshold profile for Kafka.
- Assign the profile to the respective monitors to trigger alerts.
IT automation
Site24x7's IT Automation tools help resolve performance degradation issues automatically. The alarm engine continuously evaluates system events against the thresholds you have defined and, when a breach occurs, executes the mapped automation.
- Go to Admin > IT Automation Templates.
- Create a new automation rule.
- Map the rule to the monitor for proactive resolution.
How to configure IT Automation for a monitor
Configuration rules
With Site24x7's Configuration Rules, you can set parameters such as Threshold Profile, Notification Profile, Tags, and Monitor Group for multiple monitors at once. A rule can also assign these settings automatically whenever new Kafka monitors are added.
How to add a Configuration Rule
