Alibaba Cloud RocketMQ Monitoring Integration
Site24x7 offers comprehensive monitoring for Alibaba Cloud RocketMQ, helping you gain real-time insights into message flow, API performance, consumer activity, and message backlog. With detailed visibility into traffic, latency, and throttling, you can identify delivery delays, detect bottlenecks, and maintain high throughput and reliability across distributed messaging systems. When you integrate your Alibaba Cloud account with Site24x7, all RocketMQ instances are automatically discovered and continuously monitored.
Use cases
- Throughput tracking: Monitor inbound and outbound traffic utilization to ensure optimal broker performance.
- Message latency monitoring: Detect high queue times or consumer lag to prevent delivery delays.
- API performance visibility: Measure send and receive API TPS to optimize client and broker interactions.
- Failure handling: Identify DLQ message trends and throttled requests to improve reliability.
Setup and configuration
- Log in to your Site24x7 account and navigate to Cloud > Alibaba Cloud > Add Monitor.
- On the Edit Alibaba Cloud Monitor page, select RocketMQ from the Service Types list.
- Once the service is added, go to Cloud > Alibaba > RocketMQ to view dashboards and performance metrics.
Supported metrics
Traffic & Throughput Utilization
| Metric name | Description | Unit |
|---|---|---|
| Instance Traffic RX Utilization | The percentage of inbound traffic utilization for the RocketMQ instance. | Percentage |
| Instance Traffic TX Utilization | The percentage of outbound traffic utilization for the RocketMQ instance. | Percentage |
| Instance Traffic RX | The inbound traffic rate of the RocketMQ instance. | Bytes/second |
| Instance Traffic TX | The outbound traffic rate of the RocketMQ instance. | Bytes/second |
| Instance Dropped Traffic RX | The inbound traffic dropped due to throttling or errors. | Bytes/second |
| Instance Dropped Traffic TX | The outbound traffic dropped due to throttling or errors. | Bytes/second |
| Instance Internet Flow Out Bandwidth | The outbound internet bandwidth usage for the instance. | Bytes/second |
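To make the relationship between the rate metrics and the utilization metrics concrete, here is a minimal sketch. It assumes that utilization is the observed traffic rate divided by the instance's traffic limit; that formula, the function name, and the limit value are illustrative assumptions, not something documented by Site24x7 or Alibaba Cloud.

```python
def utilization_percent(rate_bytes_per_sec: float, limit_bytes_per_sec: float) -> float:
    """Express a traffic rate as a percentage of a (hypothetical) traffic limit."""
    if limit_bytes_per_sec <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * rate_bytes_per_sec / limit_bytes_per_sec


# Example: 50 MB/s of inbound traffic against an assumed 200 MB/s limit.
rx_utilization = utilization_percent(50_000_000, 200_000_000)
```

A value that stays close to 100 alongside a non-zero Instance Dropped Traffic RX or TX would suggest the instance is being throttled.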
API Call & TPS Performance
| Metric name | Description | Unit |
|---|---|---|
| Instance Send API Call TPS | The number of send API calls per second at the instance level. | Count/second |
| Instance Receive API Call TPS | The number of receive API calls per second at the instance level. | Count/second |
| Instance API Call TPS | The total number of API calls per second for the instance. | Count/second |
| Send Message Count per Instance | The total number of messages sent by the instance. | Count |
| Send Message Count per Topic | The number of messages sent per topic. | Count |
| Receive Message Count per Instance | The total number of messages received by the instance. | Count |
| Receive Message Count per Topic | The number of messages received per topic. | Count |
| Receive Message Count per GID | The number of messages received per consumer group (GID). | Count |
| Receive Message Count per GID Topic | The number of messages received per GID and topic. | Count |
Message Backlog & Latency
| Metric name | Description | Unit |
|---|---|---|
| Ready Messages | The total number of ready messages waiting for consumption. | Count |
| Ready Messages per GID Topic | The number of ready messages per GID and topic. | Count |
| Ready Message Queue Time | The average time messages spend in the queue before being consumed. | Milliseconds |
| Ready Message Queue Time per GID Topic | The average queue time for ready messages per GID and topic. | Milliseconds |
| Consumer Lag | The difference between produced and consumed messages for the instance. | Count |
| Consumer Lag per GID Topic | The consumer lag per GID and topic. | Count |
| Consumer Lag Latency per GID | The delay time caused by consumer lag per GID. | Milliseconds |
| Consumer Lag Latency per GID Topic | The delay time caused by consumer lag per GID and topic. | Milliseconds |
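The lag metrics above can be read as simple arithmetic on message counts. The following sketch assumes lag is the number of produced messages minus the number of consumed messages, and that a per-GID figure is the sum of its per-topic figures; the function names and sample values are hypothetical and only illustrate how the breakdowns relate.

```python
from collections import defaultdict
from typing import Dict, Tuple


def consumer_lag(produced: int, consumed: int) -> int:
    """Lag as produced minus consumed messages, never below zero."""
    return max(produced - consumed, 0)


def lag_per_gid(lag_per_gid_topic: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Roll per-(GID, topic) lag up to one total per consumer group."""
    totals: Dict[str, int] = defaultdict(int)
    for (gid, _topic), lag in lag_per_gid_topic.items():
        totals[gid] += lag
    return dict(totals)


# Hypothetical sample: one consumer group reading two topics.
sample = {
    ("GID_orders", "orders"): consumer_lag(1_000, 940),
    ("GID_orders", "refunds"): consumer_lag(200, 200),
}
totals = lag_per_gid(sample)
```

Viewing lag at the GID and topic level helps show whether a backlog comes from one slow topic or from the consumer group as a whole.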
Reliability & Failure Handling
| Metric name | Description | Unit |
|---|---|---|
| Send DLQ Message Count per GID | The number of messages sent to the Dead Letter Queue (DLQ) per GID. | Count |
| Send DLQ Message Count per GID Topic | The number of messages sent to the DLQ per GID and topic. | Count |
| Instance Storage Size | The total storage size used by the RocketMQ instance. | Bytes |
| Instance Active Connection | The total number of active connections for the instance. | Count |
Throttling & Resource Limits
| Metric name | Description | Unit |
|---|---|---|
| Throttled Send Requests per Instance | The number of throttled send requests per instance. | Count |
| Throttled Send Requests per Topic | The number of throttled send requests per topic. | Count |
| Throttled Receive Requests per Instance | The number of throttled receive requests per instance. | Count |
| Throttled Receive Requests per GID | The number of throttled receive requests per consumer group (GID). | Count |
| Throttled Receive Requests per GID Topic | The number of throttled receive requests per GID and topic. | Count |
Threshold configuration
- Go to Admin > Configuration Profiles > Threshold and Availability.
- Create or edit a threshold profile for RocketMQ.
- Assign the profile to the respective monitors to trigger alerts.
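Conceptually, a threshold profile maps metric names to limits, and an alert fires when a polled value crosses its limit. The sketch below illustrates that idea only; it is not the Site24x7 API, and the profile structure, function name, and limit values are all hypothetical.

```python
from typing import Dict, List


def breached_metrics(profile: Dict[str, float], sample: Dict[str, float]) -> List[str]:
    """Return the names of metrics whose sampled value exceeds its limit."""
    return sorted(
        name
        for name, limit in profile.items()
        if name in sample and sample[name] > limit
    )


# Hypothetical limits for two of the metrics listed above.
profile = {"Consumer Lag": 10_000, "Ready Message Queue Time": 5_000}
# Hypothetical values from one poll.
sample = {"Consumer Lag": 12_500, "Ready Message Queue Time": 800}

breaches = breached_metrics(profile, sample)
```

In practice, the thresholds are defined in the profile in the Site24x7 web client rather than in code.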
IT automation
Site24x7's IT Automation tools help resolve performance degradation issues automatically. The alarm engine continuously evaluates the system events for which thresholds have been defined and, when a breach occurs, performs the mapped automation.
- Go to Admin > IT Automation Templates.
- Create a new automation rule.
- Map the rule to the monitor for proactive resolution.
How to configure IT Automation for a monitor
Configuration rules
With Site24x7's Configuration Rules, you can set parameters like Threshold Profile, Notification Profile, Tags, and Monitor Group for multiple monitors and automate the configuration settings of your monitoring resources. These settings can also be assigned automatically when new RocketMQ monitors are added.
How to add a Configuration Rule
