Alibaba Cloud SelectDB Monitoring Integration
Site24x7 offers out-of-the-box monitoring support for SelectDB, a high-performance analytical database engine in Alibaba Cloud. Using real-time metrics collection, Site24x7 helps you stay on top of query execution times, compaction efficiency, cluster health, and pod-level resource usage. Once your Alibaba Cloud account is integrated with Site24x7, SelectDB clusters are auto-discovered and monitored continuously.
Use cases
- Query latency tracking: Detect performance degradation using P99 and average latency metrics.
- Load pipeline observability: Monitor all stages of load jobs and identify bottlenecks.
- Pod-level diagnostics: Resolve issues caused by CPU, memory, or disk IOPS spikes.
- Cluster health insights: Take proactive action on failed nodes or compaction inefficiencies.
Setup and configuration
- Log in to your Site24x7 account and navigate to Cloud > Alibaba Cloud > Add Monitor.
- In the Edit Alibaba Cloud Monitor page, select SelectDB from the Service Types list.
- Once the monitor is added, go to Cloud > Alibaba Cloud > SelectDB to view dashboards and performance metrics.
Supported metrics
Query Performance & Latency
| Metric name | Description | Unit |
|---|---|---|
| Cluster Query QPS | The number of queries processed per second by the SelectDB cluster. | Queries/second |
| Average Cluster Query Latency | The average time taken to execute queries. | Milliseconds |
| Cluster Query Latency (P99) | The 99th percentile latency for query execution. | Milliseconds |
| Cluster Query Success Rate | The percentage of successfully executed queries. | Percentage |
| Instance Connection Count | The number of active client connections to the instance. | Count |
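To illustrate why average latency and P99 latency are reported as separate metrics, here is a minimal Python sketch that derives both from a list of per-query latencies. The sample values and the nearest-rank percentile method are assumptions chosen for illustration; they do not describe how SelectDB or Site24x7 computes these metrics internally.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100.

    Returns the smallest sample such that at least pct% of all
    samples are less than or equal to it.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, giving a 1-based rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in milliseconds: 98 fast queries and 2 slow ones.
latencies_ms = [20] * 98 + [900, 1200]

average_ms = sum(latencies_ms) / len(latencies_ms)
p99_ms = percentile(latencies_ms, 99)

print(f"average: {average_ms:.1f} ms, P99: {p99_ms} ms")
```

In this made-up sample the average stays low while the P99 value exposes the slow tail, which is why tracking both helps when looking for performance degradation.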
Data Compaction & Ingestion
| Metric name | Description | Unit |
|---|---|---|
| Cluster Data Load Rate | The rate at which data is loaded into the cluster. | Rows/second |
| Cluster Data Compaction Base Score | Base-level score indicating compaction efficiency. | Score |
| Cluster Data Compaction Cumulative Score | Accumulated compaction score indicating overall health. | Score |
Pod-Level Resource Usage
| Metric name | Description | Unit |
|---|---|---|
| Pod CPU Utilization | The percentage of CPU utilized by the pod. | Percentage |
| Pod Memory Utilization | The percentage of memory utilized by the pod. | Percentage |
| Pod Memory Usage | The actual memory used by the pod. | MB |
| Pod Disk IOPS (Write) | The number of disk write operations performed per second by the pod. | IOPS |
| Pod Disk IOPS (Read) | The number of disk read operations performed per second by the pod. | IOPS |
Load Job Metrics
| Metric name | Description | Unit |
|---|---|---|
| Load Job Count (Broker — Loading) | The number of broker load jobs currently in progress. | Count |
| Load Job Count (Broker — Committed) | The number of broker load jobs that have been committed. | Count |
| Load Job Count (Broker — Pending) | The number of broker load jobs pending execution. | Count |
| Insert Load Job Rate (Loading) | The data ingestion rate of insert load jobs in the loading state. | Rows/second |
| Insert Load Job Rate (Committed) | The data ingestion rate of insert load jobs in the committed state. | Rows/second |
| Stream Load Job Rate | Data ingestion rate through stream load jobs. | Rows/second |
| Load Job Count (Broker — Finished) | The number of broker load jobs that have finished execution. | Count |
Cache Efficiency
| Metric name | Description | Unit |
|---|---|---|
| Cluster Cache Hit Rate | The percentage of cache hits during read operations. | Percentage |
| Cluster Cache I/O (Read) | The number of I/O read operations handled via the cache. | Operations/second |
| Cluster Cache I/O (Write) | The number of I/O write operations handled via the cache. | Operations/second |
Cluster Health
| Metric name | Description | Unit |
|---|---|---|
| Cluster Failed Node Count | The number of failed or unresponsive nodes in the cluster. | Count |
| Remote Storage I/O (Read) | The rate of data read from remote storage. | MB/second |
Threshold configuration
- Go to Admin > Configuration Profiles > Threshold and Availability.
- Create or edit a threshold profile for SelectDB.
- Assign the profile to the respective monitors to trigger alerts.
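Conceptually, a threshold profile is a set of per-metric limits that each polled sample is checked against. The Python sketch below shows that idea only. The `Threshold` class, the `breached` function, and all limit values are hypothetical and are not part of any Site24x7 API; the metric names are taken from the tables above.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str
    limit: float
    breach_when: str  # "above" or "below" (hypothetical convention)


def breached(thresholds, sample):
    """Return the metric names in `sample` that violate their threshold."""
    violations = []
    for t in thresholds:
        value = sample.get(t.metric)
        if value is None:
            continue  # metric was not reported in this poll
        if t.breach_when == "above" and value > t.limit:
            violations.append(t.metric)
        elif t.breach_when == "below" and value < t.limit:
            violations.append(t.metric)
    return violations


# Illustrative limits only; choose values that suit your workload.
profile = [
    Threshold("Cluster Query Latency (P99)", 500.0, "above"),
    Threshold("Cluster Query Success Rate", 99.0, "below"),
    Threshold("Cluster Failed Node Count", 0, "above"),
]

# One hypothetical poll of a SelectDB monitor.
sample = {
    "Cluster Query Latency (P99)": 730.0,
    "Cluster Query Success Rate": 99.7,
    "Cluster Failed Node Count": 1,
}

print(breached(profile, sample))
```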
IT automation
Site24x7's IT Automation tools help resolve performance degradation issues automatically. The alarm engine continuously evaluates the system events for which thresholds have been defined; when a breach occurs, it performs the mapped automation.
- Go to Admin > IT Automation Templates.
- Create a new automation rule.
- Map the rule to the monitor for proactive resolution.
How to configure IT Automation for a monitor
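The mapping between a breached threshold and its automation can be pictured as a lookup from metric name to action. The following Python sketch is a conceptual illustration under that assumption: the action functions, `AUTOMATION_MAP`, and the monitor name are hypothetical and do not reflect how Site24x7 implements or exposes IT Automation.

```python
def notify_on_call(monitor):
    # Hypothetical action: a real automation would call out to a pager or script.
    return f"notify on-call about failed nodes in {monitor}"


def pause_bulk_loads(monitor):
    # Hypothetical action used to relieve CPU pressure on a pod.
    return f"pause bulk load jobs on {monitor}"


# Hypothetical mapping from a breached metric to its automation.
AUTOMATION_MAP = {
    "Cluster Failed Node Count": notify_on_call,
    "Pod CPU Utilization": pause_bulk_loads,
}


def run_mapped_automations(monitor, breached_metrics, automation_map=AUTOMATION_MAP):
    """Run the action mapped to each breached metric; skip unmapped metrics."""
    results = []
    for metric in breached_metrics:
        action = automation_map.get(metric)
        if action is not None:
            results.append(action(monitor))
    return results


actions = run_mapped_automations(
    "selectdb-prod",
    ["Pod CPU Utilization", "Cluster Query Latency (P99)"],
)
print(actions)
```

Metrics without a mapped action, such as the P99 latency breach in this example, are simply skipped.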
Configuration rules
With Site24x7's Configuration Rules, you can set parameters such as Threshold Profile, Notification Profile, Tags, and Monitor Group for multiple monitors at once and automate the configuration of your monitoring resources. The rules also assign these settings automatically when new SelectDB monitors are added.
How to add a Configuration Rule
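As a rough mental model, a configuration rule pairs a match condition with the settings to assign. The Python sketch below illustrates that model only. The rule structure, field names such as `service` and `env`, and the profile names are all hypothetical and are not Site24x7's actual rule format.

```python
def apply_configuration_rules(monitor, rules):
    """Collect the settings from every rule whose conditions all match the monitor."""
    settings = {}
    for rule in rules:
        if all(monitor.get(key) == value for key, value in rule["match"].items()):
            settings.update(rule["assign"])  # later rules override earlier ones
    return settings


# Hypothetical rules: one for all SelectDB monitors, one for production only.
rules = [
    {
        "match": {"service": "SelectDB"},
        "assign": {"threshold_profile": "selectdb-default", "monitor_group": "Analytics"},
    },
    {
        "match": {"service": "SelectDB", "env": "prod"},
        "assign": {"notification_profile": "pager", "tags": ["prod"]},
    },
]

# A newly discovered monitor (hypothetical attributes).
new_monitor = {"name": "selectdb-prod-01", "service": "SelectDB", "env": "prod"}

assigned = apply_configuration_rules(new_monitor, rules)
print(assigned)
```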
