How to Monitor GeminiDB Cassandra in Huawei Cloud
Site24x7 gives your team complete observability into your GeminiDB Cassandra clusters, delivering real-time visibility into CPU and memory utilization, storage capacity, read and write latency percentiles, connection health, pending operations, tombstone query rates, and partition imbalances.
Use cases
- Latency control: Reduce tail latency by tracking the Write P95 Latency, Read P95 Latency, Write Max Latency, and Read Max Latency to identify garbage collection pauses or compaction delays early.
- Drop prevention: Prevent data loss by monitoring Dropped Mutations and Dropped Reads, so you can act before queues overflow and operations are discarded.
- Partition health: Maintain data balance by tracking the Large Partition Count and Imbalanced Table Count, which signal data modeling issues that compound over time. Find them before they impact performance.
Setup and configuration
GeminiDB Cassandra resources are auto-discovered and monitored during the Huawei Cloud integration. To enable monitoring, follow the steps below:
- Go to Cloud > Huawei > Add Huawei Monitor. Follow the steps to add a Huawei Cloud monitor.
- While adding or editing a Huawei Cloud monitor, select GeminiDB Cassandra from the Service/Resource Types drop-down and click Save.
- Go to Cloud > Huawei, select the created Huawei monitor, then click GeminiDB Cassandra to view the performance metrics.
Supported metrics
CPU and memory
| Metric name | Description | Units |
| CPU Usage | The percentage of CPU capacity currently consumed by the GeminiDB Cassandra instance. | Percentage |
| Memory Usage | The percentage of memory capacity currently consumed by the GeminiDB Cassandra instance. | Percentage |
Storage
| Metric name | Description | Units |
| Disk Utilization | The percentage of total disk storage currently consumed by the instance. | Percentage |
| Disk Total Size | The total disk storage capacity provisioned for the instance. | Gigabytes |
| Disk Used Size | The total disk storage currently consumed by the instance. | Gigabytes |
| Data Load Size | The total size of data currently loaded on the Cassandra node. | Bytes |
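The three disk metrics are related by simple arithmetic. As a rough illustration only (the function name and sample values are hypothetical and are not part of Site24x7 or Huawei Cloud), Disk Utilization can be derived from Disk Used Size and Disk Total Size like this:

```python
def disk_utilization_percent(used_gb: float, total_gb: float) -> float:
    """Return used storage as a percentage of provisioned storage."""
    if total_gb <= 0:
        raise ValueError("total_gb must be positive")
    return used_gb / total_gb * 100.0


# Hypothetical sample: 125 GB used out of 500 GB provisioned.
print(disk_utilization_percent(125.0, 500.0))  # prints 25.0
```

A check like this can help when sanity-testing a threshold value against the provisioned capacity of an instance.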
Network
| Metric name | Description | Units |
| Network Output Throughput | The rate of data transmitted out of the instance over the network per second. | Bytes/second |
| Network Input Throughput | The rate of data received by the instance over the network per second. | Bytes/second |
Connections
| Metric name | Description | Units |
| Active Connections | The number of active client connections currently established in the Cassandra instance. | Count |
Operations
| Metric name | Description | Units |
| Pending Writes | The number of write operations currently queued and waiting to be processed. | Count |
| Pending Reads | The number of read operations currently queued and waiting to be processed. | Count |
| Dropped Mutations | The number of write mutation operations dropped due to internal queue overflows. | Count |
| Dropped Reads | The number of read operations dropped due to internal queue overflows. | Count |
| Tombstone Query Rate | The rate at which queries are scanning tombstone rows per second. | Count/second |
| Single Delete Rate | The rate of single-row delete operations executed per second. | Count/second |
| Range Delete Rate | The rate of range delete operations executed per second. | Count/second |
| Large Row Count | The rate at which large row operations are being processed per second. | Count/second |
| Avg Limit Diff Count | The average difference between the query limit and the number of rows actually returned. | Count |
| Avg Modify Request Size | The average size of modify requests processed by the instance. | Bytes |
| Avg Query Response Size | The average size of query responses returned by the instance. | Bytes |
| Large Partition Count | The number of partitions exceeding the configured large partition threshold. | Count |
| Imbalanced Table Count | The number of tables with a significant data distribution imbalance across nodes. | Count |
Write latency
| Metric name | Description | Units |
| Write Latency | The average latency of write operations in the Cassandra instance. | Milliseconds |
| Write Count | The total number of write operations recorded within the monitoring period. | Count |
| Write 1min Rate | The rate of write operations processed per second over the last 1 minute. | Count/second |
| Write P75 Latency | The write latency below which 75% of write operations completed. | Milliseconds |
| Write P95 Latency | The write latency below which 95% of write operations completed. | Milliseconds |
| Write P99 Latency | The write latency below which 99% of write operations completed. | Milliseconds |
| Write P999 Latency | The write latency below which 99.9% of write operations completed. | Milliseconds |
| Write Max Latency | The maximum write latency recorded for a single write operation. | Milliseconds |
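To make the percentile metrics concrete, here is a minimal sketch of the nearest-rank percentile method applied to a hypothetical list of write latencies. The `percentile` helper and the sample values are assumptions for illustration; the monitored service may compute its percentiles differently (for example, from histograms), so treat this as an explanation of what a P75 or P95 value means, not as the way the metrics are produced.

```python
import math
from fractions import Fraction


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Fraction avoids floating-point rounding when computing the rank.
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical write latencies in milliseconds, with one slow outlier.
latencies_ms = [2, 3, 3, 4, 4, 5, 5, 6, 9, 40]

print("average:", sum(latencies_ms) / len(latencies_ms))  # 8.1
print("P75:", percentile(latencies_ms, 75))               # 6
print("P95:", percentile(latencies_ms, 95))               # 40
print("max:", max(latencies_ms))                          # 40
```

In this sample, a single slow operation pulls P95 and the maximum up to 40 ms while P75 stays at 6 ms, which is why tail percentiles are worth tracking alongside the average.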
Read latency
| Metric name | Description | Units |
| Read Latency | The average latency of read operations in the Cassandra instance. | Milliseconds |
| Read Count | The total number of read operations recorded within the monitoring period. | Count |
| Read 1min Rate | The rate of read operations processed per second over the last 1 minute. | Count/second |
| Read P75 Latency | The read latency below which 75% of read operations completed. | Milliseconds |
| Read P95 Latency | The read latency below which 95% of read operations completed. | Milliseconds |
| Read P99 Latency | The read latency below which 99% of read operations completed. | Milliseconds |
| Read P999 Latency | The read latency below which 99.9% of read operations completed. | Milliseconds |
| Read Max Latency | The maximum read latency recorded for a single read operation. | Milliseconds |
Range slice latency
| Metric name | Description | Units |
| Range Slice Latency | The average latency of range slice operations in the Cassandra instance. | Milliseconds |
| Range Slice Count | The total number of range slice operations recorded within the monitoring period. | Count |
| Range Slice 1min Rate | The rate of range slice operations processed per second over the last 1 minute. | Count/second |
| Range Slice P75 Latency | The range slice latency below which 75% of operations completed. | Milliseconds |
| Range Slice P95 Latency | The range slice latency below which 95% of operations completed. | Milliseconds |
| Range Slice P99 Latency | The range slice latency below which 99% of operations completed. | Milliseconds |
Distributed file volume
| Metric name | Description | Units |
| DFV Write Delay | The write delay introduced by the distributed file volume (DFV) layer for the instance. | Milliseconds |
| DFV Read Delay | The read delay introduced by the DFV layer for the instance. | Milliseconds |
| Max Sync Delay | The maximum synchronization delay observed across nodes in the Cassandra cluster. | Milliseconds |
Threshold configuration
You can configure thresholds and alerts for all GeminiDB Cassandra metrics to proactively detect performance degradation or connection issues.
- Go to Admin > Configuration Profiles > Threshold and Availability.
- Create or edit your Threshold Profile for GeminiDB Cassandra.
- Assign the profile to the respective monitors to trigger alerts.
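Conceptually, a threshold profile compares each incoming metric value against configured limits. The sketch below is a simplified illustration of that idea and is not Site24x7's implementation or API: the `Threshold` class, the `evaluate` function, the status labels, and every limit value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str       # metric name as listed in the tables above
    warning: float    # hypothetical warning limit
    critical: float   # hypothetical critical limit


def evaluate(threshold: Threshold, value: float) -> str:
    """Classify a sample; higher values are treated as worse."""
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


# Hypothetical limits; tune real thresholds to your own workload.
profile = [
    Threshold("Disk Utilization", warning=80.0, critical=90.0),
    Threshold("Write P95 Latency", warning=20.0, critical=50.0),
    Threshold("Dropped Mutations", warning=1.0, critical=10.0),
]

# Hypothetical latest values for each metric.
samples = {
    "Disk Utilization": 85.0,
    "Write P95 Latency": 12.0,
    "Dropped Mutations": 14.0,
}

for t in profile:
    print(f"{t.metric}: {evaluate(t, samples[t.metric])}")
```

With these sample values, Disk Utilization is classified as a warning, Write P95 Latency as ok, and Dropped Mutations as critical.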
IT Automation
Use Site24x7's IT Automation to resolve common issues with GeminiDB Cassandra performance:
- Go to Admin > IT Automation Templates. Then, click Add Automation Templates.
- Create an automation rule by selecting the automation Type (e.g., Server reboot, clear queue).
- Map the created rules to the GeminiDB Cassandra monitor for automatic execution during alerts.
Configuration rules
Use Configuration Rules to simplify bulk setup across GeminiDB Cassandra instances. Automatically assign Threshold Profiles, Notification Profiles, Tags, and Monitor Groups when new monitors are discovered.
