How to Monitor Distributed Cache Service in Huawei Cloud
Site24x7 supports monitoring of Huawei Cloud's Distributed Cache Service (DCS). Track memory, connection, command, and replication metrics to give application and infrastructure teams full visibility into Redis cache health.
Use cases
Prevent cascading failures: When the cache hit rate drops below 80%, Site24x7 alerts you early so you can pre-warm the cache and avoid a surge of misses overwhelming the back-end database.
Protect critical data: A rise in evicted keys along with increasing memory usage triggers an alert, allowing you to scale the DCS instance before important session data is lost.
Improve performance: Detect slow logs and increased command response times, helping DBAs identify and optimize expensive Redis commands before they impact application throughput.
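The first two use cases can be read as simple conditions over consecutive metric samples. The following is a minimal sketch of that logic, not Site24x7's actual alerting engine: the `Sample` class, the `evaluate` function, and the alert labels are hypothetical names introduced for illustration, and only the 80% floor comes from the text above.

```python
from dataclasses import dataclass
from typing import List

# Floor taken from the "Prevent cascading failures" use case.
HIT_RATE_FLOOR = 80.0  # percent


@dataclass
class Sample:
    """One poll of a DCS instance (hypothetical container, not a Site24x7 type)."""
    cache_hit_rate: float  # percent
    evicted_keys: int      # cumulative count
    memory_usage: float    # percent


def evaluate(prev: Sample, curr: Sample) -> List[str]:
    """Return the alert labels raised by the current sample."""
    alerts = []
    # Use case 1: hit rate has fallen below the floor.
    if curr.cache_hit_rate < HIT_RATE_FLOOR:
        alerts.append("cache_hit_rate_low")
    # Use case 2: evictions are rising while memory usage is also climbing.
    if curr.evicted_keys > prev.evicted_keys and curr.memory_usage > prev.memory_usage:
        alerts.append("eviction_with_memory_growth")
    return alerts


if __name__ == "__main__":
    previous = Sample(cache_hit_rate=92.0, evicted_keys=10, memory_usage=70.0)
    current = Sample(cache_hit_rate=76.0, evicted_keys=25, memory_usage=88.0)
    print(evaluate(previous, current))
```

In practice these conditions are expressed through Threshold Profiles rather than code; the sketch only makes the comparison explicit.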
Setup and configuration
DCS resources are auto-discovered and monitored during the Huawei Cloud integration. To enable monitoring, follow these steps:
- Navigate to Cloud > Huawei > Add Huawei Monitor. Follow the steps to add a Huawei Cloud monitor.
- While adding or editing a Huawei Cloud monitor, select DCS from the Service/Resource Types drop-down menu and click Save.
- Go to Cloud > Huawei. Then, select the created Huawei monitor.
- Click DCS to view the performance metrics.
Supported metrics
Client Connections
| Metric name | Description | Unit |
| Connected Clients | The current number of client connections to the DCS instance. | Count |
| Blocked Clients | The number of clients currently blocked waiting on a blocking command (e.g., BLPOP). | Count |
| Rejected Connections | The total number of connection attempts rejected due to the maximum connection limit being reached. | Count |
| Total Connections Received | The total number of client connections accepted by the DCS instance since startup. | Count |
| Connection Utilization | The percentage of the maximum allowed connections currently in use. | Percentage |
Resource Utilization
| Metric name | Description | Unit |
| CPU Usage | The instantaneous percentage of CPU resources being consumed by the DCS instance. | Percentage |
| Average CPU Usage | The average CPU utilization over the monitoring period. | Percentage |
| Memory Usage | The percentage of total memory currently in use by the DCS instance. | Percentage |
| Maximum Memory Usage | The peak memory utilization recorded during the monitoring period. | Percentage |
Memory Metrics
| Metric name | Description | Unit |
| Used Memory | The total amount of memory currently allocated and used by the DCS instance. | Bytes |
| Used Memory - RSS | The resident set size of memory allocated by the OS to the DCS process, including fragmentation. | Bytes |
| Peak Memory Usage | The maximum amount of memory that has ever been consumed by the DCS instance. | Bytes |
| Memory Used by Dataset | The amount of memory used directly to store data. | Bytes |
| Dataset Memory Percentage | The proportion of used memory that is occupied by the actual dataset. | Percentage |
| LUA Script Memory Usage | The memory consumed by loaded Lua scripts. | Bytes |
| Memory Fragmentation Ratio | The ratio of RSS memory to used memory; values significantly above 1.0 indicate fragmentation. | Ratio |
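As a worked illustration of the Memory Fragmentation Ratio row, the ratio can be derived from the Used Memory - RSS and Used Memory values. This is a sketch under the definition given in the table; the function name `fragmentation_ratio` is hypothetical and is not part of any Site24x7 or Huawei Cloud API.

```python
def fragmentation_ratio(used_memory_rss: int, used_memory: int) -> float:
    """Ratio of RSS memory to used memory, both in bytes."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


if __name__ == "__main__":
    # 1.5 GB resident against 1.0 GB used: the OS holds 50% more memory
    # than the instance is actively using.
    print(fragmentation_ratio(1_500_000_000, 1_000_000_000))
```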
Network and Bandwidth
| Metric name | Description | Unit |
| Bandwidth Usage | The percentage of the allocated network bandwidth currently utilized. | Percentage |
| Instantaneous Input Bandwidth | The current rate of inbound network traffic to the DCS instance. | KB/second |
| Instantaneous Output Bandwidth | The current rate of outbound network traffic from the DCS instance. | KB/second |
| Total Network Input Bytes | The cumulative volume of data received by the DCS instance since startup. | Bytes |
| Total Network Output Bytes | The cumulative volume of data transmitted by the DCS instance since startup. | Bytes |
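The cumulative counters above only become a rate once two samples are compared. The sketch below shows one way to turn two readings of Total Network Input Bytes (or Output Bytes) into an average KB/second figure. The function name is hypothetical, 1 KB is assumed to be 1,024 bytes, and a decreasing counter is treated as a restart because the counters are reported "since startup".

```python
from typing import Optional


def average_rate_kb_per_s(prev_bytes: int, curr_bytes: int, interval_s: float) -> Optional[float]:
    """Average throughput between two cumulative byte readings.

    Returns None when the counter went backwards, which is taken here
    to mean the instance restarted between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if curr_bytes < prev_bytes:
        return None  # counter reset; no meaningful rate for this interval
    return (curr_bytes - prev_bytes) / interval_s / 1024  # assumes 1 KB = 1,024 bytes


if __name__ == "__main__":
    # 614,400 bytes received over a 60-second polling interval.
    print(average_rate_kb_per_s(1_000_000, 1_614_400, 60))
```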
Commands and Operations
| Metric name | Description | Unit |
| Total Commands Processed | The cumulative number of commands processed by the DCS instance since startup. | Count |
| Instantaneous Operations Per Second | The current number of commands being executed per second. | Count |
| Average Command Response Time | The mean time taken to process and respond to commands. | Milliseconds |
| Maximum Command Response Time | The longest command response time observed during the monitoring period. | Milliseconds |
| Maximum Command Delay | The maximum delay experienced by a command from submission to execution. | Milliseconds |
| Read Commands Count | The total number of read commands (e.g., GET, HGET) executed. | Count |
| Average Read Command Response Time | The mean response time for read operations. | Milliseconds |
| Write Commands Count | The total number of write commands (e.g., SET, HSET) executed. | Count |
| Average Write Command Response Time | The mean response time for write operations. | Milliseconds |
Keys and Eviction
| Metric name | Description | Unit |
| Total Keys | The total number of keys currently stored in the DCS instance across all databases. | Count |
| Evicted Keys | The number of keys that have been automatically removed to free memory according to the eviction policy. | Count |
| Expired Keys | The number of keys that have reached their time-to-live expiry and been removed. | Count |
| Keys with Expiration | The number of keys that have an active TTL expiration set. | Count |
| Cache Hit Rate | The percentage of key lookups that successfully found the requested data in the cache. | Percentage |
| Cache Misses | The total number of key lookups that failed to find the data in cache. | Count |
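To make the relationship between Cache Hit Rate and Cache Misses concrete, here is a minimal sketch of the usual calculation: hits divided by total lookups. A hit counter is not listed in the table, so the `hits` argument is an assumption (Redis itself exposes `keyspace_hits` and `keyspace_misses` in its `INFO` output); the function name is hypothetical.

```python
from typing import Optional


def cache_hit_rate(hits: int, misses: int) -> Optional[float]:
    """Percentage of key lookups served from the cache.

    Returns None when no lookups have happened, since the rate is undefined.
    """
    total = hits + misses
    if total == 0:
        return None
    return 100.0 * hits / total


if __name__ == "__main__":
    print(cache_hit_rate(900, 100))
```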
Pub/Sub and Replication
| Metric name | Description | Unit |
| Pub/Sub Channels | The number of active pub/sub channels currently subscribed to. | Count |
| Pub/Sub Patterns | The number of active pub/sub pattern subscriptions. | Count |
| Master-Slave Replication Offset | The byte offset difference between master and replica, indicating replication lag. | Bytes |
| Full Synchronizations | The total number of full resync operations performed between master and replica. | Count |
| Node Reboots | The number of times the DCS node has been restarted. | Count |
| Receive Flow Control Events | The count of events where incoming network flow was throttled. | Count |
| Slow Log Present | Indicates whether any slow-executing commands exist in the slow log. | Boolean |
| Slow Log Command Count | The number of commands recorded in the slow log. | Count |
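The Master-Slave Replication Offset row describes a difference between two byte offsets. As a sketch of how that lag figure can be read, the function below subtracts the replica's offset from the master's. The function name and the choice to clamp negative differences to zero are assumptions made for illustration; in Redis, the underlying values correspond to `master_repl_offset` on the master and `slave_repl_offset` on the replica.

```python
def replication_lag_bytes(master_offset: int, replica_offset: int) -> int:
    """Bytes of the replication stream the replica has not yet applied."""
    # Offsets sampled at slightly different moments can briefly show the
    # replica ahead; clamp to zero rather than report a negative lag.
    return max(0, master_offset - replica_offset)


if __name__ == "__main__":
    print(replication_lag_bytes(1_048_576, 1_040_000))
```

A lag that keeps growing between polls, especially together with a rising Full Synchronizations count, suggests the replica is struggling to keep up.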
Threshold configuration
You can configure thresholds and alerts for all DCS metrics to proactively detect performance degradation or connection issues.
- Go to Admin > Configuration Profiles > Threshold and Availability.
- Create or edit your Threshold Profile for DCS.
- Assign the profile to the respective monitors to trigger alerts.
IT Automation
Use Site24x7's IT Automation to resolve common DCS performance issues automatically:
- Go to Admin > IT Automation Templates. Then, click Add Automation Templates.
- Create an automation rule by selecting the automation Type (e.g., Server reboot, clear queue).
- Map the created rules to the DCS monitor so that they run automatically when an alert is triggered.
Configuration rules
Use Configuration Rules to simplify bulk setup across DCS instances. When new monitors are discovered, these rules automatically assign Threshold Profiles, Notification Profiles, Tags, and Monitor Groups.
