Alibaba Cloud Elastic Compute Service (ECS) Monitoring Integration
Site24x7 offers out-of-the-box monitoring for Elastic Compute Service (ECS) instances in your Alibaba Cloud environment. Monitor system-level performance using real-time metrics for CPU usage, memory consumption, disk I/O, network activity, GPU utilization, and process behavior. Once your Alibaba Cloud account is integrated with Site24x7, all associated ECS instances are auto-discovered and monitored.
Use cases
- Instance-level health tracking: Monitor CPU, memory, and disk usage to prevent resource exhaustion.
- Disk and network I/O visibility: Identify bottlenecks in storage and data transfer throughput.
- GPU monitoring for ML workloads: Track GPU utilization and temperature to manage compute-intensive applications.
- Proactive alerting: Detect anomalies like packet drops, high system load, or excessive process counts in real time.
Setup and configuration
- Log in to your Site24x7 account and navigate to Cloud > Alibaba Cloud > Add Monitor.
- On the Edit Alibaba Cloud Monitor page, select ECS from the Service Types list.
- Once the monitor is added, go to Cloud > Alibaba Cloud > ECS to view dashboards and performance metrics.
Supported metrics
CPU Metrics
| Metric name | Description | Unit |
|---|---|---|
| CPU Utilization | The percentage of total CPU capacity in use. | Percentage |
| CPU User Time | The percentage of CPU used by user processes. | Percentage |
| CPU System Time | The percentage of CPU used by system/kernel processes. | Percentage |
| CPU Idle Time | The percentage of idle CPU time. | Percentage |
| CPU Wait Time | The percentage of time the CPU spends waiting on I/O. | Percentage |
| Total CPU Usage | The total CPU usage across all cores. | Percentage |
| Load Average (1 Minute) | The average system load over the last 1 minute. | Load |
| Load Average (5 Minutes) | The average system load over the last 5 minutes. | Load |
| Load Average (15 Minutes) | The average system load over the last 15 minutes. | Load |
| Load Average Per Core (1 Minute) | The 1-minute load average per CPU core. | Load |
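
To illustrate how a per-core figure relates to the raw load average, the sketch below divides a load value by the number of CPU cores. The function name `load_per_core` is hypothetical, and the formula assumes the per-core metric is simply the load average divided by the core count; it is not taken from Site24x7's implementation.

```python
def load_per_core(load_average: float, cpu_cores: int) -> float:
    """Return the load average normalized by the number of CPU cores.

    Assumption: per-core load = load average / core count.
    """
    if cpu_cores <= 0:
        raise ValueError("cpu_cores must be a positive integer")
    return load_average / cpu_cores


# Example: a 1-minute load average of 6.0 on an 8-core instance.
print(load_per_core(6.0, 8))  # 0.75
```

A per-core value that stays above 1.0 generally suggests more runnable work than the cores can serve, which makes it a more portable alerting signal than the raw load average across instance sizes.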
Memory Metrics
| Metric name | Description | Unit |
|---|---|---|
| VM Memory Utilization | The percentage of memory in use. | Percentage |
| Memory Used Utilization | The percentage of used memory relative to total. | Percentage |
| Memory Used Space | The amount of used memory. | MB |
| Memory Free Utilization | The percentage of free memory available. | Percentage |
| Memory Free Space | The amount of free memory. | MB |
| Total Memory Space | The total memory available on the instance. | MB |
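
The used and free memory percentages in the table can be related to the space metrics with simple arithmetic. The sketch below is illustrative only: the function name `memory_percentages` is hypothetical, and it assumes free memory is total minus used, ignoring buffers and cache, which the actual metrics may account for differently.

```python
from typing import Tuple


def memory_percentages(used_mb: float, total_mb: float) -> Tuple[float, float]:
    """Return (used %, free %) of memory from space values in MB.

    Assumption: free = total - used (buffers and cache are ignored).
    """
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    if not 0 <= used_mb <= total_mb:
        raise ValueError("used_mb must be between 0 and total_mb")
    used_pct = used_mb / total_mb * 100
    free_pct = 100 - used_pct
    return used_pct, free_pct


# Example: 6144 MB used on an instance with 8192 MB of memory.
print(memory_percentages(6144, 8192))  # (75.0, 25.0)
```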
Disk Metrics
| Metric name | Description | Unit |
|---|---|---|
| Disk Read Throughput (Bps) | The rate of data read from disk. | Bytes/second |
| Disk Write Throughput (Bps) | The rate of data written to disk. | Bytes/second |
| Disk Read IOPS | The number of read operations per second. | Ops/second |
| Disk Write IOPS | The number of write operations per second. | Ops/second |
| Disk Usage Utilization | The percentage of disk space used. | Percentage |
| Disk Usage (Used) | The amount of disk space used. | GB |
| Disk I/O Queue Size | The number of disk I/O requests waiting in queue. | Count |
| Disk Read Throughput Utilization | The percentage of read throughput used. | Percentage |
| Disk Write Throughput Utilization | The percentage of write throughput used. | Percentage |
Network Metrics
| Metric name | Description | Unit |
|---|---|---|
| Network In Rate | The rate of incoming network traffic. | Bytes/second |
| Network Out Rate | The rate of outgoing network traffic. | Bytes/second |
| Network In Packets | The number of incoming packets per second. | Packets/second |
| Network Out Packets | The number of outgoing packets per second. | Packets/second |
| Dropped Packets Percentage (In) | The percentage of incoming packets dropped. | Percentage |
| Dropped Packets Percentage (Out) | The percentage of outgoing packets dropped. | Percentage |
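
As a rough illustration of the dropped-packet percentages, the sketch below computes the share of dropped packets in one direction over a sampling interval. The function name `dropped_packet_percentage` is hypothetical, and it assumes the total count includes the dropped packets; the exact definition used by the metric is not stated in this document.

```python
def dropped_packet_percentage(dropped: int, total: int) -> float:
    """Return dropped packets as a percentage of all packets in an interval.

    Assumption: `total` counts every packet seen, including dropped ones.
    """
    if dropped < 0 or total < 0:
        raise ValueError("packet counts must be non-negative")
    if dropped > total:
        raise ValueError("dropped cannot exceed total")
    if total == 0:
        return 0.0  # no traffic in the interval, so nothing was dropped
    return 100.0 * dropped / total


# Example: 25 dropped packets out of 10,000 received.
print(dropped_packet_percentage(25, 10_000))  # 0.25
```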
System and Process Metrics
| Metric name | Description | Unit |
|---|---|---|
| Status Check | The overall system health check result. | Text |
| Status Check (Instance) | The number of system-level health check attempts. | Count |
| Process Count | The number of processes running. | Count |
| VM Process Count | The number of virtual machine processes. | Count |
| Concurrent Connections | The number of concurrent network connections. | Count |
GPU Metrics
| Metric name | Description | Unit |
|---|---|---|
| GPU Memory Used Utilization | The percentage of GPU memory in use. | Percentage |
| GPU Utilization | The percentage of GPU compute usage. | Percentage |
| Instance GPU Temperature | The current GPU temperature. | Celsius |
| Instance GPU Memory Used Utilization | The percentage of memory used by the GPU on the instance. | Percentage |
Threshold configuration
- Go to Admin > Configuration Profiles > Threshold and Availability.
- Create or edit a threshold profile for ECS.
- Assign the profile to the respective monitors to trigger alerts.
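
Conceptually, a threshold profile pairs metrics with limits and an alert is raised when a sampled value crosses its limit. The sketch below models that idea in plain Python. It is not Site24x7's API or alarm engine: `THRESHOLD_PROFILE`, `find_breaches`, and the limit values are all hypothetical, and only upper limits are considered.

```python
from typing import Dict, List

# Hypothetical threshold profile: metric name -> upper limit (percent).
# Names mirror the tables above; the limits are illustrative, not defaults.
THRESHOLD_PROFILE: Dict[str, float] = {
    "CPU Utilization": 85.0,
    "VM Memory Utilization": 90.0,
    "Disk Usage Utilization": 80.0,
}


def find_breaches(sample: Dict[str, float], profile: Dict[str, float]) -> List[str]:
    """Return the metrics whose sampled value is strictly above its limit."""
    return [
        name
        for name, limit in profile.items()
        if name in sample and sample[name] > limit
    ]


# Example: one poll of an ECS instance (made-up values).
sample = {
    "CPU Utilization": 92.5,
    "VM Memory Utilization": 71.0,
    "Disk Usage Utilization": 80.0,
}
print(find_breaches(sample, THRESHOLD_PROFILE))  # ['CPU Utilization']
```

The comparison is strict, so a value exactly at the limit does not count as a breach in this sketch; whether the product treats the boundary the same way is not covered here.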
IT automation
Site24x7's IT Automation tools help resolve performance degradation issues automatically. The alarm engine continuously evaluates the system events for which thresholds have been defined and, when a breach occurs, executes the mapped automation.
- Go to Admin > IT Automation Templates.
- Create a new automation rule.
- Map the rule to the monitor for proactive resolution.
How to configure IT Automation for a monitor
Configuration rules
With Site24x7's Configuration Rules, you can set parameters such as Threshold Profile, Notification Profile, Tags, and Monitor Group for multiple monitors at once. A rule can also apply these settings automatically whenever new ECS monitors are added.
How to add a Configuration Rule
