How to monitor GaussDB openGauss in Huawei Cloud

Site24x7 provides comprehensive monitoring for openGauss on GaussDB, including CPU usage, session activity, SQL performance, write-ahead log (WAL) activity, and disaster recovery replication.

This solution equips enterprise database administrators (DBAs) with full-stack visibility, enabling them to effectively maintain service-level agreements (SLAs) for critical online transaction processing (OLTP) workloads.

Use cases

Idle transaction detection: Alerts for idle-in-transaction sessions help prevent lock accumulation, which can degrade concurrent query throughput.

Disaster recovery compliance: Monitoring shard recovery point objective (RPO) and recovery time objective (RTO) helps verify adherence to regulatory recovery objectives.

WAL space management: Alerts for inactive replication slots and transaction log (xlog) size help prevent excessive WAL accumulation, safeguarding disk space.
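
As an illustration of the idle transaction use case, the sketch below flags sessions that have stayed idle inside an open transaction for longer than a chosen limit. It is a minimal example, not how Site24x7 or GaussDB computes the Idle Transactions metric. The row shape (pid, state, state_change), the "idle in transaction" state label, and the five-minute limit are assumptions modeled on a pg_stat_activity-style view.

```python
from datetime import datetime, timedelta


def find_idle_in_transaction(sessions, now, max_idle=timedelta(minutes=5)):
    """Return (pid, idle_duration) for sessions idle in a transaction longer than max_idle."""
    flagged = []
    for session in sessions:
        # Only sessions holding an open transaction without executing are of interest.
        if session["state"] != "idle in transaction":
            continue
        idle_for = now - session["state_change"]
        if idle_for > max_idle:
            flagged.append((session["pid"], idle_for))
    # Longest-idle sessions first, since they are the likeliest to hold stale locks.
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample rows; real values would come from the database.
now = datetime(2024, 1, 1, 12, 0, 0)
sessions = [
    {"pid": 101, "state": "active", "state_change": now - timedelta(seconds=2)},
    {"pid": 102, "state": "idle in transaction", "state_change": now - timedelta(minutes=12)},
    {"pid": 103, "state": "idle in transaction", "state_change": now - timedelta(seconds=30)},
    {"pid": 104, "state": "idle", "state_change": now - timedelta(hours=1)},
]

stale = find_idle_in_transaction(sessions, now)
for pid, idle_for in stale:
    print(f"pid={pid} idle in transaction for {idle_for}")
```

Only session 102 is flagged here: session 103 is inside a transaction but has been idle for less than the limit, and session 104 is idle without an open transaction.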

Setup and configuration

GaussDB openGauss resources are auto-discovered and monitored during the Huawei Cloud integration. To enable monitoring, follow the steps below:

Navigate to Cloud > Huawei > Add Huawei Monitor. Follow the steps to add a Huawei Cloud monitor.
While adding or editing a Huawei Cloud monitor, select GaussDB openGauss from the Service/Resource Types drop-down and click Save.
Navigate to Cloud > Huawei, select the created Huawei monitor, and then click GaussDB openGauss to view the performance metrics.

Supported metrics

CPU and memory

Metric name	Description	Units
CPU Usage	Overall percentage of CPU capacity currently consumed by the GaussDB V5 instance.	Percentage
Memory Usage	Percentage of total memory capacity currently in use by the GaussDB V5 instance.	Percentage
User-mode CPU Time Percentage	Proportion of CPU time spent executing user-space database processes.	Percentage
Kernel-mode CPU Time Percentage	Proportion of CPU time spent executing kernel-space operations on behalf of the database.	Percentage
Disk I/O Wait Time Percentage	Proportion of CPU time spent waiting for disk I/O operations to complete.	Percentage
Swap Memory Usage	Percentage of total swap space currently in use by the instance.	Percentage
Total Swap Memory	Total amount of swap space configured and available to the instance.	MB

Network

Metric name	Description	Units
Data Write Volume	Volume of data received by the instance from clients per second.	Byte
Outgoing Data Volume	Volume of data sent by the instance to clients per second.	Byte
Retransmission Ratio	Proportion of TCP segments that are retransmitted, indicating network reliability issues.	Percentage

Disk and I/O

Metric name	Description	Units
Disk IOPS	Total number of read and write I/O operations completed per second on the instance storage.	IOPS
Disk Write Throughput	Volume of data written to disk per second by the instance.	KB/second
Disk Read Throughput	Volume of data read from disk per second by the instance.	KB/second
Average Time per Disk Write	Average time in milliseconds needed to complete a single disk write operation.	Milliseconds
Average Time per Disk Read	Average time in milliseconds needed to complete a single disk read operation.	Milliseconds
Disk I/O Bandwidth Usage	Percentage of total available disk I/O bandwidth being consumed.	Percentage
IOPS Usage	Percentage of the provisioned IOPS limit currently in use by the instance.	Percentage

Instance and disk

Metric name	Description	Units
Used Instance Disk Size	Amount of disk storage currently consumed by instance-level data files.	MB
Total Instance Disk Size	Total disk capacity allocated to the GaussDB V5 instance.	MB
Instance Disk Usage	Percentage of the total instance disk capacity currently in use.	Percentage
Buffer Hit Rate	Percentage of data block read requests served from the shared buffer cache without requiring a disk read.	Percentage
Deadlocks	Number of deadlock events detected and resolved by the database engine in the monitoring period.	Count
Response Time of 80% SQL Statements	Latency value below which 80% of all SQL statement executions completed (P80).	Milliseconds
Response Time of 95% SQL Statements	Latency value below which 95% of all SQL statement executions completed (P95).	Milliseconds
System Database Size	Total size of all system database schemas on the instance.	MB
User Database Total Size	Combined size of all user-created databases on the instance.	MB
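
To make the P80 and P95 response time metrics concrete, the sketch below computes both from a small set of hypothetical latency samples using the nearest-rank method. The sample values and the choice of nearest-rank are assumptions for illustration; the article does not state which percentile method the service uses.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the smallest sample such that at least pct% of samples are <= it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank: the rank is ceil(pct/100 * n), counted from 1.
    rank = max(math.ceil(pct * len(ordered) / 100), 1)
    return ordered[rank - 1]


# Hypothetical SQL response times in milliseconds, including two slow outliers.
latencies_ms = [12, 15, 11, 240, 18, 14, 13, 16, 900, 17]

p80 = percentile_nearest_rank(latencies_ms, 80)
p95 = percentile_nearest_rank(latencies_ms, 95)
print(f"P80={p80} ms, P95={p95} ms")
```

With these samples P80 is 18 ms while P95 is 900 ms, which shows why the two metrics are tracked separately: a healthy P80 can coexist with a small share of very slow statements.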

Component and disk

Metric name	Description	Units
Used Disk Size	Amount of disk storage consumed by component-level data (separate from instance-level accounting).	MB
Total Disk Size	Total disk capacity available at the component level.	MB
Disk Usage	Percentage of component-level disk capacity currently in use.	Percentage

Replication

Metric name	Description	Units
Primary Node Flow Control Duration	Duration during which the primary node is applying flow control to limit replication write speed.	Milliseconds
Standby RTO Duration	Estimated RTO for the standby node based on current redo progress.	Milliseconds
Standby Node Redo Progress	Measure of how far the standby node has progressed in replaying redo logs relative to the primary.	Byte
WAL Log Size in Replication Slot	Total size of WAL data retained in active replication slots pending consumption by replicas.	Byte
Xlog Rate	Rate at which xlog sequence numbers are advancing on the primary node.	Byte
Shard Log Gap of DR Cluster	Difference in log volume between the primary shard and the disaster recovery cluster.	Byte
Size of Shard Logs to Be Replayed	Volume of shard logs on the DR cluster that have been received but not yet replayed.	Byte
Flushing Rate of Shard Logs in DR	Rate at which shard logs are being flushed to disk in the disaster recovery cluster.	Byte/second
Replay Rate of Shard Logs in DR	Rate at which shard logs are being replayed by the disaster recovery cluster.	Byte/second
Shard RPO	RPO for the shard, reflecting the maximum potential data loss in the event of failover.	Seconds
Shard RTO	RTO for the shard, reflecting the estimated time to recover after a failure.	Milliseconds
Inactive Replication Slots	Number of replication slots that are no longer being consumed, causing WAL accumulation on the primary.	Count
Size of Read Replica Logs Not Replayed	Volume of logs received by read replica nodes that have not yet been applied.	MB
Difference Between Redo and Receipt Positions	Gap between the volume of logs received and the volume of logs replayed on the standby node.	MB
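
As a worked example of the gap between receipt and redo positions, the sketch below converts two log sequence numbers (LSNs) to byte offsets and subtracts them. It assumes the PostgreSQL-style textual LSN format ("high/low" in hexadecimal) and uses hypothetical positions; it is not a description of how Huawei Cloud derives the metric.

```python
def lsn_to_bytes(lsn):
    """Convert an LSN string such as '2/1A000000' to an absolute byte offset."""
    high, low = lsn.split("/")
    # The high part counts 4 GiB units; the low part is the offset within that unit.
    return (int(high, 16) << 32) | int(low, 16)


def replay_gap_bytes(receive_lsn, replay_lsn):
    """Bytes of log received by the standby but not yet replayed (never negative)."""
    return max(lsn_to_bytes(receive_lsn) - lsn_to_bytes(replay_lsn), 0)


# Hypothetical positions reported by a standby node.
gap = replay_gap_bytes("2/1A000000", "2/19000000")
gap_mb = gap / (1024 * 1024)
print(f"replay gap: {gap} bytes ({gap_mb:.0f} MB)")
```

A gap that keeps growing across polls suggests the standby is replaying more slowly than it receives logs, which in turn tends to lengthen the RTO.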

Sessions

Metric name	Description	Units
User Logins per Second	Rate at which new user login events are occurring on the instance per second.	Count/second
User Logouts per Second	Rate at which user session logout events are occurring on the instance per second.	Count/second
Lock Waiting Session Ratio	Proportion of active sessions currently blocked waiting to acquire a lock.	Percentage
Active Session Rate	Proportion of total sessions that are actively executing queries.	Percentage
CN Connections	Number of connections currently established to the coordinator node.	Count
Online Sessions	Total number of user sessions currently connected to the instance.	Count
Active Sessions	Number of sessions currently executing a query or transaction.	Count
Online Session Rate	Proportion of the maximum session limit currently occupied by connected sessions.	Percentage
Sessions Waiting for Locks	Number of sessions currently blocked waiting to acquire a database lock.	Count
Waiting Sessions	Total number of sessions in any waiting state, including lock waits and I/O waits.	Count
Online Clients	Number of distinct client addresses with active connections to the instance.	Count
Active Clients	Number of distinct client addresses that have executed at least one query recently.	Count
Login Attempts with Incorrect Passwords	Number of authentication failures due to incorrect passwords in the monitoring period.	Count

Transactions

Metric name	Description	Units
User Committed Transactions per Second	Rate of user-initiated transactions successfully committed per second.	Count/second
User Rollback Transactions per Second	Rate of user-initiated transactions rolled back per second.	Count/second
Background Committed Transactions per Second	Rate of background process transactions committed per second, such as autovacuum and checkpoint work.	Count/second
Background Rollback Transactions per Second	Rate of background process transactions rolled back per second.	Count/second
Average Response Time of User Transactions	Mean time in milliseconds taken to complete a user-initiated transaction from start to commit or rollback.	Milliseconds
User Transaction Rollback Rate	Proportion of user transactions that ended in a rollback rather than a commit.	Percentage
Background Transaction Rollback Rate	Proportion of background transactions that ended in a rollback.	Percentage
Maximum Execution Duration of DB Transactions	Elapsed time of the longest currently running transaction on the instance.	Milliseconds
Idle Transactions	Number of transactions that are open but currently idle, holding locks or resources without executing.	Count
Oldest Two-Phase Commit Transaction Duration	Age of the oldest prepared (two-phase commit) transaction still pending resolution.	Milliseconds
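
The rollback rate metrics relate the commit and rollback counters. The sketch below shows one plausible calculation, rollbacks as a percentage of all finished transactions; the formula and the sample rates are assumptions, since the article does not define the exact computation.

```python
def rollback_rate(commits_per_second, rollbacks_per_second):
    """Percentage of finished transactions that ended in a rollback."""
    total = commits_per_second + rollbacks_per_second
    if total == 0:
        # No finished transactions in the interval, so there is nothing to report.
        return 0.0
    return 100.0 * rollbacks_per_second / total


# Hypothetical rates taken from one polling interval.
rate = rollback_rate(commits_per_second=190, rollbacks_per_second=10)
print(f"user transaction rollback rate: {rate:.1f}%")
```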

SQL

Metric name	Description	Units
Data Definition Language/s	Rate of DDL statements (CREATE, ALTER, DROP) executed per second.	Count/second
Data Manipulation Language/s	Rate of DML statements (INSERT, UPDATE, DELETE, SELECT) executed per second.	Count/second
Data Control Language/s	Rate of DCL statements (GRANT, REVOKE) executed per second.	Count/second
DDL and DCL Rate	Combined rate of DDL and DCL statements as a proportion of total SQL activity.	Percentage
Slow SQL Statements in System Database	Number of slow-running SQL statements detected in system database schemas.	Count
Slow SQL Statements in User Database	Number of slow-running SQL statements detected in user-created database schemas.	Count
SELECT Distribution	Distribution of SELECT statement execution across nodes or time, showing read workload spread.	Count
UPDATE Distribution	Distribution of UPDATE statement execution across nodes or time, showing write workload spread.	Count
INSERT Distribution	Distribution of INSERT statement execution across nodes or time.	Count
DELETE Distribution	Distribution of DELETE statement execution across nodes or time.	Count
SQL Statements	Total count of SQL statements executed in the monitoring period.	Count
Read Requests	Total number of read (SELECT) requests processed by the instance per second.	Count/second
INSERT Request Response Time	Average response time in milliseconds for INSERT statement execution.	Milliseconds
UPDATE Request Response Time	Average response time in milliseconds for UPDATE statement execution.	Milliseconds
DELETE Request Response Time	Average response time in milliseconds for DELETE statement execution.	Milliseconds
Read Request Response Time	Average response time in milliseconds for SELECT statement execution.	Milliseconds

Resources

Metric name	Description	Units
Used Dynamic Memory	Amount of dynamic memory currently allocated for query execution and other runtime operations.	MB
Dynamic Memory Usage	Percentage of the dynamic memory pool currently consumed by active operations.	Percentage
Used Memory	Total process memory currently consumed by the GaussDB V5 database process.	MB
Thread Pool Usage	Percentage of the thread pool capacity currently occupied by active worker threads.	Percentage

Xlog

Metric name	Description	Units
Data Volume to Be Flushed to Disks	Amount of dirty data in memory that has not yet been written to disk by the checkpoint process.	MB
Physical Reads per Second	Number of physical disk read operations performed by the database engine per second.	Count/second
Physical Writes per Second	Number of physical disk write operations performed by the database engine per second.	Count/second
Xlogs	Total number of xlog files currently present on the instance.	Count
Xlog Size	Total disk space consumed by xlog files on the instance.	MB
CN Temporary Directory Size	Disk space consumed by temporary files in the coordinator node's temporary directory.	MB

Status

Metric name	Description	Units
DN Status	Health and operational status of the Data Node instances in the GaussDB V5 cluster.	Status
Replication Slot Directory Size	Total disk space consumed by all replication slot directories on the primary node.	MB
Heap and Index Tuple Inconsistencies	Number of detected inconsistencies between heap table tuples and their corresponding index entries, indicating data integrity issues.	Count

Threshold configuration

You can configure thresholds and alerts for all GaussDB openGauss metrics to proactively detect performance degradation or connection issues.

Go to Admin > Configuration Profiles > Threshold and Availability.
Create or edit your Threshold Profile for GaussDB openGauss.
Assign the profile to the respective monitors to trigger alerts.
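
The sketch below illustrates the general idea behind a threshold profile: a metric is classified only after it has breached a limit for several consecutive polls, which avoids alerting on a single spike. The function name, status labels, limits, and the three-poll window are hypothetical and do not represent Site24x7's internal logic.

```python
def classify(recent_values, warning, critical, consecutive=3):
    """Classify a metric from its most recent polls using warning and critical limits."""
    window = recent_values[-consecutive:]
    if len(window) < consecutive:
        # Not enough data yet to confirm a sustained breach.
        return "ok"
    if all(value >= critical for value in window):
        return "critical"
    if all(value >= warning for value in window):
        return "warning"
    return "ok"


# Hypothetical CPU Usage samples (percent), oldest first.
print(classify([70, 86, 91, 88], warning=85, critical=95))
print(classify([70, 96, 97, 99], warning=85, critical=95))
print(classify([90, 60, 90], warning=85, critical=95))
```

The third call returns "ok" because the dip to 60 breaks the run of consecutive breaches.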

IT Automation

Use Site24x7's IT Automation to resolve common issues with GaussDB openGauss performance:

Go to Admin > IT Automation Templates. Then, click Add Automation Templates.
Create an automation rule by selecting the automation Type (e.g., Server reboot, clear queue).
Map the created rules to the GaussDB openGauss monitor so that they run automatically when alerts are triggered.

Configuration rules

Use Configuration Rules to simplify bulk setup across GaussDB openGauss instances. Automatically assign Threshold Profiles, Notification Profiles, Tags, and Monitor Groups when new monitors are discovered.
