How to monitor GaussDB openGauss in Huawei Cloud

Site24x7 provides comprehensive monitoring for openGauss on GaussDB, including CPU usage, session activity, SQL performance, WAL logs, and disaster recovery replication.

This solution equips enterprise database administrators (DBAs) with full-stack visibility, enabling them to effectively maintain service-level agreements (SLAs) for critical online transaction processing (OLTP) workloads.

Use cases

Idle transaction detection: Alerts for idle-in-transaction sessions help prevent lock accumulation, which can degrade concurrent query throughput.

Disaster recovery compliance: Monitoring shard recovery point objective (RPO) and recovery time objective (RTO) helps verify that replication meets regulatory recovery objectives.

WAL space management: Alerts for inactive replication slots and transaction log (xlog) size help prevent excessive WAL accumulation and safeguard disk space.

Setup and configuration

GaussDB openGauss resources are auto-discovered and monitored during the Huawei Cloud integration. To enable monitoring, follow the steps below:

  1. Navigate to Cloud > Huawei > Add Huawei Monitor. Follow the steps to add a Huawei Cloud monitor.
  2. While adding or editing a Huawei Cloud monitor, select GaussDB openGauss from the Service/Resource Types drop-down and click Save.
  3. Navigate to Cloud > Huawei, select the created Huawei monitor, and then click GaussDB openGauss to view the performance metrics.

Supported metrics

CPU and memory

Metric name | Description | Units
CPU Usage | Overall percentage of CPU capacity currently consumed by the GaussDB V5 instance. | Percentage
Memory Usage | Percentage of total memory capacity currently in use by the GaussDB V5 instance. | Percentage
User-mode CPU Time Percentage | Proportion of CPU time spent executing user-space database processes. | Percentage
Kernel-mode CPU Time Percentage | Proportion of CPU time spent executing kernel-space operations on behalf of the database. | Percentage
Disk I/O Wait Time Percentage | Proportion of CPU time spent waiting for disk I/O operations to complete. | Percentage
Swap Memory Usage | Percentage of total swap space currently in use by the instance. | Percentage
Total Swap Memory | Total amount of swap space configured and available to the instance. | MB

Network

Metric name | Description | Units
Data Write Volume | Volume of data received by the instance from clients per second. | Byte
Outgoing Data Volume | Volume of data sent by the instance to clients per second. | Byte
Retransmission Ratio | Proportion of TCP segments that are retransmitted, indicating network reliability issues. | Percentage

Disk and I/O

Metric name | Description | Units
Disk IOPS | Total number of read and write I/O operations completed per second on the instance storage. | IOPS
Disk Write Throughput | Volume of data written to disk per second by the instance. | KB/second
Disk Read Throughput | Volume of data read from disk per second by the instance. | KB/second
Average Time per Disk Write | Average time needed to complete a single disk write operation. | Milliseconds
Average Time per Disk Read | Average time needed to complete a single disk read operation. | Milliseconds
Disk I/O Bandwidth Usage | Percentage of total available disk I/O bandwidth being consumed. | Percentage
IOPS Usage | Percentage of the provisioned IOPS limit currently in use by the instance. | Percentage

Instance and disk

Metric name | Description | Units
Used Instance Disk Size | Amount of disk storage currently consumed by instance-level data files. | MB
Total Instance Disk Size | Total disk capacity allocated to the GaussDB V5 instance. | MB
Instance Disk Usage | Percentage of the total instance disk capacity currently in use. | Percentage
Buffer Hit Rate | Percentage of data block read requests served from the shared buffer cache without requiring a disk read. | Percentage
Deadlocks | Number of deadlock events detected and resolved by the database engine in the monitoring period. | Count
Response Time of 80% SQL Statements | Latency value below which 80% of all SQL statement executions completed (P80). | Milliseconds
Response Time of 95% SQL Statements | Latency value below which 95% of all SQL statement executions completed (P95). | Milliseconds
System Database Size | Total size of all system database schemas on the instance. | MB
User Database Total Size | Combined size of all user-created databases on the instance. | MB
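
To illustrate how to read the P80 and P95 response time metrics, here is a minimal sketch that computes percentiles from a list of statement latencies. It uses the nearest-rank method, which is an assumption made for illustration; the exact calculation used by GaussDB is not documented here. The function name percentile_latency and the sample values are hypothetical.

```python
import math


def percentile_latency(latencies_ms, pct):
    """Return the nearest-rank percentile of a list of latencies in ms."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Nearest-rank: the smallest sample with at least pct% of samples
    # at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies for ten SQL statements, in milliseconds.
samples_ms = [12, 15, 11, 240, 18, 14, 13, 17, 16, 95]
p80 = percentile_latency(samples_ms, 80)
p95 = percentile_latency(samples_ms, 95)
print(f"P80={p80} ms, P95={p95} ms")
```

In this sample, two slow statements barely move P80 but dominate P95, which is why the two metrics are worth alerting on separately.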

Component and disk

Metric name | Description | Units
Used Disk Size | Amount of disk storage consumed by component-level data (separate from instance-level accounting). | MB
Total Disk Size | Total disk capacity available at the component level. | MB
Disk Usage | Percentage of component-level disk capacity currently in use. | Percentage

Replication

Metric name | Description | Units
Primary Node Flow Control Duration | Duration during which the primary node is applying flow control to limit replication write speed. | Milliseconds
Standby RTO Duration | Estimated RTO for the standby node based on current redo progress. | Milliseconds
Standby Node Redo Progress | Measure of how far the standby node has progressed in replaying redo logs relative to the primary. | Byte
WAL Log Size in Replication Slot | Total size of WAL log data retained in active replication slots pending consumption by replicas. | Byte
Xlog Rate | Rate at which xlog sequence numbers are advancing on the primary node. | Byte
Shard Log Gap of DR Cluster | Difference in log volume between the primary shard and the disaster recovery cluster. | Byte
Size of Shard Logs to Be Replayed | Volume of shard logs on the DR cluster that have been received but not yet replayed. | Byte
Flushing Rate of Shard Logs in DR | Rate at which shard logs are being flushed to disk in the disaster recovery cluster. | Byte/second
Replay Rate of Shard Logs in DR | Rate at which shard logs are being replayed by the disaster recovery cluster. | Byte/second
Shard RPO | RPO for the shard, reflecting the maximum potential data loss in the event of failover. | Seconds
Shard RTO | RTO for the shard, reflecting the estimated time to recover after a failure. | Milliseconds
Inactive Replication Slots | Number of replication slots that are no longer being consumed, causing WAL log accumulation on the primary. | Count
Size of Read Replica Logs Not Replayed | Volume of logs received by read replica nodes that have not yet been applied. | MB
Difference Between Redo and Receipt Positions | Gap between the volume of logs received and the volume of logs replayed on the standby node. | MB
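
Shard RPO is reported in seconds while Shard RTO is reported in milliseconds, so the two values need a common unit before they are compared against recovery objectives. The following is a minimal sketch of such a check. The DrSample class, the dr_violations function, and the target values are hypothetical and are not part of any Site24x7 or Huawei Cloud API.

```python
from dataclasses import dataclass


@dataclass
class DrSample:
    shard_rpo_s: float   # Shard RPO as reported, in seconds
    shard_rto_ms: float  # Shard RTO as reported, in milliseconds


def dr_violations(sample, rpo_target_s, rto_target_s):
    """Return the names of the recovery objectives that the sample exceeds."""
    violations = []
    if sample.shard_rpo_s > rpo_target_s:
        violations.append("RPO")
    # Convert the RTO from milliseconds to seconds before comparing.
    if sample.shard_rto_ms / 1000 > rto_target_s:
        violations.append("RTO")
    return violations


# Hypothetical objectives: RPO of 5 seconds, RTO of 30 seconds.
sample = DrSample(shard_rpo_s=4.0, shard_rto_ms=45000.0)
print(dr_violations(sample, rpo_target_s=5, rto_target_s=30))
```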

Sessions

Metric name | Description | Units
User Logins per Second | Rate at which new user login events are occurring on the instance per second. | Count/second
User Logouts per Second | Rate at which user session logout events are occurring on the instance per second. | Count/second
Lock Waiting Session Ratio | Proportion of active sessions currently blocked waiting to acquire a lock. | Percentage
Active Session Rate | Proportion of total sessions that are actively executing queries. | Percentage
CN Connections | Number of connections currently established to the coordinator node. | Count
Online Sessions | Total number of user sessions currently connected to the instance. | Count
Active Sessions | Number of sessions currently executing a query or transaction. | Count
Online Session Rate | Proportion of the maximum session limit currently occupied by connected sessions. | Percentage
Sessions Waiting for Locks | Number of sessions currently blocked waiting to acquire a database lock. | Count
Waiting Sessions | Total number of sessions in any waiting state, including lock waits and I/O waits. | Count
Online Clients | Number of distinct client addresses with active connections to the instance. | Count
Active Clients | Number of distinct client addresses that have executed at least one query recently. | Count
Login Attempts with Incorrect Passwords | Number of authentication failures due to incorrect passwords in the monitoring period. | Count

Transactions

Metric name | Description | Units
User Committed Transactions per Second | Rate of user-initiated transactions successfully committed per second. | Count/second
User Rollback Transactions per Second | Rate of user-initiated transactions rolled back per second. | Count/second
Background Committed Transactions per Second | Rate of background process transactions committed per second, such as autovacuum and checkpoint work. | Count/second
Background Rollback Transactions per Second | Rate of background process transactions rolled back per second. | Count/second
Average Response Time of User Transactions | Mean time taken to complete a user-initiated transaction from start to commit or rollback. | Milliseconds
User Transaction Rollback Rate | Proportion of user transactions that ended in a rollback rather than a commit. | Percentage
Background Transaction Rollback Rate | Proportion of background transactions that ended in a rollback. | Percentage
Maximum Execution Duration of DB Transactions | Elapsed time of the longest currently running transaction on the instance. | Milliseconds
Idle Transactions | Number of transactions that are open but currently idle, holding locks or resources without executing. | Count
Oldest Two-Phase Commit Transaction Duration | Age of the oldest prepared (two-phase commit) transaction still pending resolution. | Milliseconds
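
As a worked example of how the rollback rate relates to the commit and rollback counters above, this sketch derives the percentage from the two per-second rates. It assumes the rate is rollbacks divided by all finished transactions, which matches the description in the table but is not a confirmed formula. The function name rollback_rate is hypothetical.

```python
def rollback_rate(commits_per_s, rollbacks_per_s):
    """Return the share of finished transactions that rolled back, in percent."""
    total = commits_per_s + rollbacks_per_s
    if total == 0:
        # No transactions finished in the interval, so nothing rolled back.
        return 0.0
    return 100.0 * rollbacks_per_s / total


# Hypothetical sample: 95 commits and 5 rollbacks per second.
print(rollback_rate(95, 5))
```

The same calculation applies to the background transaction counters.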

SQL

Metric name | Description | Units
Data Definition Language/s | Rate of DDL statements (CREATE, ALTER, DROP) executed per second. | Count/second
Data Manipulation Language/s | Rate of DML statements (INSERT, UPDATE, DELETE, SELECT) executed per second. | Count/second
Data Control Language/s | Rate of DCL statements (GRANT, REVOKE) executed per second. | Count/second
DDL and DCL Rate | Combined rate of DDL and DCL statements as a proportion of total SQL activity. | Percentage
Slow SQL Statements in System Database | Number of slow-running SQL statements detected in system database schemas. | Count
Slow SQL Statements in User Database | Number of slow-running SQL statements detected in user-created database schemas. | Count
SELECT Distribution | Distribution of SELECT statement execution across nodes or time, showing read workload spread. | Count
UPDATE Distribution | Distribution of UPDATE statement execution across nodes or time, showing write workload spread. | Count
INSERT Distribution | Distribution of INSERT statement execution across nodes or time. | Count
DELETE Distribution | Distribution of DELETE statement execution across nodes or time. | Count
SQL Statements | Total count of SQL statements executed in the monitoring period. | Count
Read Requests | Total number of read (SELECT) requests processed by the instance per second. | Count/second
INSERT Request Response Time | Average response time for INSERT statement execution. | Milliseconds
UPDATE Request Response Time | Average response time for UPDATE statement execution. | Milliseconds
DELETE Request Response Time | Average response time for DELETE statement execution. | Milliseconds
Read Request Response Time | Average response time for SELECT statement execution. | Milliseconds

Resources

Metric name | Description | Units
Used Dynamic Memory | Amount of dynamic memory currently allocated for query execution and other runtime operations. | MB
Dynamic Memory Usage | Percentage of the dynamic memory pool currently consumed by active operations. | Percentage
Used Memory | Total process memory currently consumed by the GaussDB V5 database process. | MB
Thread Pool Usage | Percentage of the thread pool capacity currently occupied by active worker threads. | Percentage

Xlog

Metric name | Description | Units
Data Volume to Be Flushed to Disks | Amount of dirty data in memory that has not yet been written to disk by the checkpoint process. | MB
Physical Reads per Second | Number of physical disk read operations performed by the database engine per second. | Count/second
Physical Writes per Second | Number of physical disk write operations performed by the database engine per second. | Count/second
Xlogs | Total number of xlog files currently present on the instance. | Count
Xlog Size | Total disk space consumed by xlog files on the instance. | MB
CN Temporary Directory Size | Disk space consumed by temporary files in the coordinator node's temporary directory. | MB

Status

Metric name | Description | Units
DN Status | Health and operational status of the Data Node instances in the GaussDB V5 cluster. | Status
Replication Slot Directory Size | Total disk space consumed by all replication slot directories on the primary node. | MB
Heap and Index Tuple Inconsistencies | Number of detected inconsistencies between heap table tuples and their corresponding index entries, indicating data integrity issues. | Count

Threshold configuration

You can configure thresholds and alerts for all GaussDB openGauss metrics to proactively detect performance degradation or connection issues.

  1. Go to Admin > Configuration Profiles > Threshold and Availability.
  2. Create or edit your Threshold Profile for GaussDB openGauss.
  3. Assign the profile to the respective monitors to trigger alerts.
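
The logic that a threshold profile applies can be sketched as follows. This is only an illustration of upper-bound and lower-bound checks; it does not use a Site24x7 API. The THRESHOLDS mapping, the limit values, and the breached function are hypothetical, and actual thresholds are configured in the web client as described in the steps above.

```python
# Hypothetical threshold profile: metric name -> (comparison, limit).
THRESHOLDS = {
    "Idle Transactions": (">", 10),
    "Inactive Replication Slots": (">", 0),
    "Buffer Hit Rate": ("<", 90.0),
}


def breached(metrics, thresholds=THRESHOLDS):
    """Return the names of the metrics whose values cross their limits."""
    alerts = []
    for name, (op, limit) in thresholds.items():
        if name not in metrics:
            continue  # Metric not collected in this poll; skip it.
        value = metrics[name]
        if (op == ">" and value > limit) or (op == "<" and value < limit):
            alerts.append(name)
    return alerts


# Hypothetical poll result for one GaussDB openGauss monitor.
sample = {
    "Idle Transactions": 3,
    "Inactive Replication Slots": 2,
    "Buffer Hit Rate": 99.1,
}
print(breached(sample))
```

Metrics such as Buffer Hit Rate degrade when they fall, so a profile typically needs lower-bound checks as well as upper-bound ones.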

IT Automation

Use Site24x7's IT Automation to resolve common issues with GaussDB openGauss performance:

  1. Go to Admin > IT Automation Templates, and then click Add Automation Templates.
  2. Create an automation rule by selecting the automation Type (e.g., Server reboot, clear queue).
  3. Map the created rules to the GaussDB openGauss monitor for automatic execution during alerts.

Configuration rules

Use Configuration Rules to simplify bulk setup across GaussDB openGauss instances. Automatically assign Threshold Profiles, Notification Profiles, Tags, and Monitor Groups when new monitors are discovered.
