Amazon FSx Monitoring Integration
Amazon FSx is a fully managed service that lets you launch and run file systems that integrate with cloud-native AWS services. It offers the following file systems: Amazon FSx for Windows File Server (for business applications), Amazon FSx for Lustre (for compute-intensive workloads), Amazon FSx for NetApp ONTAP, and Amazon FSx for OpenZFS.
With Site24x7's integration with Amazon FSx, you can obtain the operational metrics of the FSx file systems hosted in your AWS infrastructure. You can also track the file system operations, like data repository tasks and backups.
Setup and configuration
1. If you haven't already, connect your AWS account with Site24x7's AWS account by either:
- Creating Site24x7 as an IAM user.
- Creating a cross-account IAM role.
2. On the Integrate AWS Account page, check the appropriate box for Amazon FSx.
Policy and permissions
Site24x7 uses various Amazon FSx APIs to collect information about your file systems. Assign the AWS managed policy ReadOnlyAccess to the Site24x7 entity (IAM user or IAM role) so that Site24x7 can collect metrics and metadata. If you want to assign a custom policy, make sure the following read-level actions are present in the policy JSON:
- "fsx:ListTagsForResource",
- "fsx:DescribeBackups",
- "fsx:DescribeDataRepositoryTasks",
- "fsx:DescribeFileSystems"
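As a rough illustration, the actions above could be assembled into a custom policy document like the one generated below. This is a minimal sketch, not an official Site24x7 policy: the statement ID `Site24x7FsxReadOnly` and the helper name `build_fsx_read_policy` are hypothetical, `"Resource": "*"` is an assumption, and a complete custom policy may need additional actions that are not listed in this section.

```python
import json

# Read-level Amazon FSx actions listed in this section.
FSX_READ_ACTIONS = [
    "fsx:ListTagsForResource",
    "fsx:DescribeBackups",
    "fsx:DescribeDataRepositoryTasks",
    "fsx:DescribeFileSystems",
]


def build_fsx_read_policy(actions=FSX_READ_ACTIONS):
    """Return an IAM policy document (as a dict) that allows the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Site24x7FsxReadOnly",  # hypothetical statement ID
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",  # assumption: not scoped to specific file systems
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the IAM policy editor.
    print(json.dumps(build_fsx_read_policy(), indent=2))
```

The printed JSON can be attached to the IAM user or role that Site24x7 uses.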
Polling Frequency
Site24x7 queries AWS to collect Amazon FSx performance metrics according to the configured polling frequency. The polling interval is one hour by default.
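To give a feel for what one polling cycle might request, the sketch below builds the parameters for a CloudWatch `get_metric_statistics` call covering a single interval. It is an illustration only and does not describe how Site24x7 queries AWS internally. The helper names and the file system ID are hypothetical; the `AWS/FSx` namespace and the `FileSystemId` dimension are the ones Amazon FSx publishes to CloudWatch.

```python
from datetime import datetime, timedelta, timezone


def build_metric_request(file_system_id, metric_name, statistic,
                         end_time, polling_minutes=60):
    """Build keyword arguments for CloudWatch get_metric_statistics
    covering one polling interval that ends at end_time."""
    period_seconds = polling_minutes * 60
    return {
        "Namespace": "AWS/FSx",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "FileSystemId", "Value": file_system_id}],
        "StartTime": end_time - timedelta(seconds=period_seconds),
        "EndTime": end_time,
        "Period": period_seconds,
        "Statistics": [statistic],
    }


def fetch_datapoints(request):
    """Send the request with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the builder works without boto3
    client = boto3.client("cloudwatch")
    return client.get_metric_statistics(**request)["Datapoints"]


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    # "fs-0123456789abcdef0" is a placeholder file system ID.
    print(build_metric_request("fs-0123456789abcdef0", "DataReadBytes", "Sum", now))
```

With the default of 60 minutes, the request spans exactly one hour and returns at most one datapoint per metric.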
IT Automations
You can add automations for the AWS services supported by Site24x7. Log in to Site24x7 and go to Admin > IT Automation Templates (+) > Add Automation Templates. Once automations are added, you can schedule them to be executed one after the other.
You can now create a data repository task or a backup for the file system using Amazon FSx automations.
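For reference, the two automation actions correspond roughly to the Amazon FSx `CreateBackup` and `CreateDataRepositoryTask` API operations. The sketch below shows how equivalent requests might be built with boto3. It is an assumption-laden illustration, not Site24x7's implementation: the helper names, the tag, and the choice of an export task with reporting disabled are all hypothetical, and these calls need write permissions that go beyond the read-level actions listed earlier. Data repository tasks apply to FSx for Lustre file systems linked to a data repository.

```python
def backup_request(file_system_id, name):
    """Keyword arguments for the FSx create_backup call."""
    return {
        "FileSystemId": file_system_id,
        "Tags": [{"Key": "Name", "Value": name}],
    }


def export_task_request(file_system_id, paths=None):
    """Keyword arguments for an export data repository task."""
    request = {
        "Type": "EXPORT_TO_REPOSITORY",
        "FileSystemId": file_system_id,
        "Report": {"Enabled": False},  # assumption: no completion report
    }
    if paths:
        request["Paths"] = list(paths)  # limit the export to these paths
    return request


def run_automations(file_system_id):
    """Create a backup and an export task (requires AWS credentials)."""
    import boto3  # imported lazily so the builders work without boto3
    fsx = boto3.client("fsx")
    backup = fsx.create_backup(**backup_request(file_system_id, "manual-backup"))
    task = fsx.create_data_repository_task(**export_task_request(file_system_id))
    return backup["Backup"]["BackupId"], task["DataRepositoryTask"]["TaskId"]
```

Keeping the request builders separate from the API calls makes it possible to review the parameters before anything is created.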
Performance metrics for file systems
Metric name | Description | Supported for file system type | Statistic | Unit |
---|---|---|---|---|
Data Read Bytes | Number of bytes for file system read operations. | All | Sum | MB |
Data Write Bytes | Number of bytes for file system write operations. | All | Sum | MB |
Data Write Operations | Number of write operations. | All | Sum | Count |
Data Read Operations | Number of read operations. | All | Sum | Count |
Metadata Operations | Number of metadata operations. | All | Sum | Count |
Free Storage Capacity | Amount or percentage of available storage capacity. | All | Average | GB/Percentage |
Total Throughput | Total throughput of the file system. | All | Average | MB/sec |
Read Throughput | Read throughput of the file system. | All | Average | MB/sec |
Write Throughput | Write throughput of the file system. | All | Average | MB/sec |
Total IOPS | Total number of I/O operations per second. | All | Average | Count/sec |
Read IOPS | Total number of read I/O operations per second. | All | Average | Count/sec |
Write IOPS | Total number of write I/O operations per second. | All | Average | Count/sec |
Metadata IOPS | Total number of metadata I/O operations per second. | All | Average | Count/sec |
Client Connections | The number of active connections between clients and the file server. | Windows and OpenZFS | Sum | Count |
Network Throughput Utilization | The percent utilization of network throughput for the file system. | All file system types except Lustre | Average | Percentage |
CPU Utilization | The percentage utilization of your file server’s CPU resources. | All file system types except Lustre | Average | Percentage |
Memory Utilization | The percentage utilization of your file server’s memory resources. | Windows and OpenZFS | Average | Percentage |
File Server Disk Throughput Utilization | The disk throughput between your file server and its storage volumes, as a percentage of the provisioned limit determined by throughput capacity. | All file system types except Lustre | Average | Percentage |
File Server Disk Throughput Balance | The percentage of available burst credits for disk throughput between your file server and its storage volumes. Valid for file systems provisioned with a throughput capacity of 256 MBps or less. | All file system types except Lustre | Average | Percentage |
File Server Disk IOPS Utilization | The disk IOPS between your file server and storage volumes, as a percentage of the provisioned limit determined by throughput capacity. | All file system types except Lustre | Average | Percentage |
File Server Disk IOPS Balance | The percentage of available burst credits for disk IOPS between your file server and its storage volumes. Valid for file systems provisioned with a throughput capacity of 256 MBps or less. | All file system types except Lustre | Average | Percentage |
Disk Read Bytes | The number of bytes for read operations that access storage volumes. | All file system types except Lustre | Sum | Bytes |
Disk Write Bytes | The number of bytes for write operations that access storage volumes. | All file system types except Lustre | Sum | Bytes |
Disk Read Operations | The number of read operations for the file server accessing storage volumes. | All file system types except Lustre | Sum | Count |
Disk Write Operations | The number of write operations for the file server accessing storage volumes. | All file system types except Lustre | Sum | Count |
Disk Throughput Utilization | (HDD only) The disk throughput between your file server and its storage volumes, as a percentage of the provisioned limit determined by the storage volumes. | Windows | Average | Percentage |
Disk Throughput Balance | (HDD only) The percentage of available burst credits for disk throughput and disk IOPS for the storage volumes. | Windows and OpenZFS | Average | Percentage |
Disk IOPS Utilization | (SSD only) The disk IOPS between your file server and storage volumes, as a percentage of the provisioned IOPS limit determined by the storage volumes. | All file system types except Lustre | Average | Percentage |
Deduplication Saved Storage | The amount of storage space saved by data deduplication, if it is enabled. | Windows | Sum | Bytes |
Logical Disk Usage | The amount of logical data stored (uncompressed). | Lustre | Sum | Bytes |
Physical Disk Usage | The amount of storage physically occupied by file system data (compressed). | Lustre | Sum | Bytes |
File Create Operations | The total number of file create operations. | Lustre | Sum | Count |
File Open Operations | The total number of file open operations. | Lustre | Sum | Count |
File Delete Operations | The total number of file delete operations. | Lustre | Sum | Count |
Stat Operations | The total number of stat operations. | Lustre | Sum | Count |
Rename Operations | The total number of directory renames, whether in-place directory renames or cross directory renames. | Lustre | Sum | Count |
Directory Delete Operations | The total number of directory delete operations. | Lustre | Sum | Count |
Directory Create Operations | The total number of directory create operations. | Lustre | Sum | Count |
NFS Bad Calls | The number of calls rejected by the NFS server remote procedure call (RPC) mechanism. | OpenZFS | Sum | Count |
File Server Cache Hit Ratio | For OpenZFS: The percentage of cache hits. For Single-AZ 2 (non-HA and HA) file systems, this metric reports the cache hit ratio for both the in-memory (ARC) and NVMe (L2ARC) caches. For Single-AZ 1 (non-HA and HA) file systems, this metric reports only the cache hit ratio for the ARC cache. For ONTAP: The percentage of all read requests that are served by data in the file system's RAM and NVMe caches. A higher percentage means that more reads are served by the file system's read caches. | OpenZFS and ONTAP | Average | Percentage |
Compression Ratio | The ratio of compressed storage usage to uncompressed storage usage. | OpenZFS | Average | Ratio |
Storage Efficiency Savings | The bytes saved from storage efficiency features (compression, deduplication, and compaction). | ONTAP | Sum | Bytes |
Logical Data Stored | The total amount of logical data stored on the file system, considering both the SSD tier and the capacity pool tier. This metric includes the total logical size of snapshots and FlexClones but does not include storage efficiency savings achieved through compression, compaction, and deduplication. | ONTAP | Sum | Bytes |
Network Sent Bytes | The number of bytes (network I/O) sent by the file system. | ONTAP | Sum | Bytes |
Network Received Bytes | The number of bytes (network I/O) received by the file system. | ONTAP | Sum | Bytes |
Data Read Operation Time | The total time spent within the file system fulfilling read operations (network I/O) from clients accessing data in the file system. | ONTAP | Sum | Seconds |
Data Write Operation Time | The total time spent within the file system fulfilling write operations (network I/O) from clients accessing data in the file system. | ONTAP | Sum | Seconds |
Capacity Pool Read Bytes | The number of bytes read (network I/O) from the file system's capacity pool tier. | ONTAP | Sum | Bytes |
Capacity Pool Write Bytes | The number of bytes written (network I/O) to the file system's capacity pool tier. | ONTAP | Sum | Bytes |
Capacity Pool Read Operations | The number of read operations (network I/O) from the file system's capacity pool tier. This translates to a capacity pool read request. | ONTAP | Sum | Count |
Capacity Pool Write Operations | The number of write operations (network I/O) to the file system from the capacity pool tier. This translates to a write request. | ONTAP | Sum | Count |
Storage Capacity Utilization | The percent utilization of storage capacity for the file system. | All | Average | Percentage |
Storage Used | The total storage capacity used for the file system. | All | Sum | Bytes |
Read Operations | The average data read operation time per data read operation. | ONTAP | Average | Seconds |
Write Operations | The average data write operation time per data write operation. | ONTAP | Average | Seconds |
Metadata Operations | The average time taken per metadata operation. | ONTAP | Average | Seconds |
Capacity Pool Tier | The used physical storage capacity in bytes, specific to the storage tier. This value includes savings from storage-efficiency features, such as data compression and deduplication. Reported with StorageTier set to StandardCapacityPool. | ONTAP | Average | Bytes |
Primary Tier Capacity | The storage capacity for all data types with storage tier as SSD. | ONTAP | Average | Bytes |
Primary Tier Used | The used physical storage capacity in bytes, specific to the storage tier. This value includes savings from storage-efficiency features, such as data compression and deduplication. With StorageTier as SSD, this metric measures the logical space usage for this volume for your SSD. | ONTAP | Average | Bytes |
Primary Tier Avail | The available or unused physical storage capacity in bytes, specific to the storage tier. | ONTAP | Average | Bytes |
Metadata Operation Time | The total time spent on metadata operations. | ONTAP | Sum | Seconds |
Available Volumes | The number of available volumes. | OpenZFS and ONTAP | Sum | Count |
Failed Volumes | The number of failed volumes. | OpenZFS and ONTAP | Sum | Count |
Misconfigured Volumes | The number of misconfigured volumes. | OpenZFS and ONTAP | Sum | Count |
Created Volumes | The number of created volumes. | OpenZFS and ONTAP | Sum | Count |
Available SVM | The number of available storage virtual machines (SVMs). | ONTAP | Sum | Count |
Failed SVM | The number of failed SVMs. | ONTAP | Sum | Count |
Misconfigured SVM | The number of misconfigured SVMs. | ONTAP | Sum | Count |
Total Volumes | The total number of volumes in the file system. | OpenZFS and ONTAP | Sum | Count |
Total SVM | The total number of storage virtual machines in the file system. | ONTAP | Sum | Count |
No Data Compression OpenZFS Volume | The method used to compress the data on the volume can be NONE, ZSTD, or LZ4. This metric shows the number of volumes that use no compression method. | OpenZFS | Sum | Count |
Zstandard (ZSTD) Compression OpenZFS Volume | The number of volumes that use the Zstandard (ZSTD) compression algorithm to compress the data on the volume. | OpenZFS | Sum | Count |
LZ4 Compression OpenZFS Volume | The number of volumes that use the LZ4 compression algorithm to compress the data on the volume. | OpenZFS | Sum | Count |
Clone Volume | The number of volumes that reference the data in the origin snapshot, that is, volumes that use the clone strategy when copying data from the snapshot to the new volume. | OpenZFS | Sum | Count |
Full Copy Volume | The number of volumes that copy all data from the snapshot to the new volume, that is, volumes that use the full-copy strategy. | OpenZFS | Sum | Count |
Incremental Copy OpenZFS Volume | The number of volumes that use an incremental copy strategy when copying data from the snapshot to the new volume. This option is only for updating an existing volume by using a snapshot from another FSx for OpenZFS file system. | OpenZFS | Sum | Count |
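Several of the rate and average metrics in the table can be related to the Sum metrics over a polling period. The sketch below shows one plausible way to derive them; it is an assumption about how such values are typically computed, not a statement of how Site24x7 calculates them. The function names are hypothetical, and 1 MB is taken as 1024 × 1024 bytes here.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def throughput_mb_per_sec(read_bytes_sum, write_bytes_sum, period_seconds):
    """Total throughput: bytes read plus bytes written, per second, in MB."""
    return (read_bytes_sum + write_bytes_sum) / period_seconds / BYTES_PER_MB


def total_iops(read_ops_sum, write_ops_sum, metadata_ops_sum, period_seconds):
    """Total IOPS: read, write, and metadata operations per second."""
    return (read_ops_sum + write_ops_sum + metadata_ops_sum) / period_seconds


def average_operation_time(operation_time_sum, operation_count_sum):
    """Average seconds per operation, e.g. Data Read Operation Time
    divided by Data Read Operations. Returns 0.0 when there were none."""
    if operation_count_sum == 0:
        return 0.0
    return operation_time_sum / operation_count_sum


if __name__ == "__main__":
    # 60 MB read and 60 MB written over a 60-second period.
    print(throughput_mb_per_sec(60 * BYTES_PER_MB, 60 * BYTES_PER_MB, 60))
    print(total_iops(600, 300, 300, 60))
    print(average_operation_time(3.0, 1500))
```

Read-only and write-only variants follow the same pattern by passing zero for the other inputs.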
Performance metrics for data repository tasks
Attribute | Description | Statistic | Unit |
---|---|---|---|
Succeeded Count | Number of files successfully exported. | Sum | Count |
Failed Count | Number of files that failed to export. | Sum | Count |
Total Count | Total number of files to export. | Sum | Count |
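The three counts can be combined into a simple progress summary for a task. The sketch below assumes a status dictionary shaped like the `Status` field returned by the Amazon FSx `DescribeDataRepositoryTasks` API (`TotalCount`, `SucceededCount`, `FailedCount`); the function name and the output keys are hypothetical.

```python
def summarize_task_status(status):
    """Summarize export progress from a data repository task status dict."""
    total = status.get("TotalCount", 0)
    succeeded = status.get("SucceededCount", 0)
    failed = status.get("FailedCount", 0)
    processed = succeeded + failed
    # Avoid dividing by zero when the task has not counted any files yet.
    percent = 100.0 * processed / total if total else 0.0
    return {
        "processed": processed,
        "remaining": max(total - processed, 0),
        "percent_complete": percent,
        "has_failures": failed > 0,
    }


if __name__ == "__main__":
    sample = {"TotalCount": 200, "SucceededCount": 150, "FailedCount": 10}
    print(summarize_task_status(sample))
```

A non-zero failed count is a reasonable trigger for an alert even while the task is still running.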
Forecast
Estimate future values of the following performance metrics and make informed decisions about adding capacity or scaling your AWS infrastructure.
- Data Read Bytes
- Data Write Bytes
- Data Write Operations
- Data Read Operations
- Metadata Operations
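To illustrate the general idea of forecasting a metric, the sketch below fits a straight line to past values with ordinary least squares and extends it forward. This is a deliberately simple stand-in and is not Site24x7's forecasting method, which this page does not describe. The function name and the sample values are hypothetical.

```python
def linear_forecast(values, steps_ahead):
    """Extrapolate equally spaced samples with a least-squares line.

    values: past metric values, oldest first (at least two).
    steps_ahead: number of future intervals to estimate.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Project the fitted line onto the next steps_ahead sample positions.
    return [intercept + slope * (n - 1 + k) for k in range(1, steps_ahead + 1)]


if __name__ == "__main__":
    # Hypothetical hourly Data Read Bytes sums, in MB.
    print(linear_forecast([10, 20, 30, 40], 2))
```

A straight-line fit ignores daily or weekly patterns, so it is only suitable for rough capacity estimates.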
Site24x7's Amazon FSx monitoring interface
Summary
Gain an overview of the different events occurring within each FSx file system with time series charts. This section provides you with operational information on data read operations, data write operations, metadata operations, throughput, read or write bytes, IOPS usage, and more.
Data Repository Tasks
All the metadata related to data repository tasks is listed here. This includes the task ID, task status, lifecycle state, failure reason (if any), and timestamps for task creation, start, and end. The Action column lets you set up alerts or add an automation in case the data repository task is down.
Backup Details
Details of the backups carried out for any FSx file system are listed here. This includes information about the backup, like the time, type, ID, backup lifecycle state, KMS key ARN, and Active Directory ID. If you want to delete the monitoring setup for a particular backup, click the delete option next to that backup.
Outages
The Outages tab shows the history of your file systems’ various states, like down, trouble, critical, or maintenance. It also provides details on the start and end time of an outage, its duration, and comments (if any). You can also manually add an outage and edit or delete the comments in this same section.
Log Report
Here you can view the audit log data for an FSx file system, along with details on the timestamp, status, data read bytes, data write bytes, and data read/write operations.