How to troubleshoot high I/O wait time in Linux


I/O wait time is a metric that measures the amount of time the CPU sits idle while waiting for disk I/O operations to complete. A high I/O wait time indicates that the CPU is idle while I/O requests are outstanding. This does not necessarily mean the system is unhealthy, but it often means that workloads are being held back by storage rather than by the CPU.

The CPU’s I/O wait signifies that while no processes were in a runnable state, at least one I/O operation was in progress. In simple terms, I/O wait is the time spent by the CPU waiting for I/O completion.
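The wa value reported by tools such as top and vmstat comes from the cumulative CPU time counters that the kernel exposes in /proc/stat. The following is a minimal Python sketch, assuming a Linux host with the /proc/stat layout documented in proc(5). It takes two snapshots of the aggregate cpu line one second apart and computes the share of elapsed CPU time spent in iowait. The function names are illustrative and are not part of any tool.

```python
import os
import time

# Order of the counters on the "cpu" lines of /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line):
    """Turn a 'cpu ...' line from /proc/stat into a dict of tick counters."""
    _name, *values = line.split()
    return dict(zip(FIELDS, (int(v) for v in values)))


def iowait_percent(before, after):
    """Percentage of elapsed CPU time spent in iowait between two snapshots."""
    # guest and guest_nice are already included in user and nice,
    # so only the first eight counters are summed to avoid double counting.
    keys = FIELDS[:8]
    total = sum(after.get(k, 0) - before.get(k, 0) for k in keys)
    if total <= 0:
        return 0.0
    return 100.0 * (after["iowait"] - before["iowait"]) / total


def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (the first line of /proc/stat)."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    first = parse_cpu_line(read_cpu_line())
    time.sleep(1)
    second = parse_cpu_line(read_cpu_line())
    print(f"iowait over the last second: {iowait_percent(first, second):.1f}%")
```

Working from deltas rather than raw counters matters: the counters accumulate since boot, so a single reading says nothing about current activity.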

What causes high I/O wait time?

I/O wait only describes the state of the CPU or CPU cores. High I/O wait means the CPU is idle while I/O requests are outstanding, but further investigation is needed to confirm the source and its effect.

Here are a few possible causes of high I/O wait time:

  • Bottlenecks in the storage layer that cause the drive to take more time to respond to I/O requests
  • A queue of I/O requests in the storage layer that leads to an increase in latency
  • Block devices (such as physical disks) that are too slow or simply at their saturation point
  • Processes in an uninterruptible sleep state (shown as D in tools such as top and ps), which usually means they are waiting on I/O
  • Processes that perform heavy read and write operations to the disk
  • Swapping to a swap partition or swap file due to a RAM shortage on the host or guest operating system
  • Disk and network I/O operations, which are among the most common causes of system slowness
  • A slow disk or degraded RAID array that delays read and write operations
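Because processes waiting on disk I/O usually sit in the uninterruptible sleep state, listing them is a quick first check. The sketch below is a minimal Python example, assuming the /proc/<pid>/stat format described in proc(5). It prints every process whose state field is D. A single snapshot is only a hint: processes that stay in D state across repeated checks are the ones worth investigating. The same information appears in the STAT column of ps -eo pid,stat,comm.

```python
import os


def parse_comm_and_state(stat_text):
    """Return (comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so the state is read from just after the *last* closing parenthesis.
    start = stat_text.index("(")
    end = stat_text.rindex(")")
    comm = stat_text[start + 1:end]
    state = stat_text[end + 2]  # format is "pid (comm) S ..."
    return comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return a list of (pid, comm) for processes currently in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                comm, state = parse_comm_and_state(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        if state == "D":
            found.append((int(entry), comm))
    return found


if __name__ == "__main__" and os.path.isdir("/proc/self"):
    for pid, comm in uninterruptible_processes():
        print(f"{pid:>7}  {comm}")
```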

Is I/O wait affecting your system’s performance?

I/O wait is a common metric for analyzing system performance. A high I/O wait value on its own does not stop the CPU from doing other work, because the waiting CPU is technically idle. When high I/O wait occurs frequently, however, it usually means that processes are stalled on storage. This can slow down the whole system and cause issues such as low transaction throughput in databases and health degradation in applications and databases.

Diagnosing I/O wait time in Linux

The two commands most commonly used to identify and troubleshoot I/O wait time in Linux are top and vmstat. These commands will display various CPU-level statistics, including I/O wait time.

Using the top command

The top command is the easiest and most widely used command for identifying I/O wait time in Linux. It displays a dynamic and real-time view of the system, CPU-level statistics, and a running list of processes managed by the kernel.

Syntax:

top [options]

Example:

Running the top command will display an output similar to the one shown below:

Fig. 1: Output of the top command

We’ll focus on the %Cpu(s) line. Among other important statistics, it displays the I/O wait time, which is denoted by the wa label.

wa denotes the percentage of time the CPU was waiting for I/O to complete. If you are on a multi-core processor system, you can press 1 while the top command is running to get a breakdown of CPU usage per CPU.
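If you want to record wa from a script rather than read it interactively, top can run in batch mode. The following Python sketch assumes the procps-ng version of top, whose -b, -n, and -d options select batch mode, the number of iterations, and the delay between them. It captures two frames one second apart and parses wa from the last %Cpu(s) line, because the first frame may not reflect current activity. The output format differs between top implementations, so the regular expression is an assumption to check against your own system.

```python
import os
import re
import shutil
import subprocess

# Matches the number printed in front of the "wa" label, e.g. " 0.3 wa".
WA_PATTERN = re.compile(r"([\d.]+)\s*wa\b")


def parse_top_wa(top_output):
    """Return the wa value from the last %Cpu(s) line, or None if absent."""
    wa = None
    for line in top_output.splitlines():
        if line.startswith("%Cpu(s)"):
            match = WA_PATTERN.search(line)
            if match:
                wa = float(match.group(1))
    return wa


def sample_top_wa():
    """Run top in batch mode for two frames, one second apart."""
    # The C locale keeps '.' as the decimal separator in top's output.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(["top", "-b", "-n", "2", "-d", "1"],
                            capture_output=True, text=True, env=env)
    return parse_top_wa(result.stdout)


if __name__ == "__main__" and shutil.which("top"):
    print("wa:", sample_top_wa())
```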

Using the vmstat command

The vmstat command is another performance monitoring tool available on Linux. It reports information about memory, processes, CPU activity, and more. By default, vmstat prints a single report of averages since the system’s last reboot. It can also show real-time statistics that are updated at a specified interval.

Syntax:

vmstat [options] [delay [count]]

The two parameters commonly used with the vmstat command are:

  • delay: The delay between updates, in seconds. When a delay is given, vmstat keeps printing new reports at that interval.
  • count: The number of updates to print. If a delay is given without a count, updates continue indefinitely.

Example:

Running the vmstat command will give you an output similar to the one shown below:

Fig. 2: Output of the vmstat command

The output is divided into six sections (procs, memory, swap, io, system, and cpu); we will focus on the cpu section. Its columns are similar to the %Cpu(s) line of the top command. Along with the I/O wait time (wa), vmstat also reports related information, such as the number of processes blocked waiting for I/O (b), free memory (free), memory swapped in from and out to disk (si and so), and blocks received from and sent to block devices (bi and bo). Low free memory combined with steady si and so activity, for example, suggests that swapping is contributing to high I/O wait time.
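vmstat output is also easy to consume from a script. The following Python sketch assumes the column names printed by the procps-ng vmstat. It runs vmstat with a delay and count, discards the first row (which reports averages since boot), and prints wa alongside the b, si, and so columns, so that I/O wait can be compared with blocked processes and swap activity. The helper names are illustrative.

```python
import shutil
import subprocess


def parse_vmstat(output):
    """Turn vmstat output into a list of dicts keyed by column name."""
    lines = output.strip().splitlines()
    if len(lines) < 2:
        return []
    header = lines[1].split()  # lines[0] holds the section titles
    rows = []
    for line in lines[2:]:
        values = line.split()
        # Skip repeated header lines and anything that is not a data row.
        if len(values) != len(header) or not values[0].isdigit():
            continue
        rows.append(dict(zip(header, (int(v) for v in values))))
    return rows


def sample_vmstat(delay=1, count=3):
    """Run 'vmstat delay count' and drop the first, since-boot row."""
    result = subprocess.run(["vmstat", str(delay), str(count)],
                            capture_output=True, text=True)
    return parse_vmstat(result.stdout)[1:]


if __name__ == "__main__" and shutil.which("vmstat"):
    for row in sample_vmstat():
        # b = processes blocked on I/O, si/so = swap-in/swap-out activity
        print(f"wa={row.get('wa')}% b={row.get('b')} "
              f"si={row.get('si')} so={row.get('so')}")
```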

Identifying the processes causing high I/O wait time

Once you’ve confirmed that you’re dealing with a high I/O wait time, the next step is to find the processes responsible for these I/O operations. One of the easiest and most frequently used commands for this is iotop.

Using the iotop command

The iotop command is similar to the top command, but it monitors disk I/O usage and lists the processes or threads responsible for it. The original iotop is written in Python; it typically needs root privileges and requires kernel support for per-task I/O accounting.

Syntax:

iotop [options]

Example:

Running the iotop command will give you an output similar to the one shown below:

Fig. 3: Output of the iotop command

The first line shows the total read and write bandwidth between processes and the kernel’s block device subsystem (Total DISK READ and Total DISK WRITE). The next line shows the actual read and write bandwidth between the block device subsystem and the underlying hardware (Actual DISK READ and Actual DISK WRITE). Below these lines is the running list of processes or threads with data on their current disk I/O usage. The most important column here is the IO column, which shows the percentage of time the thread spent swapping in or waiting on I/O. In the output above, the list is sorted by this column in descending order.

One of the most commonly used options with this command is -o (or --only). Running iotop -o narrows the list down to processes or threads that are actually doing I/O.
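iotop needs root privileges and is not installed everywhere. As a rough substitute, the kernel’s per-process counters in /proc/<pid>/io include read_bytes and write_bytes, which count the bytes each process caused to be read from or written to the storage layer. The Python sketch below samples those counters twice and ranks processes by their combined read and write rate. It is a simplified approximation, not necessarily how iotop collects its data, and it cannot reproduce the IO column. Without root it typically sees only your own processes. The function names are hypothetical.

```python
import os
import time


def parse_proc_io(text):
    """Parse the 'key: value' lines of /proc/<pid>/io into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if value.strip():
            counters[key.strip()] = int(value)
    return counters


def io_snapshot(proc_root="/proc"):
    """Return {pid: (read_bytes, write_bytes)} for every readable process."""
    snapshot = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "io")) as f:
                c = parse_proc_io(f.read())
        except OSError:
            continue  # exited, or not permitted (other users' processes)
        snapshot[int(entry)] = (c.get("read_bytes", 0), c.get("write_bytes", 0))
    return snapshot


def busiest_processes(interval=1.0, limit=5):
    """Rank processes by storage bytes read plus written per second."""
    before = io_snapshot()
    time.sleep(interval)
    after = io_snapshot()
    rates = []
    for pid, (read1, write1) in after.items():
        if pid not in before:
            continue  # started during the interval
        read0, write0 = before[pid]
        # max(0, ...) guards against a PID being reused mid-interval.
        rates.append((pid,
                      max(0, read1 - read0) / interval,
                      max(0, write1 - write0) / interval))
    rates.sort(key=lambda r: r[1] + r[2], reverse=True)
    return rates[:limit]


if __name__ == "__main__" and os.path.isdir("/proc/self"):
    for pid, rd, wr in busiest_processes():
        print(f"{pid:>7}  read {rd / 1024:8.1f} KiB/s  write {wr / 1024:8.1f} KiB/s")
```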

Identifying the disks involved in I/O

The top and vmstat commands tell us about the I/O wait time, and the iotop command helps us identify the processes doing I/O operations. One other important factor to consider is which disks are used in these operations. The iostat command provides this information.

Using the iostat command

The iostat command displays a system’s input/output statistics. It reports device-level load by observing how long devices are active in relation to their average transfer rates. Like the top and vmstat commands, iostat also displays CPU statistics.

Syntax:

iostat [option] [interval] [count]

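The interval and count parameters work like those of the vmstat command: interval sets the number of seconds between reports, and count limits how many reports are printed. If only an interval is given, iostat keeps printing reports indefinitely. As with vmstat, the first report covers the time since the system was booted.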

Example:

Running the iostat command will give you an output similar to the one shown below:

Fig. 4: Output of the iostat command

The first section (avg-cpu) displays the average CPU statistics, including the I/O wait time (%iowait). It represents the same value as wa in the top and vmstat commands. The next section provides the following data for each device and partition:

  • Device: The partition or device name
  • tps: Transfers per second, that is, the number of I/O requests issued to the device per second. A busy or overloaded device will have a higher tps.
  • Blk_read/s (kB_read/s): The amount of data read from the device per second, in blocks or kilobytes depending on the iostat version and options
  • Blk_wrtn/s (kB_wrtn/s): The amount of data written to the device per second
  • Blk_read (kB_read): The total amount of data read from the device since the last reboot
  • Blk_wrtn (kB_wrtn): The total amount of data written to the device since the last reboot
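On modern Linux systems, iostat reads its device counters from /proc/diskstats. The Python sketch below assumes the layout documented in the kernel’s I/O statistics documentation, where the 4th, 6th, 8th, and 10th whitespace-separated columns of each line hold reads completed, sectors read, writes completed, and sectors written, and sectors are always 512 bytes. It samples that file twice and approximates the tps, kB_read/s, and kB_wrtn/s columns. The numbers will not match iostat exactly, because iostat applies its own rules for which requests and devices to include.

```python
import os
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors


def parse_diskstats(text):
    """Return {device: (reads, sectors_read, writes, sectors_written)}."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 14:
            continue  # skip short or unexpected lines
        # f[3] reads completed, f[5] sectors read,
        # f[7] writes completed, f[9] sectors written
        stats[f[2]] = (int(f[3]), int(f[5]), int(f[7]), int(f[9]))
    return stats


def device_rates(before, after, interval):
    """Approximate iostat's tps, kB_read/s and kB_wrtn/s for each device."""
    report = {}
    for dev, (r1, sr1, w1, sw1) in after.items():
        if dev not in before:
            continue
        r0, sr0, w0, sw0 = before[dev]
        tps = ((r1 - r0) + (w1 - w0)) / interval
        kb_read = (sr1 - sr0) * SECTOR_BYTES / 1024 / interval
        kb_wrtn = (sw1 - sw0) * SECTOR_BYTES / 1024 / interval
        report[dev] = (tps, kb_read, kb_wrtn)
    return report


def read_diskstats(path="/proc/diskstats"):
    with open(path) as f:
        return parse_diskstats(f.read())


if __name__ == "__main__" and os.path.exists("/proc/diskstats"):
    first = read_diskstats()
    time.sleep(1)
    for dev, (tps, rd, wr) in sorted(device_rates(first, read_diskstats(), 1).items()):
        print(f"{dev:<12} tps={tps:7.1f} kB_read/s={rd:9.1f} kB_wrtn/s={wr:9.1f}")
```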

Fixing high I/O wait time in Linux

After we’ve identified the processes behind the high I/O wait time and the disks involved, there are multiple fixes we can apply. These are some of the most common methods to minimize high I/O wait time:

  • Reduce the frequency of disk reads and writes by reducing I/O operations such as database queries.
  • Ensure that the operating system is updated to the latest version with all patches installed so that any I/O wait caused by bugs in the OS can be reduced.
  • Upgrade the hard disks and RAM in the server, for example by installing disks with higher RPMs or SSDs with higher IOPS.
  • If a single process is responsible for most of the I/O (as identified with iotop), consider lowering its I/O priority with ionice or terminating it.
  • Check for swap usage. Sustained swapping, with pages constantly moving in and out of swap, means the system does not have enough RAM for its workload. In this case, increasing the RAM may reduce I/O wait time.
  • Check for potential memory leaks. Some applications consume memory without freeing it after use. Over time, a leak can exhaust free memory, force the system to swap, and eventually raise I/O wait time.
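To check swap usage and available memory from a script, /proc/meminfo provides SwapTotal, SwapFree, and MemAvailable, all in kB. The following Python sketch assumes that format and reports how much swap is in use. Keep in mind that swap that is occupied but idle is not necessarily a problem. The stronger signal is ongoing swap activity, such as nonzero si and so values in vmstat.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def swap_summary(info):
    """Return (swap_used_kb, swap_used_percent, mem_available_kb)."""
    total = info.get("SwapTotal", 0)
    used = total - info.get("SwapFree", 0)
    percent = 100.0 * used / total if total else 0.0
    return used, percent, info.get("MemAvailable")


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        used, percent, available = swap_summary(parse_meminfo(f.read()))
    print(f"swap used: {used} kB ({percent:.1f}%), MemAvailable: {available} kB")
```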

Conclusion

When troubleshooting high I/O wait in Linux, the goal is to find the root cause before applying a fix. The top and vmstat commands help confirm and diagnose high I/O wait time, while iotop and iostat help identify the processes and disks responsible for it. As we’ve seen, there are multiple ways to handle high I/O wait time, and the right one depends on its cause. In general, the lower the I/O wait time, the less your workloads are held back by storage and the better your system will perform.


Write For Us

Write for Site24x7 is a special writing program that supports writers who create content for Site24x7 "Learn" portal. Get paid for your writing.
