Monitoring
Disk Activity
Microsoft® SQL
Server™ uses Microsoft Windows NT® I/O calls to perform disk reads and
writes. SQL Server manages when and how disk I/O is performed, but relies on
Windows NT to perform the underlying I/O operations. The I/O subsystem
includes the system bus, disk controller cards, disks, tape drives, CD-ROM
drive, and many other I/O devices. The disks are frequently the biggest
bottleneck in a system.
Monitoring
Disk I/O and Detecting Excess Paging
Two of the
counters that can be monitored to determine disk activity include:
- PhysicalDisk: % Disk Time
- PhysicalDisk: Avg. Disk Queue Length
The PhysicalDisk:
% Disk Time counter in Windows NT Performance
Monitor monitors the percentage of time that the disk is busy with
read/write activity. If the PhysicalDisk: % Disk Time counter is high (more than 90 percent), check the
Physical Disk: Current Disk
Queue Length counter to see how many system requests are waiting for
disk access. The number of waiting I/O requests should be sustained at no
more than 1.5 to 2 times the number of spindles making up the physical disk.
Most disks have one spindle, although redundant array of independent disks
(RAID) devices usually have more. A hardware RAID device appears as one
physical disk in Windows NT Performance Monitor; RAID devices created
through software appear as multiple instances.
Use the values
of the Current Disk Queue Length and % Disk Time counters to
detect bottlenecks within the disk subsystem. If Current Disk Queue
Length and % Disk Time counter values are consistently high,
consider using a faster disk drive, moving some files to an additional disk
or server, or adding additional disks to a RAID array if one is being used.
If you are
using a RAID device, the % Disk Time counter can indicate a value
greater than 100 percent. If it does, use the PhysicalDisk: Avg. Disk
Queue Length counter to determine how many system requests on average
are waiting for disk access.
Applications
and systems that are I/O-bound may keep the disk constantly active. This is
called disk thrashing.
Monitor the
Memory: Page Faults/sec counter to make sure that the disk activity is
not caused by paging. In Windows NT, paging is caused by:
- Processes
configured to use too much memory.
- File system
activity.
If you have
more than one logical partition on the same hard disk, you should use the Logical Disk counters instead of the
Physical Disk counters.
Looking at the logical disk counters will help you determine which files are
heavily accessed. Once you have found the disks with high levels of
read/write activity, look at the read-specific and write-specific counters
(for example, Logical Disk: Disk Write Bytes/sec) for the type of
disk activity that is causing the load on each logical volume.
The
PhysicalDisk: Avg. Disk Queue Length counter monitors the number of
outstanding requests on the disk. Numbers that are consistently higher on
one disk than other disks might indicate the need to redistribute data. This
counter’s value is affected by the max async IO configuration option.
If the counter value is consistently less than the configuration option
value, it is an indication that the disk subsystem has enough capacity to
handle the batched I/O being generated by SQL Server. In this situation,
increasing the max async IO configuration value can improve
performance. Conversely, if the counter is consistently greater than the
configuration option value, then the disk subsystem is not keeping up with
the I/O requests. If the overload is being caused by SQL Server I/O, then
reducing the configuration value may be beneficial. |