In the previous post I talked about IOMETER, and we saw that today it is no longer the best tool for disk performance measurement, for several reasons.
ENTER DISKSPD
Crawling the net, I found an excellent blog post by Jose Barreto on a tool named Diskspd.exe, recently developed by Microsoft. As explained in that post, Diskspd is a flexible tool that can produce many types of synthetic workloads against all kinds of storage, including local disks, LUNs on a SAN, Storage Spaces or SMB file shares.
The version I found is 2.0.15, released in January 2015, which officially supersedes the SQLIO Disk Subsystem Benchmark Tool. The source code is hosted on the following GitHub repo:
while the binaries can be found here:
The interesting things about Diskspd are that:
it comes with a very nice explanation of the way it works, with examples and use cases
the provided performance information is extensive
the text output can be parsed by the PowerShell automators we have all become
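As a quick taste of that last point, here is a minimal PowerShell sketch that pulls the "total:" rows out of a captured run. The results.txt file name is just an assumption, and the regex is tailored to the text layout shown further down in this post:
# Minimal sketch: extract the "total:" lines of a Diskspd text report,
# assuming the output was captured with: diskspd c:\test.dat > results.txt
$lines = Get-Content .\results.txt
$lines | Where-Object { $_ -match '^\s*total:' } | ForEach-Object {
    $fields = ($_ -replace '^\s*total:\s*', '') -split '\|' | ForEach-Object { $_.Trim() }
    [pscustomobject]@{
        Bytes = [long]$fields[0]
        IOs   = [long]$fields[1]
        MBps  = [double]$fields[2]
        IOPS  = [double]$fields[3]
    }
} | Format-Table -AutoSize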
As a test bed for the initial runs I decided to use an HP ProLiant BL460c Gen8 with 32 CPU cores and a Smart Array P220i controller hosting a RAID 1 array made up of two 10,000 RPM SAS disks.
Actually, I first tried Diskspd against more recent hardware, but it failed with the following error:
diskspd test.dat
WARNING: Complete CPU utilization cannot currently be gathered within DISKSPD for this system.
Use alternate mechanisms to gather this data such as perfmon/logman.
Active KGroups 2 > 1 and/or processor count 16 > 64.
When you have a look at the Diskspd help, you will see that the only mandatory parameter is a target, in the form of a file path (c:\test.dat), a partition letter (E:) or a physical drive number (#2):
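Just to make the three target forms concrete, the invocations would look like this (E: and #2 are placeholders for a volume and a physical drive on my test system; be careful with anything other than a file target, because a write test against a raw volume or drive destroys its data):
diskspd c:\test.dat
diskspd E:
diskspd #2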
Not being an expert, I decided to run the command with just the mandatory target, in an elevated prompt:
diskspd c:\test.dat
Command Line: C:\Windows\system32\diskspd.exe c:\test.dat
Input parameters:
timespan: 1
-------------
duration: 10s
warm up time: 5s
cool down time: 0s
random seed: 0
path: 'test.dat'
think time: 0ms
burst size: 0
using software and hardware write cache
performing read test
block size: 65536
using sequential I/O (stride: 65536)
number of outstanding I/O operations: 2
thread stride size: 0
threads per file: 1
using I/O Completion Ports
IO priority: normal
Results for timespan 1:
*******************************************************************************
actual test time: 10.09s
thread count: 1
proc count: 32
CPU | Usage | User | Kernel | Idle
-------------------------------------------
0| 100.00%| 3.71%| 96.30%| 0.00%
1| 0.00%| 0.00%| 0.00%| 100.00%
2| 2.47%| 0.62%| 1.85%| 97.38%
3| 0.00%| 0.00%| 0.00%| 100.00%
4| 1.39%| 0.00%| 1.39%| 98.46%
5| 0.00%| 0.00%| 0.00%| 100.00%
6| 0.31%| 0.15%| 0.15%| 99.70%
7| 0.00%| 0.00%| 0.00%| 100.00%
8| 0.15%| 0.00%| 0.15%| 99.85%
9| 0.00%| 0.00%| 0.00%| 100.00%
10| 0.00%| 0.00%| 0.00%| 100.00%
11| 0.00%| 0.00%| 0.00%| 100.00%
12| 0.00%| 0.00%| 0.00%| 100.00%
13| 0.00%| 0.00%| 0.00%| 100.00%
14| 0.31%| 0.00%| 0.31%| 99.70%
15| 0.00%| 0.00%| 0.00%| 100.00%
16| 1.24%| 1.08%| 0.15%| 98.77%
17| 0.00%| 0.00%| 0.00%| 100.00%
18| 6.03%| 5.41%| 0.62%| 93.98%
19| 0.00%| 0.00%| 0.00%| 100.00%
20| 4.79%| 4.64%| 0.15%| 95.06%
21| 0.00%| 0.00%| 0.00%| 100.00%
22| 0.46%| 0.15%| 0.31%| 99.54%
23| 0.00%| 0.00%| 0.00%| 100.00%
24| 0.00%| 0.00%| 0.00%| 100.00%
25| 0.00%| 0.00%| 0.00%| 100.00%
26| 0.00%| 0.00%| 0.00%| 100.00%
27| 0.00%| 0.00%| 0.00%| 100.00%
28| 0.00%| 0.00%| 0.00%| 100.00%
29| 0.00%| 0.00%| 0.00%| 100.00%
30| 0.00%| 0.00%| 0.00%| 100.00%
31| 0.00%| 0.00%| 0.00%| 100.00%
-------------------------------------------
avg.| 3.66%| 0.49%| 3.17%| 96.33%
Total IO
thread | bytes | I/Os | MB/s | I/O per s | file
------------------------------------------------------------------------------
0 | 25740902400 | 392775 | 2432.28 | 38916.47 | test.dat (1024MB)
------------------------------------------------------------------------------
total: 25740902400 | 392775 | 2432.28 | 38916.47
Read IO
thread | bytes | I/Os | MB/s | I/O per s | file
------------------------------------------------------------------------------
0 | 25740902400 | 392775 | 2432.28 | 38916.47 | test.dat (1024MB)
------------------------------------------------------------------------------
total: 25740902400 | 392775 | 2432.28 | 38916.47
Write IO
thread | bytes | I/Os | MB/s | I/O per s | file
------------------------------------------------------------------------------
0 | 0 | 0 | 0.00 | 0.00 | test.dat (1024MB)
------------------------------------------------------------------------------
total: 0 | 0 | 0.00 | 0.00
Looking at the text output, we can see that by default Diskspd warms up for five seconds before running a ten-second sequential read test with 64 KiB (65,536-byte) blocks and two outstanding I/O operations against a 1GB file.
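If I read the parameters correctly, that default run should be roughly equivalent to spelling everything out like this (-c1G also creates the 1GB target file if it does not exist yet):
diskspd -c1G -d10 -W5 -C0 -b64K -o2 -t1 -w0 c:\test.dat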
The results are split in two parts. In the first section you can find information on the usage of all the CPU cores.
In the second section there are the actual IO results, split in three subsections:
Total IO
Read IO
Write IO
Since, as I said, the default test is a read test, there is no data about write operations, and the Total IO results exactly match the Read IO results.
If I try to balance reads and writes, I can see that my RAID 1 array, which in theory should offer high read performance and only medium write performance (due to a write IO penalty of 2, as explained here: every logical write has to hit both mirror members, so it costs two backend IOs), returns oddly uniform results.
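The exact command line is not shown here, but the read/write mix is controlled with the -w parameter (percentage of write requests), so a 50/50 split should presumably look like this, reusing the test.dat file created earlier:
diskspd -w50 c:\test.dat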
When I disable caching (the -h parameter), I get much lower numbers, but I still cannot see the impact of the RAID 1 write penalty.
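Presumably something along the lines of:
diskspd -w50 -h c:\test.dat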
When I add the -L parameter on top of this, the same command also returns latency information.
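That is, roughly:
diskspd -w50 -h -L c:\test.dat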
By default Diskspd uses a 64K I/O size. Changing it to 4K (-b4k) increases the number of IOPS for both reads and writes and makes latency drop dramatically, but the throughput stays the same for reads and writes.
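With the smaller block size the command becomes, more or less:
diskspd -w50 -h -L -b4k c:\test.dat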
If I switch from sequential operations to random operations (the -r switch), the throughput decreases while read latency increases.
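And the random variant, again as a sketch:
diskspd -w50 -h -L -b4k -r c:\test.dat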
Again, the results are strangely uniform between reads and writes. We will see why in the third post of this series.