In the previous post I talked about IOMETER, and we saw that today it is no longer the best tool for disk performance measurement, for several reasons.
ENTER DISKSPD
Crawling the net, I found an excellent blog post by Jose Barreto on a tool named Diskspd.exe, recently developed by Microsoft. As explained in that post, Diskspd is a flexible tool that can produce many types of synthetic workloads against all kinds of storage, including local disks, LUNs on a SAN, Storage Spaces or SMB file shares.
The version I found is 2.0.15, released in January 2015, which officially supersedes the SQLIO Disk Subsystem Benchmark Tool. The source code is hosted on the following GitHub repo:
while the binaries can be found here:
The interesting things about Diskspd are that:
- it comes with a very nice explanation of the way it works, with examples and use cases
- the provided performance information is extensive
- the text output can be parsed by the PowerShell automators we have all become
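As a quick illustration of that last point, here is a minimal PowerShell sketch that runs Diskspd with its defaults and extracts the "total:" rows (Total, Read and Write IO) from the text report you will see further down in this post. The parsing pattern is my own and assumes diskspd.exe is on the path; note that there is also an XML output mode (-Rxml) if you prefer structured output.

# Minimal sketch: run Diskspd with default settings against a test file
# and pull the "total:" rows out of the text report.
$output = & diskspd.exe c:\test.dat
$totals = $output | Select-String '^\s*total:' | ForEach-Object {
    # Each totals row looks like: total: <bytes> | <I/Os> | <MB/s> | <I/O per s>
    $fields = ($_.Line -replace '^\s*total:\s*', '') -split '\|' | ForEach-Object { $_.Trim() }
    [pscustomobject]@{ Bytes = $fields[0]; IOs = $fields[1]; 'MB/s' = $fields[2]; IOPS = $fields[3] }
}
$totals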
As a test bed for the initial runs I decided to use an HP ProLiant BL460c Gen8 with 32 CPU cores and a Smart Array P220i controller with a RAID 1 array made up of two 10,000 RPM SAS disks.
I had actually tried to run Diskspd against more recent hardware first, but it failed with the following warning:
diskspd test.dat
WARNING: Complete CPU utilization cannot currently be gathered within DISKSPD for this system.
Use alternate mechanisms to gather this data such as perfmon/logman.
Active KGroups 2 > 1 and/or processor count 16 > 64.
When you have a look at the Diskspd help, you will see that the only mandatory parameter is a target, in the form of a file path (c:\test.dat), a partition letter (E:) or a physical drive number (#2).
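Sticking to those examples, the three forms of the target look like this:

diskspd c:\test.dat
diskspd E:
diskspd #2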
Not being an expert, I decided to run the command with just the mandatory target in an elevated prompt:
diskspd c:\test.dat
Command Line: C:\Windows\system32\diskspd.exe c:\test.dat
Input parameters:
timespan: 1
-------------
duration: 10s
warm up time: 5s
cool down time: 0s
random seed: 0
path: 'test.dat'
think time: 0ms
burst size: 0
using software and hardware write cache
performing read test
block size: 65536
using sequential I/O (stride: 65536)
number of outstanding I/O operations: 2
thread stride size: 0
threads per file: 1
using I/O Completion Ports
IO priority: normal
Results for timespan 1:
*******************************************************************************
actual test time: 10.09s
thread count: 1
proc count: 32
CPU | Usage | User | Kernel | Idle
-------------------------------------------
0| 100.00%| 3.71%| 96.30%| 0.00%
1| 0.00%| 0.00%| 0.00%| 100.00%
2| 2.47%| 0.62%| 1.85%| 97.38%
3| 0.00%| 0.00%| 0.00%| 100.00%
4| 1.39%| 0.00%| 1.39%| 98.46%
5| 0.00%| 0.00%| 0.00%| 100.00%
6| 0.31%| 0.15%| 0.15%| 99.70%
7| 0.00%| 0.00%| 0.00%| 100.00%
8| 0.15%| 0.00%| 0.15%| 99.85%
9| 0.00%| 0.00%| 0.00%| 100.00%
10| 0.00%| 0.00%| 0.00%| 100.00%
11| 0.00%| 0.00%| 0.00%| 100.00%
12| 0.00%| 0.00%| 0.00%| 100.00%
13| 0.00%| 0.00%| 0.00%| 100.00%
14| 0.31%| 0.00%| 0.31%| 99.70%
15| 0.00%| 0.00%| 0.00%| 100.00%
16| 1.24%| 1.08%| 0.15%| 98.77%
17| 0.00%| 0.00%| 0.00%| 100.00%
18| 6.03%| 5.41%| 0.62%| 93.98%
19| 0.00%| 0.00%| 0.00%| 100.00%
20| 4.79%| 4.64%| 0.15%| 95.06%
21| 0.00%| 0.00%| 0.00%| 100.00%
22| 0.46%| 0.15%| 0.31%| 99.54%
23| 0.00%| 0.00%| 0.00%| 100.00%
24| 0.00%| 0.00%| 0.00%| 100.00%
25| 0.00%| 0.00%| 0.00%| 100.00%
26| 0.00%| 0.00%| 0.00%| 100.00%
27| 0.00%| 0.00%| 0.00%| 100.00%
28| 0.00%| 0.00%| 0.00%| 100.00%
29| 0.00%| 0.00%| 0.00%| 100.00%
30| 0.00%| 0.00%| 0.00%| 100.00%
31| 0.00%| 0.00%| 0.00%| 100.00%
-------------------------------------------
avg.| 3.66%| 0.49%| 3.17%| 96.33%
Total IO
thread | bytes | I/Os | MB/s | I/O per s | file
------------------------------------------------------------------------------
0 | 25740902400 | 392775 | 2432.28 | 38916.47 | test.dat (1024MB)
------------------------------------------------------------------------------
total: 25740902400 | 392775 | 2432.28 | 38916.47
Read IO
thread | bytes | I/Os | MB/s | I/O per s | file
------------------------------------------------------------------------------
0 | 25740902400 | 392775 | 2432.28 | 38916.47 | test.dat (1024MB)
------------------------------------------------------------------------------
total: 25740902400 | 392775 | 2432.28 | 38916.47
Write IO
thread | bytes | I/Os | MB/s | I/O per s | file
------------------------------------------------------------------------------
0 | 0 | 0 | 0.00 | 0.00 | test.dat (1024MB)
------------------------------------------------------------------------------
total: 0 | 0 | 0.00 | 0.00
Looking at the text output, we can see that by default Diskspd warms up for five seconds before doing a ten-second sequential read run with 64K (65,536-byte) blocks and two outstanding I/O operations against a 1GB file.
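In other words, the default run should be roughly equivalent to spelling those parameters out explicitly, along these lines (-c1G pre-creates the 1GB test file if it does not already exist):

diskspd -c1G -b64K -d10 -W5 -o2 -t1 c:\test.dat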
The results are split in two parts. In the first section you can find some information on the usage of each of the cores.
In the second section you find the actual IO results, split into three subsections:
- Total IO
- Read IO
- Write IO
Since, as I said, the default test is a read test, there is no data about write operations, and the Total IO results exactly match the Read IO results.
If I try to balance reads and writes, I can see that my RAID 1 array, which in theory should provide high read performance and medium write performance (due to a write IO penalty of 2, as explained here), returns oddly uniform results.
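For reference, a balanced 50/50 mix can be requested with the -w parameter, which sets the percentage of write requests, along these lines:

diskspd -w50 c:\test.dat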
When I disable caching (-h parameter), I get much lower numbers, but I still cannot see the impact of the RAID 1 write penalty.
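A command along these lines disables both the software cache and the hardware write cache:

diskspd -w50 -h c:\test.dat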
When I add the -L parameter on top of this, the same command also returns latency information.
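That is, something like:

diskspd -w50 -h -L c:\test.dat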
By default Diskspd uses a 64K I/O size. Changing it to 4K (-b4k) increases the number of IOPS for both reads and writes and decreases latency dramatically, but throughput stays the same for reads and writes.
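An illustrative 4K version of the same command:

diskspd -w50 -h -L -b4k c:\test.dat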
If I switch from sequential to random operations (-r switch), throughput decreases while read latency increases.
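And its random-access counterpart:

diskspd -w50 -h -L -b4k -r c:\test.dat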