Monday, June 27, 2016

Measuring IOPS part 3: of the impact of disk array cache on the results of DISKSPD

In the previous post we have welcomed the new kid on the block, namedly DISKSPD, and found out that, though it should disable hardware caching, the –h switch seems to have no effect on the HP SmartArray RAID controller cache. If we have a look at the SmartArray RAID performance factors document edited by HP, we can read that the Read Cache is useful in prefetching sequential reads:


https://1.bp.blogspot.com/--UEeQKtzQw8/VztXxJ4aS2I/AAAAAAAAC04/MGncqT1UZLkhdG9HKSwVi93f0PlteV-4QCKgB/s640/iops15.png


Concerning the Write Cache, HP states that it is used to cheat on the host application by making it believe that the data have been written when they are not, and that if the Cache fills up, the Disk controller has some mechanisms to speed up execution, such as grouping logical blocks (write coalescing) or change the execution order to lower latency (command reordering):


https://3.bp.blogspot.com/-3Bq7PHjROzs/VztXx-CtSSI/AAAAAAAAC08/6aSkkcf4iXQ3LqSX5F07pmgZl-tNMQPZgCKgB/s640/iops16.png


So, the fact of using the HP SmartArray controller cache justifies the fact that, during my tests with Diskspd, I cannot see the write penalty which is typical for RAID 1 configurations:


https://4.bp.blogspot.com/-0k05deVRQgc/VztXyapEaWI/AAAAAAAAC1A/hFv413_snr8fvb2EoDjG-tYShCAz_cSxACKgB/s640/iops17.png


To confirm the fact that the HP Controller is not honoring the request to disable caching, I can run Diskspd with the the –S switch (instead of -h). The –S switch disable the OS cache only. Since the result of this run and of the previous run are roughly the same, we can state that Diskspd can’t disable the SmartArray cache, only the OS cache:

https://1.bp.blogspot.com/-uITrRmNvXLk/VztXy84APXI/AAAAAAAAC1E/Sw2ZWIr2cnMksZ6zzu8HqzhRjTMQuPC0gCKgB/s640/iops18.png


This particular case apart, DISKSPD seems to be a nice tool by Microsoft. Let's keep studying it in the next post of this series (to be published).

Thursday, June 23, 2016

Introducing a PowerShell function to report cluster resources moves and failovers

The number of clusters I am managing these days is becoming very large, so I felt like I needed some kind of report mailed to me giving a view of what resources went online, offline, became degraded or failed in my environment. The good news when you are doing whatever you can think of on a Windows box (and not only) is that you can leverage, you know the special word, PowerShell, which, once you install the Failover Clustering feature, comes with a module for managing clusters:

Get-Module FailoverClusters | Format-List

Name              : FailoverClusters
Path              : C:\windows\system32\windowspowershell\v1.0\Modules\FailoverClusters\FailoverClusters.psd1
Description       : 
ModuleType        : Manifest
Version           : 2.0.0.0
NestedModules     : {Microsoft.FailoverClusters.PowerShell, Microsoft.FailoverClusters.PowerShell}
ExportedFunctions : 
ExportedCmdlets   : {Add-ClusterCheckpoint, Add-ClusterDisk, Add-ClusterFileServerRole, 
                    Add-ClusterGenericApplicationRole...}
ExportedVariables : 
ExportedAliases   : {Add-VMToCluster, Remove-VMFromCluster}
Now the hardest part when you want to set up a cluster reporting, is to find the proper events on which to build your solutions. After a few tests, I came up with the following events to look for:

Cluster Informational events: these are stored in a special operational log
  • Event ID 1201 under Microsoft-Windows-FailoverClustering/Operational is the cluster group coming online
  • Event ID 1204 under Microsoft-Windows-FailoverClustering/Operational is the cluster group going offline
Cluster Warning events: these are recorded in the System event log
  • Event ID 1167 under System is for resource Groups becoming degraded (message: Cluster Agent: The cluster resource resourcename has become degraded.)
Cluster Error events: these are recorded in the System event log
  • Event ID 1609 under System is for resource failures (message: Cluster resource 'resourcename' in clustered service or application 'appname' failed.)



The function I came up with leverage just a few cmdlets.


The first one is Get-WinEvent, used in conjunction with the -FilterHashtable parameter, which has the advantage of being lighting fast (especially compared to Get-EventLog), as I explained in a previous post where I used it to retrieve the system boot time (http://www.happysysadm.com/2014/07/windows-boot-time-explored-in-powershell.html - chapter 12).



The other cmdlets came all from the FailoverClustering module:

Enough said. Here's the advanced function Get-ClusterResourceInfo:

function Get-ClusterResourceInfo
        <#
        .SYNOPSIS
            Generates a report of the status of cluster resources.
 
        .DESCRIPTION
            Generates a report of the status of cluster resources with the timestamp of the last time each resource came online.
            The report can be sent as an e-mail or exported to a CSV file.
             
        .PARAMETER Cluster
            Name of the cluster to query

        .PARAMETER Passthru
            Returns the report as an object

        .PARAMETER Detailed
            Expands Resource Groups adding details about reources

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru -Detailed

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru -Detailed -Verbose

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru | Format-table -AutoSize

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru | Sort-Object ClusterName | Format-Table -Property ResourceGroup, LastOnline

        .EXAMPLE
            'clu1','clu2' | % { Get-ClusterResourceInfo -Cluster $_ -Passthru -Detailed -Verbose } #| Format-Table * -AutoSize

        .AUTHOR
            HappySysadm.com

        #>
    {
    [CmdletBinding(DefaultParameterSetName='__AllParameterSets')]
    param(

        [Parameter(Mandatory)][ValidateScript({Get-Cluster $_})][string]$Cluster,

        [switch]$Detailed = $False,
        
        [switch]$Passthru = $True

        )
    
    write-verbose 'Started'
    
    if($Detailed) { Write-Verbose 'Detailed mode is ON' } else { Write-Verbose 'Detailed mode is OFF' }

    Write-Verbose 'Retrieving cluster information'

    $Error = $False

    $info = @()

    $infolastonline = @()

    $infolastoffline = @()

    $infolasterror = @()

    $infolastdegraded = @()
    
    try
    
        {

         $clu = Get-cluster $cluster

         $nodes = $clu | Get-ClusterNode

         #Event ID 1201 is the cluster group coming online. Retrieving them all.
         $eventson = foreach($node in $nodes){Get-WinEvent -ComputerName $node -FilterHashtable @{LogName='Microsoft-Windows-FailoverClustering/Operational';ID = 1201}}
         $eventson = $eventson | sort timecreated -Descending

         #Event ID 1204 is the cluster group going offline. Retrieving them all.
         $eventsoff = foreach($node in $nodes){Get-WinEvent -ComputerName $node -FilterHashtable @{LogName='Microsoft-Windows-FailoverClustering/Operational';ID = 1204}}
         $eventsoff = $eventsoff | sort timecreated -Descending
         
         #Event ID 1609 is for resource failures. Retrieving them all.
         $eventserror = foreach($node in $nodes){Get-WinEvent -ComputerName $node -FilterHashtable @{LogName='System';ID = 1069}}
         $eventsoerror = $eventserror | sort timecreated -Descending
         
         #Event ID 1167 is for resource group becoming degraded. Retrieving them all.
         $eventsdegraded = foreach($node in $nodes){Get-WinEvent -ComputerName $node -FilterHashtable @{LogName='System';ID = 1167}}
         $eventsdegraded = $eventsdegraded | sort timecreated -Descending

         $clustergroups = $clu | Get-ClusterGroup
    
         foreach($clustergroup in $clustergroups) {

                Write-Verbose "`tWorking on $($clu.name) - $($clu | Get-ClusterNode) - $($clustergroup.Name)"

                #Selecting only the events 1201 for the current resource group
                $infolastonline += ($eventson | ? {$_.message -match $clustergroup.name}).TimeCreated

                #Selecting only the events 1204 for the current resource group
                $infolastoffline += ($eventsoff | ? {$_.message -match $clustergroup.name}).TimeCreated

                #Selecting all events regarding resource failures for the current resource group
                $infolasterror = ($eventserror | ? {$_.message -match $clustergroup.name}).TimeCreated

                #Selecting all events regarding current resource group becoming degraded
                $infolastdegraded = ($eventsdegraded | ? {$_.message -match $clustergroup.name}).TimeCreated

                if($detailed) {
                
                    #More detailed: retrieving all the resources inside the current resource group
                    $resources = $clustergroup | Get-ClusterResource

                    foreach($resource in $resources) {
                
                        $infores = [PSCustomObject]@{
                            ClusterType = 'MSCS'
                            ClusterName = $clu.Name
                            ResourceGroup = $resource.OwnerGroup
                            Resource = $resource.Name
                            ServerName = $resource.OwnerNode
                            ResourceStatus = $resource.State
                            LastOnline = if($infolastonline){$infolastonline[0]}else{'NA'}
                            LastOffline = if($infolastoffline){$infolastoffline[0]}else{'NA'}
                            LastError = if($infolasterror){$infolasterror[0]}else{'NA'}}

                        $info += $infores

                        }

                    }
                    
                else {

                    #Less detailed: only resource groups report.
                    
                    $inforesgroup = [PSCustomObject]@{
                            ClusterType = 'MSCS'
                            ClusterName = $clu.Name
                            ResourceGroup = $clustergroup.Name
                            ServerName = $clustergroup.OwnerNode
                            ResourceStatus = $clustergroup.State
                            LastOnline = if($infolastonline){$infolastonline[0]}else{'NA'}
                            LastOffline = if($infolastoffline){$infolastoffline[0]}else{'NA'}
                            LastDegraded = if($infolastdegraded){$infolastdegraded[0]}else{'NA'}}

                    $info += $inforesgroup

                    }
                
                $infolastonline = $null

                $infolastoffline = $null
                
                $infolasterror = $null

                $infolastdegraded = $null

                }
            
         }

    catch
        
        { 
            
        $Error = $True
            
        Write-Warning $_.Exception.Message 

        $info = $null
            
        }

    if($Error)

        {

        Write-Warning "Script aborted. No information returned on $clu.name"

        }

    elseif($Passthru)
       
        {
        
        Write-Verbose 'Returning object for further filtering.'
        
        $info
        
        }

    else

        {

        Write-Verbose "No passthru"

        }
    
    write-verbose 'Finished'
    
    }
In my case I pipe the output of this function to Send-MailMessage so that I receive a weekly report of all my clusters. Hope you like it. If you wish to contribute to this function, you can find it on my GitHub.

Friday, June 17, 2016

Measuring IOPS part 2: DISKSPD

In the previous post I have been talking of IOMETER and we have seen that today it is no more the best tool for disk performance measurement for several reasons.
ENTER DISKSPD
Crawling the net, I have found an excellent blog post by Jose Barreto on a tool named Diskspd.exe recently developed by Microsoft. As explained in that post, Diskspd is a flexible tool that produce many types of synthetic workloads using all kinds of storage, including local disks, LUNs on a SAN, Storage Spaces or SMB file shares.
The version I have found is 2.0.15, released on January 2015, which officially supersedes the SQLIO Disk Subsystem Benchmark Tool. The source code is hosted on the following Github repo:
while the binaries can be found here:
The interesting things about Diskspd are that :
  • it comes with a very nice explanation of the way it works, with examples and use cases
  • the provided performance information is extensive
  • the text output can be parsed by the PowerShell automators we have all become
As a test bed for the initial runs I have decided to use a HP Proliant BL460c Gen8 with 32 CPU cores and a Smart Array P220i controller with a RAID 1 array made up of two 10000 RPM SAS disks.
Actually I used Diskspd against more recent hardware but it failed with the following error:
diskspd test.dat

WARNING: Complete CPU utilization cannot currently be gathered within DISKSPD for this system.
        Use alternate mechanisms to gather this data such as perfmon/logman.
        Active KGroups 2 > 1 and/or processor count 16 > 64.
It looks like there’s a problem with the API that gets CPU stats, as mentioned here.

When you have a look at the Diskspd help, you will se that the only mandatory parameter is a target, in the form of a file path (c:\test.dat), a partition letter (E:) or a physical drive number (#2):

https://3.bp.blogspot.com/-KUOsNXc--5Q/VztXt_aMCVI/AAAAAAAAC0U/nyF2-nzI24oRWbGGyLnd6rPeWP8VW4LTgCKgB/s640/iops06.png

Not being an expert, I decide to run the command with just the mandatory target in an elevated prompt:
diskspd c:\test.dat

Command Line: C:\Windows\system32\diskspd.exe c:\test.dat

Input parameters:

       timespan:   1
       -------------
       duration: 10s
       warm up time: 5s
      cool down time: 0s
       random seed: 0
       path: 'test.dat'
          think time: 0ms
          burst size: 0
          using software and hardware write cache
          performing read test
          block size: 65536
          using sequential I/O (stride: 65536)
          number of outstanding I/O operations: 2
          thread stride size: 0
          threads per file: 1
          using I/O Completion Ports
          IO priority: normal

Results for timespan 1:
*******************************************************************************

actual test time:       10.09s
thread count:           1
proc count:             32

CPU |  Usage |  User  |  Kernel |  Idle
-------------------------------------------
  0| 100.00%|   3.71%|   96.30%|   0.00%
  1|   0.00%|   0.00%|    0.00%| 100.00%
  2|   2.47%|   0.62%|    1.85%|  97.38%
  3|   0.00%|   0.00%|    0.00%| 100.00%
  4|   1.39%|   0.00%|    1.39%|  98.46%
  5|   0.00%|   0.00%|    0.00%| 100.00%
  6|   0.31%|   0.15%|    0.15%|  99.70%
  7|   0.00%|   0.00%|    0.00%| 100.00%
  8|   0.15%|   0.00%|    0.15%|  99.85%
  9|   0.00%|   0.00%|    0.00%| 100.00%
 10|   0.00%|   0.00%|    0.00%| 100.00%
 11|   0.00%|   0.00%|    0.00%| 100.00%
 12|   0.00%|   0.00%|    0.00%| 100.00%
 13|   0.00%|   0.00%|    0.00%| 100.00%
 14|   0.31%|   0.00%|    0.31%|  99.70%
 15|   0.00%|   0.00%|    0.00%| 100.00%
 16|   1.24%|   1.08%|    0.15%|  98.77%
 17|   0.00%|   0.00%|    0.00%| 100.00%
 18|   6.03%|   5.41%|    0.62%|  93.98%
 19|   0.00%|   0.00%|    0.00%| 100.00%
 20|   4.79%|   4.64%|    0.15%|  95.06%
 21|   0.00%|   0.00%|    0.00%| 100.00%
 22|   0.46%|   0.15%|    0.31%|  99.54%
 23|   0.00%|   0.00%|    0.00%| 100.00%
 24|   0.00%|   0.00%|    0.00%| 100.00%
 25|   0.00%|   0.00%|    0.00%| 100.00%
 26|   0.00%|   0.00%|    0.00%| 100.00%
 27|   0.00%|   0.00%|    0.00%| 100.00%
 28|   0.00%|   0.00%|    0.00%| 100.00%
 29|   0.00%|   0.00%|    0.00%| 100.00%
 30|   0.00%|   0.00%|    0.00%| 100.00%
 31|   0.00%|   0.00%|    0.00%| 100.00%
-------------------------------------------
avg.|   3.66%|   0.49%|    3.17%|  96.33%

Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  file
------------------------------------------------------------------------------
    0 |     25740902400 |       392775 |    2432.28 |   38916.47 | test.dat (1024MB)
------------------------------------------------------------------------------
total:       25740902400 |       392775 |    2432.28 |   38916.47

Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  file
------------------------------------------------------------------------------
    0 |     25740902400 |       392775 |    2432.28 |   38916.47 | test.dat (1024MB)
------------------------------------------------------------------------------
total:       25740902400 |       392775 |    2432.28 |   38916.47

Write IO
thread |       bytes    |     I/Os     |     MB/s   |  I/O per s |  file
------------------------------------------------------------------------------
    0 |               0 |            0 |       0.00 |       0.00 | test.dat (1024MB)
------------------------------------------------------------------------------
total:                 0 |            0 |       0.00 |       0.00
Looking at the text output, we can see that by default Diskspd warms up for five seconds before doing a ten seconds long sequential read run with 65k blocks and two outstanding I/O operations against a 1GB file:

https://2.bp.blogspot.com/--IKuJ6o2ce4/VztXvTJK5QI/AAAAAAAAC0o/-xah3zwl_nQ_wko15xA7v_2rHJuP3-kgwCKgB/s400/iops07.png

The result are split in two. In the first section you can find a bit of information on the usage of all the cores:

https://3.bp.blogspot.com/-yKQtCcuj8Pc/VztXuJDiGmI/AAAAAAAAC0Y/cUH-oL8ZRt8xCILMUf7cB4zeISDga5YbwCKgB/s640/iops08.png

In the second section there are the actual IO results, split in three sections:
  • Total IO
  • Read IO
  • Write IO
Since, as I have said, the default test is a read test, there are no data about write operations and the Total IO results match exactly the Read IO results:

https://1.bp.blogspot.com/-WBf8o-nwzi0/VztXuD1B9EI/AAAAAAAAC0c/XF7EljYd9y4kxXCP0iv8Ybk2HyEAbsM3ACKgB/s640/iops09.png

If I try to balance reads and writes, I can see that my RAID 1, which is in theory supposed to have high read performance, and medium write performance (due to a write IO penalty of 2, as explained here), returns oddly uniform results:

https://2.bp.blogspot.com/-5tQKrvxWiCY/VztXuijR4yI/AAAAAAAAC0g/SWF6Ujh9dyQNUaumGEtka6QJJuRlCu2TQCKgB/s640/iops10.png

When I disable caching (-h parameter), I get much lower results, but I still cannot see the impact of the RAID 1 write penalty:

https://3.bp.blogspot.com/--LUmwLSCh1g/VztXuo6NyMI/AAAAAAAAC0k/k755gz0WsRQ38juCeIj19UzTJn9FM7adQCKgB/s640/iops11.png

When to this I add the –L parameter, the same command returns latency information:

https://2.bp.blogspot.com/--Aixu-cCPpc/VztXvXDbXSI/AAAAAAAAC0s/KuoxWj2fOgcDKkBMWGippF46Go7dSi7VgCKgB/s640/iops12.png

By defaults Diskspd uses a 65K I/O size. Changing this parameter to 4k (-b4k) causes an increment in the number of the IOPS for both reads and writes, latency decreases dramatically, but the throughput keeps being the same for reads and writes:

https://2.bp.blogspot.com/-xQ2Ajvo8WS4/VztXv8DulxI/AAAAAAAAC0w/4bH49fvdCkQ5n1MUe9GeFNf6BUn96fZiQCKgB/s640/iops13.png

If I switch from sequential operations to random operations (–r switch), the throughput decreases while read latency increases:

https://3.bp.blogspot.com/-bX8xyPtDTnU/VztXwYRxe2I/AAAAAAAAC00/XTugFSG1fLQhZGa3Ri9Q143jcrDNEH_NACKgB/s640/iops14.png

Again, the results are strangely uniform between reads and writes. We will see why in the third post of this series.
Related Posts Plugin for WordPress, Blogger...