Wednesday, July 6, 2016

New PowerShell function to set up an iSCSI target on a NETAPP and mount it on Windows

I don't often use my NETAPP controllers as iSCSI targets. Most of the time I just export NFS (especially to host VMware datastores) or CIFS, and that covers my needs. Sometimes, however, I am asked to mount disks from the NETAPP on Windows Servers directly at startup, and not just as simple shares mapped with logon scripts or the like.
So I decided to write a pretty do-it-all PowerShell function that does all the work of configuring the NETAPP and mounting the disk via iSCSI on my server.
If I had to do this via the OnCommand GUI, I would first have to set up an aggregate, then a volume, then a LUN, configure the iGroup, and only then move to my Windows server and manually bind the initiator to the target.
Fortunately, back in 2010 NetApp released a module (inside the NetApp PowerShell Toolkit) to do all this and more. Today you need at least version 1.5 to get all the cmdlets I used in my function. Personally I have the latest version (4.2), which can be downloaded here.

Concerning the requirements, the script must run on the Windows Server that acts as the initiator. On that server you have to be running at least version 3.0 of Windows PowerShell and have the DataONTAP module installed.
The Windows Server that acts as the initiator will run my function, which
  • configures the storage controller
  • sets up the target
  • sets up the initiator
  • maps the LUN in an iGroup.
Once the function has finished configuring the iSCSI disk, you just have to initialize it in Disk Management, format it and assign it a letter. I have already shown how to do this in PowerShell; just search my blog.
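That last step can be scripted as well; here is a minimal sketch using the built-in Storage module cmdlets (the disk number and volume label are assumptions — check Get-Disk first and adapt to your system):

```powershell
# A minimal sketch: bring the new iSCSI disk online, initialize it,
# create a partition and format it. Disk number 1 and the label are assumptions.
Get-Disk -Number 1 | Set-Disk -IsOffline $False
Initialize-Disk -Number 1 -PartitionStyle GPT
New-Partition -DiskNumber 1 -UseMaximumSize -AssignDriveLetter |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel 'iSCSI' -Confirm:$False
```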

Enough said: here's the function Mount-NAiSCSIDisk.

Just one last quick note: I am going to work on this function to improve it, so please, post suggestions or ideas in the comments below! And, for sure, share!

#Requires -Version 3.0 -Modules DataONTAP

function Mount-NAiSCSIDisk {
    <#
        .SYNOPSIS
            Sets up and mounts a NetApp iSCSI LUN on a Windows Server

        .DESCRIPTION
            Sets up an aggregate, a volume, a LUN and an iGroup, then mounts the LUN as a NetApp iSCSI target on a Windows Server

        .PARAMETER NaController
            Name of the NetApp controller

        .PARAMETER Aggregate
            Name of the aggregate

        .PARAMETER DiskCount
            Number of disks in the aggregate

        .PARAMETER Volume
            Name of the volume

        .PARAMETER VolumeSize
            Size of the volume

        .PARAMETER SnapshotReserveSize
            Size of the snapshot reserve

        .PARAMETER LUNPath
            Path of the LUN

        .PARAMETER LUNType
            Type of the LUN

        .PARAMETER LUNSize
            Size of the LUN

        .PARAMETER Igroup
            Name of the Igroup

        .PARAMETER IgroupProtocol
            Protocol for the Igroup

        .PARAMETER IgroupType
            Type of the Igroup

        .EXAMPLE
            Mount-NAiSCSIDisk -NaController netapp1 -Igroup igroup1 -IgroupProtocol iscsi -IgroupType windows

        .EXAMPLE
            Mount-NAiSCSIDisk -NaController netapp1 -Aggregate aggr1 -DiskCount 32 -Volume vol1 -VolumeSize 10g -SnapshotReserveSize 0 -LUNPath /vol/NA1/iscsivol1 -LUNSize 1g -LUNType windows -Igroup igroup1 -IgroupProtocol iscsi -IgroupType windows

        .NOTES
            Author: Carlo MANCINI
    #>
    param(
        [Parameter(Mandatory)][string]$NaController,
        [string]$Aggregate,
        [int]$DiskCount,
        [string]$Volume,
        [string]$VolumeSize,
        [string]$SnapshotReserveSize,
        [string]$LUNPath,
        [string]$LUNType,
        [string]$LUNSize,
        [string]$Igroup,
        [string]$IgroupProtocol,
        [string]$IgroupType
    )

    Write-Verbose 'Started'

    Write-Verbose 'Connect to NaController'
    try {
        Connect-NaController $NaController -ErrorAction Stop
    }
    catch {
        Write-Warning 'Unable to connect to the controller'
        return
    }

    Write-Verbose 'Starting the configuration on the NetApp controller'
    if(-not (Get-NaAggr $Aggregate -ErrorAction SilentlyContinue)) {
        Write-Verbose 'Creating the aggregate'
        New-NaAggr $Aggregate -Use64Bit -DiskCount $DiskCount
    }
    else {
        Write-Verbose 'Aggregate already existing'
    }

    if(-not (Get-NaVol $Volume -ErrorAction SilentlyContinue)) {
        Write-Verbose 'Creating the volume'
        New-NaVol $Volume -Aggregate $Aggregate $VolumeSize

        Write-Verbose 'Setting the snapshot reserve'
        Set-NaSnapshotReserve $Volume $SnapshotReserveSize
    }
    else {
        Write-Verbose 'Volume already existing'
    }

    if(-not (Get-NaLun $LUNPath -ErrorAction SilentlyContinue)) {
        Write-Verbose 'Creating the LUN'
        New-NaLun -Path $LUNPath -Size $LUNSize -Type $LUNType
    }
    else {
        Write-Verbose 'LUN already existing'
    }

    Write-Verbose 'Starting the iSCSI configuration'

    Write-Verbose 'Adding the storage controller to the iSCSI Initiator target portals'
    # This line was missing from the original listing; the built-in iSCSI
    # module cmdlet can register the portal
    New-IscsiTargetPortal -TargetPortalAddress $NaController

    Write-Verbose 'Establishing a connection to the target discovered by the iSCSI Initiator'
    Get-NaHostIscsiTarget (Get-NaIscsiNodeName) | Connect-NaHostIscsiTarget

    Write-Verbose 'Creating a new initiator group'
    New-NaIgroup $Igroup $IgroupProtocol $IgroupType

    Write-Verbose 'Adding the initiator to the initiator group'
    Get-NaHostIscsiAdapter | Add-NaIgroupInitiator $Igroup

    Write-Verbose 'Mapping the LUN to the initiators in the initiator group'
    Add-NaLunMap $LUNPath $Igroup

    Write-Verbose 'Finished'
}

Monday, July 4, 2016

Measuring IOPS part 4: binding IOMETER results to DISKSPD results

In the previous post we have seen that Diskspd can quickly provide us with information on MB/s and I/Os per second (IOPS), as well as on average latency.
For the moment we have no idea whether the 39k IOPS measured in the first sequential read test or the 710 total IOPS measured in the last random read and write test are good or bad results for the workload I have generated. The only thing we have found for sure is that latency decreases when we use small IOs instead of large IOs. But we still don't know whether a latency of 5 ms, such as the one measured in the last run, is a good value or not.
We need to investigate the other Diskspd options further. But before we do that, we have to build a dictionary of what is measured with IOMETER and find the corresponding terms used by DISKSPD.

If I look at the Diskspd help and try to map the variables of IOmeter, I can build the following table of matches:

  • IOMETER Transfer request size = DISKSPD -b parameter, described as 'Size of the IO in KB'
  • IOMETER Percent Read/Write Distribution = DISKSPD –w, described as 'Percentage of writes'
  • IOMETER Percent Random/Sequential Distribution = DISKSPD –r, which forces random operations
  • IOMETER # of Outstanding I/Os = DISKSPD –o, described as 'Outstanding IOs or queue depth (per thread)'
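Put together, this mapping lets you translate an IOmeter-style profile into a single Diskspd command line. For instance (the values here are purely illustrative, not a recommendation):

```powershell
# IOmeter-style profile translated to Diskspd: 8 KB transfers, 30% writes,
# fully random, 16 outstanding I/Os per thread, 60-second run
.\diskspd.exe -b8k -w30 -r -o16 -d60 c:\test.dat
```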
At this moment we have no idea what kind of workload we have to generate to get meaningful results that can help us assess storage performance, and Jose Barreto himself states that we have to experiment with the –t and –o parameters until we find the combination that gives the best results.

Now guess which tool I am going to use to automate this task in the next post of this series: PowerShell! Stay tuned!

Monday, June 27, 2016

Measuring IOPS part 3: of the impact of disk array cache on the results of DISKSPD

In the previous post we have welcomed the new kid on the block, namely DISKSPD, and found out that, though it should disable hardware caching, the –h switch seems to have no effect on the HP SmartArray RAID controller cache. If we have a look at the SmartArray RAID performance factors document published by HP, we can read that the read cache is useful for prefetching sequential reads:

Concerning the write cache, HP states that it is used to cheat on the host application by making it believe that the data have been written when they have not, and that if the cache fills up, the disk controller has some mechanisms to speed up execution, such as grouping logical blocks (write coalescing) or changing the execution order to lower latency (command reordering):

So the use of the HP SmartArray controller cache explains why, during my tests with Diskspd, I cannot see the write penalty that is typical of RAID 1 configurations:

To confirm that the HP controller is not honoring the request to disable caching, I can run Diskspd with the –S switch (instead of –h). The –S switch disables the OS cache only. Since the results of this run and of the previous run are roughly the same, we can state that Diskspd can't disable the SmartArray cache, only the OS cache:
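Concretely, the two runs being compared differ only in the caching switch (assuming the same default test file used earlier):

```powershell
# -h disables both software (OS) caching and hardware write caching
.\diskspd.exe -h c:\test.dat
# -S disables the OS cache only, leaving the controller cache active
.\diskspd.exe -S c:\test.dat
```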

This particular case apart, DISKSPD seems to be a nice tool by Microsoft. Let's keep studying it in the next post of this series (to be published).

Thursday, June 23, 2016

Introducing a PowerShell function to report cluster resources moves and failovers

The number of clusters I am managing these days is becoming very large, so I felt I needed some kind of report mailed to me giving a view of what resources went online or offline, became degraded or failed in my environment. The good news when you are doing whatever you can think of on a Windows box (and not only) is that you can leverage, you know, the special word: PowerShell, which, once you install the Failover Clustering feature, comes with a module for managing clusters:

Get-Module FailoverClusters | Format-List

Name              : FailoverClusters
Path              : C:\windows\system32\windowspowershell\v1.0\Modules\FailoverClusters\FailoverClusters.psd1
Description       : 
ModuleType        : Manifest
Version           :
NestedModules     : {Microsoft.FailoverClusters.PowerShell, Microsoft.FailoverClusters.PowerShell}
ExportedFunctions : 
ExportedCmdlets   : {Add-ClusterCheckpoint, Add-ClusterDisk, Add-ClusterFileServerRole, 
ExportedVariables : 
ExportedAliases   : {Add-VMToCluster, Remove-VMFromCluster}
Now the hardest part when you want to set up cluster reporting is to find the proper events on which to build your solution. After a few tests, I came up with the following events to look for:

Cluster Informational events: these are stored in a special operational log
  • Event ID 1201 under Microsoft-Windows-FailoverClustering/Operational is the cluster group coming online
  • Event ID 1204 under Microsoft-Windows-FailoverClustering/Operational is the cluster group going offline
Cluster Warning events: these are recorded in the System event log
  • Event ID 1167 under System is for resource Groups becoming degraded (message: Cluster Agent: The cluster resource resourcename has become degraded.)
Cluster Error events: these are recorded in the System event log
  • Event ID 1069 under System is for resource failures (message: Cluster resource 'resourcename' in clustered service or application 'appname' failed.)

The function I came up with leverages just a few cmdlets.

The first one is Get-WinEvent, used in conjunction with the -FilterHashtable parameter, which has the advantage of being lightning fast (especially compared to Get-EventLog), as I explained in a previous post where I used it to retrieve the system boot time (chapter 12).

The other cmdlets all come from the FailoverClustering module:

Enough said. Here's the advanced function Get-ClusterResourceInfo:

function Get-ClusterResourceInfo {
    <#
        .SYNOPSIS
            Generates a report of the status of cluster resources.

        .DESCRIPTION
            Generates a report of the status of cluster resources with the timestamp of the last time each resource came online.
            The report can be sent as an e-mail or exported to a CSV file.

        .PARAMETER Cluster
            Name of the cluster to query

        .PARAMETER Passthru
            Returns the report as an object

        .PARAMETER Detailed
            Expands resource groups adding details about resources

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru -Detailed

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru -Detailed -Verbose

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru | Format-Table -AutoSize

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru | Sort-Object ClusterName | Format-Table -Property ResourceGroup, LastOnline

        .EXAMPLE
            'clu1','clu2' | % { Get-ClusterResourceInfo -Cluster $_ -Passthru -Detailed -Verbose } #| Format-Table * -AutoSize
    #>
    param(
        [Parameter(Mandatory)][ValidateScript({Get-Cluster $_})][string]$Cluster,

        [switch]$Detailed = $False,

        [switch]$Passthru = $True
    )

    Write-Verbose 'Started'
    if($Detailed) { Write-Verbose 'Detailed mode is ON' } else { Write-Verbose 'Detailed mode is OFF' }

    Write-Verbose 'Retrieving cluster information'

    $Failed = $False

    $info = @()

    $infolastonline = @()

    $infolastoffline = @()

    $infolasterror = @()

    $infolastdegraded = @()

    try {
        $clu = Get-Cluster $Cluster

        $nodes = $clu | Get-ClusterNode

        #Event ID 1201 is the cluster group coming online. Retrieving them all.
        $eventson = foreach($node in $nodes){Get-WinEvent -ComputerName $node -FilterHashtable @{LogName='Microsoft-Windows-FailoverClustering/Operational';ID = 1201} -ErrorAction SilentlyContinue}
        $eventson = $eventson | Sort-Object TimeCreated -Descending

        #Event ID 1204 is the cluster group going offline. Retrieving them all.
        $eventsoff = foreach($node in $nodes){Get-WinEvent -ComputerName $node -FilterHashtable @{LogName='Microsoft-Windows-FailoverClustering/Operational';ID = 1204} -ErrorAction SilentlyContinue}
        $eventsoff = $eventsoff | Sort-Object TimeCreated -Descending

        #Event ID 1069 is for resource failures. Retrieving them all.
        $eventserror = foreach($node in $nodes){Get-WinEvent -ComputerName $node -FilterHashtable @{LogName='System';ID = 1069} -ErrorAction SilentlyContinue}
        $eventserror = $eventserror | Sort-Object TimeCreated -Descending

        #Event ID 1167 is for resource groups becoming degraded. Retrieving them all.
        $eventsdegraded = foreach($node in $nodes){Get-WinEvent -ComputerName $node -FilterHashtable @{LogName='System';ID = 1167} -ErrorAction SilentlyContinue}
        $eventsdegraded = $eventsdegraded | Sort-Object TimeCreated -Descending

        $clustergroups = $clu | Get-ClusterGroup
        foreach($clustergroup in $clustergroups) {

            Write-Verbose "`tWorking on $($clu.Name) - $($clustergroup.Name)"

            #Selecting only the events 1201 for the current resource group
            $infolastonline += ($eventson | ? {$_.Message -match $clustergroup.Name}).TimeCreated

            #Selecting only the events 1204 for the current resource group
            $infolastoffline += ($eventsoff | ? {$_.Message -match $clustergroup.Name}).TimeCreated

            #Selecting all events regarding resource failures for the current resource group
            $infolasterror = ($eventserror | ? {$_.Message -match $clustergroup.Name}).TimeCreated

            #Selecting all events regarding the current resource group becoming degraded
            $infolastdegraded = ($eventsdegraded | ? {$_.Message -match $clustergroup.Name}).TimeCreated

            if($Detailed) {
                #More detailed: retrieving all the resources inside the current resource group
                $resources = $clustergroup | Get-ClusterResource

                foreach($resource in $resources) {
                    $infores = [PSCustomObject]@{
                        ClusterType = 'MSCS'
                        ClusterName = $clu.Name
                        ResourceGroup = $resource.OwnerGroup
                        Resource = $resource.Name
                        ServerName = $resource.OwnerNode
                        ResourceStatus = $resource.State
                        LastOnline = if($infolastonline){$infolastonline[0]}else{'NA'}
                        LastOffline = if($infolastoffline){$infolastoffline[0]}else{'NA'}
                        LastError = if($infolasterror){$infolasterror[0]}else{'NA'}}

                    $info += $infores
                }
            }
            else {
                #Less detailed: only resource groups report.
                $inforesgroup = [PSCustomObject]@{
                        ClusterType = 'MSCS'
                        ClusterName = $clu.Name
                        ResourceGroup = $clustergroup.Name
                        ServerName = $clustergroup.OwnerNode
                        ResourceStatus = $clustergroup.State
                        LastOnline = if($infolastonline){$infolastonline[0]}else{'NA'}
                        LastOffline = if($infolastoffline){$infolastoffline[0]}else{'NA'}
                        LastDegraded = if($infolastdegraded){$infolastdegraded[0]}else{'NA'}}

                $info += $inforesgroup
            }

            $infolastonline = $null
            $infolastoffline = $null
            $infolasterror = $null
            $infolastdegraded = $null
        }
    }
    catch {
        $Failed = $True
        Write-Warning $_.Exception.Message

        $info = $null
    }

    if($Failed) {
        Write-Warning "Script aborted. No information returned on $Cluster"
    }
    elseif($Passthru) {
        Write-Verbose 'Returning object for further filtering.'
        $info
    }
    else {
        Write-Verbose 'No passthru'
    }

    Write-Verbose 'Finished'
}
In my case I pipe the output of this function to Send-MailMessage so that I receive a weekly report on all my clusters. Hope you like it. If you wish to contribute to this function, you can find it on my GitHub.
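As an illustration, the weekly mail could be sketched like this (the SMTP server, addresses and cluster names are placeholders to replace with your own):

```powershell
# A sketch of mailing the report; server, addresses and cluster names are placeholders
$report = 'clu1','clu2' |
    ForEach-Object { Get-ClusterResourceInfo -Cluster $_ -Passthru } |
    ConvertTo-Html | Out-String
Send-MailMessage -SmtpServer 'smtp.example.com' `
    -From 'clusters@example.com' -To 'me@example.com' `
    -Subject 'Weekly cluster resource report' -Body $report -BodyAsHtml
```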

Friday, June 17, 2016

Measuring IOPS part 2: DISKSPD

In the previous post I talked about IOMETER, and we have seen that today it is no longer the best tool for disk performance measurement, for several reasons.
Crawling the net, I have found an excellent blog post by Jose Barreto on a tool named Diskspd.exe recently developed by Microsoft. As explained in that post, Diskspd is a flexible tool that can produce many types of synthetic workloads against all kinds of storage, including local disks, LUNs on a SAN, Storage Spaces or SMB file shares.
The version I have found is 2.0.15, released in January 2015, which officially supersedes the SQLIO Disk Subsystem Benchmark Tool. The source code is hosted on the following GitHub repo:
while the binaries can be found here:
The interesting things about Diskspd are that:
  • it comes with a very nice explanation of the way it works, with examples and use cases
  • the provided performance information is extensive
  • the text output can be parsed by the PowerShell automators we have all become
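That last point is easy to demonstrate. Here is a minimal sketch that extracts the total IOPS from the 'Total IO' summary of a Diskspd report (the sample text mimics the report format; in practice $output would come from running diskspd itself):

```powershell
# A minimal sketch of parsing Diskspd's 'Total IO' summary with PowerShell.
# Normally you would capture the report with: $output = .\diskspd.exe <options>
$output = @'
Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  file
    0 |     25740902400 |       392775 |    2432.28 |   38916.47 | test.dat (1024MB)
total:       25740902400 |       392775 |    2432.28 |   38916.47
'@ -split "`n"

# The 'total:' line holds the aggregate figures; the fourth column is I/O per second
$totalLine = $output | Where-Object { $_ -match '^total:' } | Select-Object -First 1
$fields = $totalLine -split '\|' | ForEach-Object { $_.Trim() }
$iops = [double]$fields[3]
$iops   # 38916.47
```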
As a test bed for the initial runs I have decided to use an HP ProLiant BL460c Gen8 with 32 CPU cores and a Smart Array P220i controller with a RAID 1 array made up of two 10000 RPM SAS disks.
Actually, I first tried Diskspd against more recent hardware, but it failed with the following error:
diskspd test.dat

WARNING: Complete CPU utilization cannot currently be gathered within DISKSPD for this system.
        Use alternate mechanisms to gather this data such as perfmon/logman.
        Active KGroups 2 > 1 and/or processor count 16 > 64.
It looks like there’s a problem with the API that gets CPU stats, as mentioned here.

When you have a look at the Diskspd help, you will see that the only mandatory parameter is a target, in the form of a file path (c:\test.dat), a partition letter (E:) or a physical drive number (#2):

Not being an expert, I decided to run the command with just the mandatory target, in an elevated prompt:
diskspd c:\test.dat

Command Line: C:\Windows\system32\diskspd.exe c:\test.dat

Input parameters:

       timespan:   1
       duration: 10s
       warm up time: 5s
      cool down time: 0s
       random seed: 0
       path: 'test.dat'
          think time: 0ms
          burst size: 0
          using software and hardware write cache
          performing read test
          block size: 65536
          using sequential I/O (stride: 65536)
          number of outstanding I/O operations: 2
          thread stride size: 0
          threads per file: 1
          using I/O Completion Ports
          IO priority: normal

Results for timespan 1:

actual test time:       10.09s
thread count:           1
proc count:             32

CPU |  Usage |  User  |  Kernel |  Idle
  0| 100.00%|   3.71%|   96.30%|   0.00%
  1|   0.00%|   0.00%|    0.00%| 100.00%
  2|   2.47%|   0.62%|    1.85%|  97.38%
  3|   0.00%|   0.00%|    0.00%| 100.00%
  4|   1.39%|   0.00%|    1.39%|  98.46%
  5|   0.00%|   0.00%|    0.00%| 100.00%
  6|   0.31%|   0.15%|    0.15%|  99.70%
  7|   0.00%|   0.00%|    0.00%| 100.00%
  8|   0.15%|   0.00%|    0.15%|  99.85%
  9|   0.00%|   0.00%|    0.00%| 100.00%
 10|   0.00%|   0.00%|    0.00%| 100.00%
 11|   0.00%|   0.00%|    0.00%| 100.00%
 12|   0.00%|   0.00%|    0.00%| 100.00%
 13|   0.00%|   0.00%|    0.00%| 100.00%
 14|   0.31%|   0.00%|    0.31%|  99.70%
 15|   0.00%|   0.00%|    0.00%| 100.00%
 16|   1.24%|   1.08%|    0.15%|  98.77%
 17|   0.00%|   0.00%|    0.00%| 100.00%
 18|   6.03%|   5.41%|    0.62%|  93.98%
 19|   0.00%|   0.00%|    0.00%| 100.00%
 20|   4.79%|   4.64%|    0.15%|  95.06%
 21|   0.00%|   0.00%|    0.00%| 100.00%
 22|   0.46%|   0.15%|    0.31%|  99.54%
 23|   0.00%|   0.00%|    0.00%| 100.00%
 24|   0.00%|   0.00%|    0.00%| 100.00%
 25|   0.00%|   0.00%|    0.00%| 100.00%
 26|   0.00%|   0.00%|    0.00%| 100.00%
 27|   0.00%|   0.00%|    0.00%| 100.00%
 28|   0.00%|   0.00%|    0.00%| 100.00%
 29|   0.00%|   0.00%|    0.00%| 100.00%
 30|   0.00%|   0.00%|    0.00%| 100.00%
 31|   0.00%|   0.00%|    0.00%| 100.00%
avg.|   3.66%|   0.49%|    3.17%|  96.33%

Total IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  file
    0 |     25740902400 |       392775 |    2432.28 |   38916.47 | test.dat (1024MB)
total:       25740902400 |       392775 |    2432.28 |   38916.47

Read IO
thread |       bytes     |     I/Os     |     MB/s   |  I/O per s |  file
    0 |     25740902400 |       392775 |    2432.28 |   38916.47 | test.dat (1024MB)
total:       25740902400 |       392775 |    2432.28 |   38916.47

Write IO
thread |       bytes    |     I/Os     |     MB/s   |  I/O per s |  file
    0 |               0 |            0 |       0.00 |       0.00 | test.dat (1024MB)
total:                 0 |            0 |       0.00 |       0.00
Looking at the text output, we can see that by default Diskspd warms up for five seconds before doing a ten-second sequential read run with 64 KB blocks and two outstanding I/O operations against a 1 GB file:

The results are split in two. In the first section you can find a bit of information on the usage of all the cores:

In the second section there are the actual IO results, split in three sections:
  • Total IO
  • Read IO
  • Write IO
Since, as I have said, the default test is a read test, there are no data about write operations, and the Total IO results exactly match the Read IO results:

If I try to balance reads and writes, I can see that my RAID 1 array, which in theory is supposed to have high read performance and medium write performance (due to a write IO penalty of 2, as explained here), returns oddly uniform results:

When I disable caching (-h parameter), I get much lower results, but I still cannot see the impact of the RAID 1 write penalty:

When I add the –L parameter to this, the same command also returns latency information:

By default Diskspd uses a 64 KB I/O size. Changing this parameter to 4k (-b4k) increases the number of IOPS for both reads and writes, and latency decreases dramatically, but the throughput stays the same for reads and writes:

If I switch from sequential operations to random operations (–r switch), the throughput decreases while read latency increases:

Again, the results are strangely uniform between reads and writes. We will see why in the third post of this series.
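For reference, the sequence of runs walked through in this post roughly corresponds to command lines like these (the target file is the same one as before; the write percentage is my assumption for "balancing" reads and writes):

```powershell
.\diskspd.exe c:\test.dat                     # defaults: sequential 64 KB reads
.\diskspd.exe -w50 c:\test.dat                # balance reads and writes
.\diskspd.exe -w50 -h c:\test.dat             # also disable caching
.\diskspd.exe -w50 -h -L c:\test.dat          # also collect latency stats
.\diskspd.exe -w50 -h -L -b4k c:\test.dat     # switch to 4 KB IOs
.\diskspd.exe -w50 -h -L -b4k -r c:\test.dat  # random instead of sequential
```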