Thursday, June 23, 2016

Introducing a PowerShell function to report cluster resources moves and failovers

The number of clusters I am managing these days is becoming very large, so I felt like I needed some kind of report mailed to me giving a view of what resources went online, offline, became degraded or failed in my environment. The good news when you are doing whatever you can think of on a Windows box (and not only) is that you can leverage, you know the special word, PowerShell, which, once you install the Failover Clustering feature, comes with a module for managing clusters:

Get-Module FailoverClusters | Format-List

Name              : FailoverClusters
Path              : C:\windows\system32\windowspowershell\v1.0\Modules\FailoverClusters\FailoverClusters.psd1
Description       : 
ModuleType        : Manifest
Version           : 2.0.0.0
NestedModules     : {Microsoft.FailoverClusters.PowerShell, Microsoft.FailoverClusters.PowerShell}
ExportedFunctions : 
ExportedCmdlets   : {Add-ClusterCheckpoint, Add-ClusterDisk, Add-ClusterFileServerRole, 
                    Add-ClusterGenericApplicationRole...}
ExportedVariables : 
ExportedAliases   : {Add-VMToCluster, Remove-VMFromCluster}
Now the hardest part when you want to set up a cluster reporting, is to find the proper events on which to build your solutions. After a few tests, I came up with the following events to look for:

Cluster Informational events: these are stored in a special operational log
  • Event ID 1201 under Microsoft-Windows-FailoverClustering/Operational is the cluster group coming online
  • Event ID 1204 under Microsoft-Windows-FailoverClustering/Operational is the cluster group going offline
Cluster Warning events: these are recorded in the System event log
  • Event ID 1167 under System is for resource Groups becoming degraded (message: Cluster Agent: The cluster resource resourcename has become degraded.)
Cluster Error events: these are recorded in the System event log
  • Event ID 1609 under System is for resource failures (message: Cluster resource 'resourcename' in clustered service or application 'appname' failed.)



The function I came up with leverage just a few cmdlets.


The first one is Get-WinEvent, used in conjunction with the -FilterHashtable parameter, which has the advantage of being lighting fast (especially compared to Get-EventLog), as I explained in a previous post where I used it to retrieve the system boot time (http://www.happysysadm.com/2014/07/windows-boot-time-explored-in-powershell.html - chapter 12).



The other cmdlets came all from the FailoverClustering module:

Enough said. Here's the advanced function Get-ClusterResourceInfo:

function Get-ClusterResourceInfo
        <#
        .SYNOPSIS
            Generates a report of the status of cluster resources.
 
        .DESCRIPTION
            Generates a report of the status of cluster resources with the timestamp of the last time each resource came online.
            The report can be sent as an e-mail or exported to a CSV file.
             
        .PARAMETER Cluster
            Name of the cluster to query

        .PARAMETER Passthru
            Returns the report as an object

        .PARAMETER Detailed
            Expands Resource Groups adding details about reources

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru -Detailed

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru -Detailed -Verbose

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru | Format-table -AutoSize

        .EXAMPLE
            Get-ClusterResourceInfo -Cluster clustername -Passthru | Sort-Object ClusterName | Format-Table -Property ResourceGroup, LastOnline

        .EXAMPLE
            'clu1','clu2' | % { Get-ClusterResourceInfo -Cluster $_ -Passthru -Detailed -Verbose } #| Format-Table * -AutoSize

        .AUTHOR
            HappySysadm.com

        #>
    {
    [CmdletBinding(DefaultParameterSetName='__AllParameterSets')]
    param(

        [Parameter(Mandatory)][ValidateScript({Get-Cluster $_})][string]$Cluster,

        [switch]$Detailed = $False,
        
        [switch]$Passthru = $True

        )
    
    write-verbose 'Started'
    
    if($Detailed) { Write-Verbose 'Detailed mode is ON' } else { Write-Verbose 'Detailed mode is OFF' }

    Write-Verbose 'Retrieving cluster information'

    $Error = $False

    $info = @()

    $infolastonline = @()

    $infolastoffline = @()

    $infolasterror = @()

    $infolastdegraded = @()
    
    try
    
        {

         $clu = Get-cluster $cluster

         $nodes = $clu | Get-ClusterNode

         #Event ID 1201 is the cluster group coming online. Retrieving them all.
         $eventson = foreach($node in $nodes){Get-WinEvent -ComputerName $node -FilterHashtable @{LogName='Microsoft-Windows-FailoverClustering/Operational';ID = 1201} -ErrorAction SilentlyContinue}
         $eventson = $eventson | sort timecreated -Descending

         #Event ID 1204 is the cluster group going offline. Retrieving them all.
         $eventsoff = foreach($node in $nodes){Get-WinEvent -ComputerName $node -FilterHashtable @{LogName='Microsoft-Windows-FailoverClustering/Operational';ID = 1204} -ErrorAction SilentlyContinue}
         $eventsoff = $eventsoff | sort timecreated -Descending
         
         #Event ID 1609 is for resource failures. Retrieving them all.
         $eventserror = foreach($node in $nodes){Get-WinEvent -ComputerName $node -FilterHashtable @{LogName='System';ID = 1069} -ErrorAction SilentlyContinue}
         $eventsoerror = $eventserror | sort timecreated -Descending
         
         #Event ID 1167 is for resource group becoming degraded. Retrieving them all.
         $eventsdegraded = foreach($node in $nodes){Get-WinEvent -ComputerName $node -FilterHashtable @{LogName='System';ID = 1167} -ErrorAction SilentlyContinue}
         $eventsdegraded = $eventsdegraded | sort timecreated -Descending

         $clustergroups = $clu | Get-ClusterGroup
    
         foreach($clustergroup in $clustergroups) {

                Write-Verbose "`tWorking on $($clu.name) - $($clu | Get-ClusterNode) - $($clustergroup.Name)"

                #Selecting only the events 1201 for the current resource group
                $infolastonline += ($eventson | ? {$_.message -match $clustergroup.name}).TimeCreated

                #Selecting only the events 1204 for the current resource group
                $infolastoffline += ($eventsoff | ? {$_.message -match $clustergroup.name}).TimeCreated

                #Selecting all events regarding resource failures for the current resource group
                $infolasterror = ($eventserror | ? {$_.message -match $clustergroup.name}).TimeCreated

                #Selecting all events regarding current resource group becoming degraded
                $infolastdegraded = ($eventsdegraded | ? {$_.message -match $clustergroup.name}).TimeCreated

                if($detailed) {
                
                    #More detailed: retrieving all the resources inside the current resource group
                    $resources = $clustergroup | Get-ClusterResource

                    foreach($resource in $resources) {
                
                        $infores = [PSCustomObject]@{
                            ClusterType = 'MSCS'
                            ClusterName = $clu.Name
                            ResourceGroup = $resource.OwnerGroup
                            Resource = $resource.Name
                            ServerName = $resource.OwnerNode
                            ResourceStatus = $resource.State
                            LastOnline = if($infolastonline){$infolastonline[0]}else{'NA'}
                            LastOffline = if($infolastoffline){$infolastoffline[0]}else{'NA'}
                            LastError = if($infolasterror){$infolasterror[0]}else{'NA'}}

                        $info += $infores

                        }

                    }
                    
                else {

                    #Less detailed: only resource groups report.
                    
                    $inforesgroup = [PSCustomObject]@{
                            ClusterType = 'MSCS'
                            ClusterName = $clu.Name
                            ResourceGroup = $clustergroup.Name
                            ServerName = $clustergroup.OwnerNode
                            ResourceStatus = $clustergroup.State
                            LastOnline = if($infolastonline){$infolastonline[0]}else{'NA'}
                            LastOffline = if($infolastoffline){$infolastoffline[0]}else{'NA'}
                            LastDegraded = if($infolastdegraded){$infolastdegraded[0]}else{'NA'}}

                    $info += $inforesgroup

                    }
                
                $infolastonline = $null

                $infolastoffline = $null
                
                $infolasterror = $null

                $infolastdegraded = $null

                }
            
         }

    catch
        
        { 
            
        $Error = $True
            
        Write-Warning $_.Exception.Message 

        $info = $null
            
        }

    if($Error)

        {

        Write-Warning "Script aborted. No information returned on $clu.name"

        }

    elseif($Passthru)
       
        {
        
        Write-Verbose 'Returning object for further filtering.'
        
        $info
        
        }

    else

        {

        Write-Verbose "No passthru"

        }
    
    write-verbose 'Finished'
    
    }
In my case I pipe the output of this function to Send-MailMessage so that I receive a weekly report of all my clusters. Hope you like it. If you wish to contribute to this function, you can find it on my GitHub.

2 comments:

Related Posts Plugin for WordPress, Blogger...