Gathering all the Logs and Running the Availability Group Failover Detection Utility with PowerShell

30/11/2018 – Function has been updated to deal with named instances.

Last week the Tiger Team released their Availability Group Failover Detection Utility which will provide root cause analysis on Cluster Logs, SQL Error Logs, and the Availability groups extended events logs. There is a blog post here and the tool can be downloaded from the Tiger Team GitHub Repository

A Bit of Faffing*

It states on the readme for the Tiger Team GitHub Repository.

Repository for Tiger team for “as-is” solutions and tools/scripts that the team publishes.

The important words are “as-is” sometimes these tools need a bit of faffing some looking after!

There is a pre-requisite and sometimes a little “fixing” that you need to do to get it to run correctly.

First, install the “Microsoft Visual C++ Redistributable for Visual Studio 2017” from here. On the download page, scroll down to the “Other Tools and Frameworks” section to download the redistributable (x64 version).

cdistributable.PNG

Then when you run FailoverDetection.exe you may get strong name validation errors like.

strong name.png

Unhandled Exception: System.IO.FileLoadException: Could not load file or assembly ‘Microsoft.Sq1Server.XEvent.Linq, Version=15.0.0.0, Culture=neutral, PublicKeyToken=89845dcd808cc91’ or one of it s dependencies. Strong name validation failed. (Exception from HRESULT; 0x8013141A) – – – >.Security.SecurityException: Strong name validation failed. (Exception from HRESULT: 0x8e13141A)
—End of inner exception stack trace  —
at FailoverDetector. XeventParser.LoadXevent(String xelFi1eName, String serverName)

Then you will need to run the sn.exe tool which is in the zip file. Use this syntax.

.\sn.exe -Vr PATHTODLLFile

stroingname fix.png

I had to do it for two DLLs.

NOTE – If you get an error like this when running sn.exe (or any executable) from PowerShell it means that you have missed the .\ (dot whack) in front of the executable name.

striong name fail.png

* Faffing – Doing something that is a bit awkward See Link .

Logs required for the Tool

To run the Failover Detection Utility you need to gather the following information from each replica and place it in the specified data folder.

  • SQL error logs
  • Always On Availability Groups Extended Event Logs
  • System Health Extended Event Logs
  • System log
  • Windows cluster log

Once you have gathered all of that data then you need to alter the configuration file for the executable.

{
    "Data Source Path": "Path to Data File",
    "Health Level": 3,
    "Instances": [
        "Replica1",
        "Replica2",
        "Replica3"
    ]
}

Running The Tool

Once you have done that you can then run the Failover Detection Utility. You can double click the exe,

run the exe.PNG

or you can run it from the command line.

run the exe with powershell.PNG

In both cases it won’t exit so when you see the Saving Results to JSON file, you can press enter (sometimes twice!).

The results can be seen in the JSON file which will be stored in a Results directory in the directory that the the FailoverDetection.exe exists.

results.PNG

You can also use some switches with the FailoverDetection utility.

–Analyze – When “–Analyze” is specified as a parameter, the utility will load configuration file without copying log data. It assumes the log files have already been copied over. It does everything as default mode except copying log data. This option is useful if you already have the data in the local tool execution subdirectories and want to rerun the analysis.

-Show -The utility after analyzing log data will display the results in the command console. Additionally, the results will be persisted to a JSON file in the results folder.

They look like this

results - show.PNG

Again, you need to press enter for the details to come through. The results are still saved to the Results folder as json as well so you won’t lose them.

When You Are Doing Something More Than Once ….

Automate it 🙂

When I saw the data that needed to be gathered for this tool, I quickly turned to PowerShell to enable me to easily gather the information. That has turned into a function which will

  • Download and extract the zip file from the Tiger Team GitHub repository
  • Identify all of the replicas for an Availability Group and dynamically create the configuration JSON file
  • Gather all of the required log files and place them in a specified data folder
  • Run the FailoverDetection.exe with any of the switches
  • Includes -Verbose, -Confirm, -Whatif switches so that you can easily see what is happening, be prompted to confirm before actions or see what would happen if you ran the function
  • You still need to press enter at the end though 🙁
  • and you will still need to install the “Microsoft Visual C++ Redistributable for Visual Studio 2017” and runt he strong names tool if needed

This function requires PowerShell version 5, the failovercluster module and and the dbatools module.

You can get the function from my GitHub Functions Repository here (at the moment – will be adding to dbatools see below)

Load the function by either running the code or if you have it saved as a file dot-sourcing it.

. .\Invoke-SqlFailOverDetection.ps1

There are two .’s with a space in between and then a \ without a space. so Dot Space Dot Whack path to file.

The next thing you should do is what you should always do with a new PowerShell function, look at the help.

Get-Help Invoke-SqlFailOverDetection -Detailed

You will find plenty of examples to get you going and explanations of all of the parameters.

Let’s see it in action.

First lets run with a -WhatIf switch which will show us what will happen without performing any state changing actions.

$InstallationFolder = 'C:\temp\failoverdetection\new\Install'
$DownloadFolder = 'C:\temp\failoverdetection\new\Download'
$DataFolder = 'C:\temp\failoverdetection\new\Data'
$SQLInstance = 'SQL0'

$invokeSqlFailOverDetectionSplat = @{
DownloadFolder = $DownloadFolder
SQLInstance = $SQLInstance
DataFolder = $DataFolder
InstallationFolder = $InstallationFolder
}
Invoke-SqlFailOverDetection @invokeSqlFailOverDetectionSplat -WhatIf

whatif.PNG

So you can see that if we run it without the -WhatIf switch it will

  • Create some directories
  • Download the zip file from the repo
  • Extract the zip file
  • Copy the required logs from each of the replicas to the data folder
  • Create the JSON configuration file
  • Run the executable

NOTE : – I have limited the gathering of the system event log to the last 2 days to limit the amount of time spent dealing with a large system log. I gather all of the SQL Error logs in the Error log path as that works for the first scenario I wrote this for, your mileage may vary.

So if we want to run the command we can remove the -WhatIf switch.

$InstallationFolder = 'C:\temp\failoverdetection\new\Install'
$DownloadFolder = 'C:\temp\failoverdetection\new\Download'
$DataFolder = 'C:\temp\failoverdetection\new\Data'
$SQLInstance = 'SQL0'

$invokeSqlFailOverDetectionSplat = @{
DownloadFolder = $DownloadFolder
SQLInstance = $SQLInstance
DataFolder = $DataFolder
InstallationFolder = $InstallationFolder
}
Invoke-SqlFailOverDetection @invokeSqlFailOverDetectionSplat

It can take a little while to run depending on the number of replicas, size of logs etc but once it has started running you can do other things.

It will require being run as an account with permissions to all of the folders specified and Windows and SQL permissions on all of the replicas in the Availability Group.

run1.PNG

As you can see below it has gathered all of the results and placed them in the data folder.

datagathered.PNG

The results can be found in the results folder.

resultsjson.PNG

If I have already run the tool, I can use the Analyze switch to save gathering the data again. I also use the AlreadyDownloaded switch as I do not need to download the zip file again.

$invokeSqlFailOverDetectionSplat = @{
DownloadFolder = $DownloadFolder
SQLInstance = $SQLInstance
DataFolder = $DataFolder
InstallationFolder = $InstallationFolder
AlreadyDownloaded = $true
Analyze = $true
}
Invoke-SqlFailOverDetection @invokeSqlFailOverDetectionSplat
analyze.PNG
and the results are again saved in the results folder.
I can show the results on the screen as well as saving them as JSON with the Show parameter.
$InstallationFolder = 'C:\temp\failoverdetection\Install'
$DownloadFolder = 'C:\temp\failoverdetection\Download'
$DataFolder = 'C:\temp\failoverdetection\Data'
$SQLInstance = 'SQL0'

$invokeSqlFailOverDetectionSplat = @{
DownloadFolder = $DownloadFolder
SQLInstance = $SQLInstance
DataFolder = $DataFolder
InstallationFolder = $InstallationFolder
AlreadyDownloaded = $true
Analyze = $true
Show = $true
}
Invoke-SqlFailOverDetection @invokeSqlFailOverDetectionSplat

show.PNG

You will then need to press enter to get the next lot of results.

more show results.PNG

Why Not Add This To dbatools?

I haven’t added this to dbatools (yet) because I wrote it in this way for a particular need and dbatools requires support for PowerShell V3 . I have, however created an issue added to this issue in the dbatools GitHub Repository (as this is how you to start the process of adding things to dbatools) so hopefully we can get it in there soon as well – in which case I will come back and update this post.

 

Happy Automating!

17 thoughts on “Gathering all the Logs and Running the Availability Group Failover Detection Utility with PowerShell

  1. It looks like I found a minor bug in powershell part
    If you have a named instance, the path to copy is incorrect

    Copying the cluster log to
    C:\temp\failoverdetection\new\Data\Servername\instancename from Servername\instancename
    Get-ChildItem : Cannot find path ‘\\Servername\instancename\D$\Program Files\Microsoft SQL Server\MSSQL13.Instancename\MSSQL\Log’
    because it does not exist.

  2. and the same for Get-Clusterlog

    $null = Get-ClusterLog -Node $replica -Destination $Errorlogpath
    Get-ClusterLog : Failed to generate the cluster log on node servername\instancename.
    Retrieving the COM class factory for remote component with CLSID {88E7AC6D-C561-4F03-9A60-39DD768F867D} from machine
    servername\instancename failed due to the following error: 800706ba servername\instancename.
    At C:\DBA_Data\Invoke-SqlFailoverDetection.ps1:350 char:29
    + … $null = Get-ClusterLog -Node $replica -Destination $Errorlogpath
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo : NotSpecified: (:) [Get-ClusterLog], ClusterCmdletException
    + FullyQualifiedErrorId : Get-ClusterLog,Microsoft.FailoverClusters.PowerShell.GetClusterLogCommand

  3. Pingback: SQL Server Availability Group FailoverDetection Utility PowerShell Function Improvements – Named Instances, Archiving Data, Speed | SQL DBA with A Beard

  4. This is great, thanks for doing this!
    I did get a message during the copying of the files, looks like during the ag logs and before the cluster logs. I am admin on the box and instances so not sure if it is a permissions issue. any suggestions would be helpful. Thanks again

    Copying the Always On health Extended Events logs to C:\Users\…..

    WARNING: Failed to get all of the information from the replica SERVERNAME\INSTANCENAME – need to stop

      • Yes it returned “System.Management.Automation.CommandNotFoundException: The term ‘Get-ClusterLog’ is not recognized as the
        name of a cmdlet, function, script file, or operable program” I’ll check into that and let you. Thanks

        • You will need to add the FailOver Cluster feature, on the machine. Use

          Install-WindowsFeature -Name Failover-Clustering –IncludeManagementTools

  5. Does this detect the initial failover and alert the DBA in some way? I’m looking for something I can use to detect when the failover occurs and tie that into something like PagerDuty so I can be alerted whenever a failover happens. PagerDuty will essentially call a list of people until someone answers. Wanting to do this so I can stop paying a third party to watch our stuff in the middle of the night.

    • it is a bit offtopic, but the feature you are looking for is available in SQL Server as a native tool.
      try to google : “SQL Server Agent Alert for Always On + notify operator”
      It will be something about 10 sequential events at the moment of failover…

  6. I have run the Install-WindowsFeature -Name Failover-Clustering -IncludeManagementTools and when I execute it says they still are not installed.
    I am not having any luck after that.

Leave a Reply to Michael KarpenkoCancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.