SQL Server Availability Group FailoverDetection Utility PowerShell Function Improvements – Named Instances, Archiving Data, Speed

In my last post I wrote about a new function for gathering the data and running the FailoverDetection utility by the Tiger Team to analyse availability group failovers. I have updated it following some comments and using it for a day.

Don’t forget the named instances Rob!

Michael Karpenko wrote a comment pointing out that I had not supported named instances, which was correct as it had not been written for that. Thank you Michael 🙂 I have updated the code to deal with named instances.

Confusing results

I also realised as we started testing the code that if you had run the code once and then ran it again against a different availability group the tool does not clear out the data folder that it uses so you can get confusing results.

In the image below I had looked at the default instance and then a MIRROR named instance. As you can see the results json on the left shows the default instance SQLClusterAG while the one on the right shows both the SQLClusterAG and the MirrrAG instance results.

duplicate results.png

This is not so useful if you don’t notice this at first with the expanded json!! Now you may in this situation want to see the combined results from all of the availability groups on one cluster. You could gather all of the data from each instance and then add it to the data folder easily enough.

By cleaning out the data folder before running the utility the results are as expected.

duplicate results fixed.png

Archive the data for historical analysis

One of the production DBAs pointed out that having gathered the information, it would be useful to hold it for better analysis of repeated issues. I have added an archiving step so that when the tools runs, if there is already data in the data gathering folder, it will copy that to an archive folder and name it with the date and time that the cluster log was created as this is a good estimation of when the analysis was performed. If an archive folder location is not provided it will create an archive folder in the data folder. This is not an ideal solution though, as the utility will copy all of the files and folders from there to its own location so it is better to define an archive folder in the parameters.

Get-Eventlog is sloooooooooooow

I was running the tools and noticed it sat running the system event log task for a long long time. I ran some tests using a variation of the dbatools prompt.

This will show in the prompt how long it took to run the previous statement .

speed.png

In the image above (which you can click to get a larger version as with all images on this blog) you can see that it took 18ms to set the date variable, FOUR MINUTES and FORTY THREE seconds to get the system log in the last 2 days using Get-EventLog and 29.1 seconds using Get-WinEvent and a FilterHashtable.

Getting the function

This function requires PowerShell version 5 and the dbatools module.

You can get the function from my GitHub Functions Repository here (at the moment – will be adding to dbatools see below)

Load the function by either running the code or if you have it saved as a file dot-sourcing it.

. .\Invoke-SqlFailOverDetection.ps1

There are two .’s with a space in between and then a \ without a space. so Dot Space Dot Whack path to file.

The next thing you should do is what you should always do with a new PowerShell function, look at the help.

Get-Help Invoke-SqlFailOverDetection -Detailed

You will find plenty of examples to get you going and explanations of all of the parameters and more info on my previous post.

Happy Automating!

Gathering all the Logs and Running the Availability Group Failover Detection Utility with PowerShell

30/11/2018 – Function has been updated to deal with named instances.

Last week the Tiger Team released their Availability Group Failover Detection Utility which will provide root cause analysis on Cluster Logs, SQL Error Logs, and the Availability groups extended events logs. There is a blog post here and the tool can be downloaded from the Tiger Team GitHub Repository

A Bit of Faffing*

It states on the readme for the Tiger Team GitHub Repository.

Repository for Tiger team for “as-is” solutions and tools/scripts that the team publishes.

The important words are “as-is” sometimes these tools need a bit of faffing some looking after!

There is a pre-requisite and sometimes a little “fixing” that you need to do to get it to run correctly.

First, install the “Microsoft Visual C++ Redistributable for Visual Studio 2017” from here. On the download page, scroll down to the “Other Tools and Frameworks” section to download the redistributable (x64 version).

cdistributable.PNG

Then when you run FailoverDetection.exe you may get strong name validation errors like.

strong name.png

Unhandled Exception: System.IO.FileLoadException: Could not load file or assembly ‘Microsoft.Sq1Server.XEvent.Linq, Version=15.0.0.0, Culture=neutral, PublicKeyToken=89845dcd808cc91’ or one of it s dependencies. Strong name validation failed. (Exception from HRESULT; 0x8013141A) – – – >.Security.SecurityException: Strong name validation failed. (Exception from HRESULT: 0x8e13141A)
—End of inner exception stack trace  —
at FailoverDetector. XeventParser.LoadXevent(String xelFi1eName, String serverName)

Then you will need to run the sn.exe tool which is in the zip file. Use this syntax.

stroingname fix.png

I had to do it for two DLLs.

NOTE – If you get an error like this when running sn.exe (or any executable) from PowerShell it means that you have missed the .\ (dot whack) in front of the executable name.

striong name fail.png

* Faffing – Doing something that is a bit awkward See Link .

Logs required for the Tool

To run the Failover Detection Utility you need to gather the following information from each replica and place it in the specified data folder.

  • SQL error logs
  • Always On Availability Groups Extended Event Logs
  • System Health Extended Event Logs
  • System log
  • Windows cluster log

Once you have gathered all of that data then you need to alter the configuration file for the executable.

Running The Tool

Once you have done that you can then run the Failover Detection Utility. You can double click the exe,

run the exe.PNG

or you can run it from the command line.

run the exe with powershell.PNG

In both cases it won’t exit so when you see the Saving Results to JSON file, you can press enter (sometimes twice!).

The results can be seen in the JSON file which will be stored in a Results directory in the directory that the the FailoverDetection.exe exists.

results.PNG

You can also use some switches with the FailoverDetection utility.

–Analyze – When “–Analyze” is specified as a parameter, the utility will load configuration file without copying log data. It assumes the log files have already been copied over. It does everything as default mode except copying log data. This option is useful if you already have the data in the local tool execution subdirectories and want to rerun the analysis.

-Show -The utility after analyzing log data will display the results in the command console. Additionally, the results will be persisted to a JSON file in the results folder.

They look like this

results - show.PNG

Again, you need to press enter for the details to come through. The results are still saved to the Results folder as json as well so you won’t lose them.

When You Are Doing Something More Than Once ….

Automate it 🙂

When I saw the data that needed to be gathered for this tool, I quickly turned to PowerShell to enable me to easily gather the information. That has turned into a function which will

  • Download and extract the zip file from the Tiger Team GitHub repository
  • Identify all of the replicas for an Availability Group and dynamically create the configuration JSON file
  • Gather all of the required log files and place them in a specified data folder
  • Run the FailoverDetection.exe with any of the switches
  • Includes -Verbose, -Confirm, -Whatif switches so that you can easily see what is happening, be prompted to confirm before actions or see what would happen if you ran the function
  • You still need to press enter at the end though 🙁
  • and you will still need to install the “Microsoft Visual C++ Redistributable for Visual Studio 2017” and runt he strong names tool if needed

This function requires PowerShell version 5, the failovercluster module and and the dbatools module.

You can get the function from my GitHub Functions Repository here (at the moment – will be adding to dbatools see below)

Load the function by either running the code or if you have it saved as a file dot-sourcing it.

There are two .’s with a space in between and then a \ without a space. so Dot Space Dot Whack path to file.

The next thing you should do is what you should always do with a new PowerShell function, look at the help.

You will find plenty of examples to get you going and explanations of all of the parameters.

Let’s see it in action.

First lets run with a -WhatIf switch which will show us what will happen without performing any state changing actions.

whatif.PNG

So you can see that if we run it without the -WhatIf switch it will

  • Create some directories
  • Download the zip file from the repo
  • Extract the zip file
  • Copy the required logs from each of the replicas to the data folder
  • Create the JSON configuration file
  • Run the executable

NOTE : – I have limited the gathering of the system event log to the last 2 days to limit the amount of time spent dealing with a large system log. I gather all of the SQL Error logs in the Error log path as that works for the first scenario I wrote this for, your mileage may vary.

So if we want to run the command we can remove the -WhatIf switch.

It can take a little while to run depending on the number of replicas, size of logs etc but once it has started running you can do other things.

It will require being run as an account with permissions to all of the folders specified and Windows and SQL permissions on all of the replicas in the Availability Group.

run1.PNG

As you can see below it has gathered all of the results and placed them in the data folder.

datagathered.PNG

The results can be found in the results folder.

resultsjson.PNG

If I have already run the tool, I can use the Analyze switch to save gathering the data again. I also use the AlreadyDownloaded switch as I do not need to download the zip file again.

analyze.PNG
and the results are again saved in the results folder.
I can show the results on the screen as well as saving them as JSON with the Show parameter.

show.PNG

You will then need to press enter to get the next lot of results.

more show results.PNG

Why Not Add This To dbatools?

I haven’t added this to dbatools (yet) because I wrote it in this way for a particular need and dbatools requires support for PowerShell V3 . I have, however created an issue added to this issue in the dbatools GitHub Repository (as this is how you to start the process of adding things to dbatools) so hopefully we can get it in there soon as well – in which case I will come back and update this post.

 

Happy Automating!

Checking Availability Groups with dbachecks

It’s been 45 days since we released dbachecks

Since then there have been 25 releases to the PowerShell Gallery!! Today release 1.1.119 was released 🙂 There have been over 2000 downloads of the module already.

In the beginning we had 80 checks and 108 configuration items, today we have 84 checks and 125 configuration items!

If you have already installed dbachecks it is important to make sure that you update regularly. You can do this by running

If you want to try dbachecks, you can install it from the PowerShell Gallery by running

You can read more about installation and read a number of blog posts about using different parts of dbachecks at this link https://dbatools.io/installing-dbachecks/

HADR Tests

Today we updated the HADR tests to add the capability to test multiple availability groups and fix a couple of bugs

Once you have installed dbachecks you will need to set some configuration so that you can perform the tests. You can see all of the configuration items and their values using

get-config.png

You can set the values with the Set-DbcConfig command. It has intellisense to make things easier 🙂 To set the values for the HADR tests

  • app.cluster requires one of the nodes of the cluster.
  • app.computername requires the windows computer names of the machines to run operating system checks against
  • app.sqlinstance requires the instance names of the SQL instances that you want to run SQL checks against (These are default instances but it will accept SERVER\INSTANCE)
  • domain.name requires the domain name the machines are part of
  • skip.hadr.listener.pingcheck is a boolean value which defines whether to skip the listener ping check or not. As this is in Azure I am skipping the check by setting the value to $true
  • policy.hadr.tcpport is set to default to 1433 but you can also set this configuration if your SQL is using a different port
NOTE – You can find all the configuration items that can skip tests by running
skips.png
Now we have set the configuration (For the HADR checks – There are many more configurations for other checks that you can set) you can run the checks with
check results.png
This runs the following checks
  • Each node on the cluster should be up
  • Each resource on the cluster should be online
  • Each SQL instance should be enabled for Always On
  • Connection check for the listener and each node
    • Should be pingable (unless skip.hadr.listener.pingcheck is set to true)
    • Should be able to run SQL commands
    • Should be the correct domain name
    • Should be using the correct tcpport
  • Each replica should not be in unknown state
  • Each synchronous replica should be synchronised
  • Each asynchronous replica should be synchonising
  • Each database should be synchronised (or synchronising) on each replica
  • Each database should be failover ready on each replica
  • Each database should be joined to the availability group on each replica
  • Each database should not be suspended on each replica
  • Each node should have the AlwaysOn_Health extended event
  • Each node should have the AlwaysOn_Health extended event running
  • Each node should have the AlwaysOn_Health extended event set to auto start

(Apologies folk over the pond, I use the Queens English 😉 )

This is good for us to be able to run this check at the command line but we can do more.

We can export the results and display them with PowerBi. Note we need to add -PassThru so that the results go through the pipeline and that I used -Show Fails so that only the titles of the Describe and Context blocks and any failing tests are displayed to the screen

results.png

This will create a file at C:\Windows\Temp\dbachecks and open the PowerBi report. You will need to refresh the data in the report and then you will see

dbachecks.png

Excellent, everything passed 🙂

Saving Configuration for reuse

We can save our configuration using Export-DbcConfig which will export the configuration to a json file

so that we can run this particular set of tests with this comfiguration by importing the configuration using Import-DbcConfig

In this way you can set up different check configurations for different use cases. This also enables you to make use of the checks in your CI/CD process. For example, I have a GitHub repository for creating a domain, a cluster and a SQL 2017 availability group using VSTS. I have saved a dbachecks configuration to my repository and as part of my build I can import that configuration, run the checks and output them to XML for consumption by the publish test results task of VSTS

After copying the configuration to the machine, I run

in my build step and then use the publish test results task and VSTS does the rest 🙂
VSTS results.png

 

 

Comparing Agent Jobs across Availability Group Replicas with PowerShell

On the plane home from PAS Summit I was sat next to someone who had also attended and when he saw on my laptop that I was part of the SQL Community we struck up a conversation. He asked me how he could compare SQL Agent Jobs across availability group replicas to ensure that they were the same.

He already knew that he could use Copy-DbaAgentJob from dbatools to copy the jobs between replicas and we discussed how to set up an Agent job to accomplish this. The best way to run an Agent Job with a PowerShell script is described here

Compare-Object

I told him about Compare-Object a function available in PowerShell for precisely this task. Take these two SQL instances and their respective Agent Jobs

agentjobcompare.png

So we can see that some jobs are the same and some are different. How can we quickly and easily spot the differences?

Those three lines of code will do it. The first two get the agent jobs from each instance and assign them to a variable and the last one compares them. This is the output
comparison.png
The arrows show that the first three jobs are only on the Bolton instance and the bottom three jobs are only on the default instance.

What If ?

 Another option I showed was to use the -WhatIf switch on Copy-DbaAgentJob. This parameter is available on all good PowerShell functions and will describe what the command would do if run WARNING – If you are using the old SQLPS module from prior to the SSMS 2016 release -WhatIf will actually run the commands so update your modules.
We can run

and get the following result

which shows us that there are two jobs on Rob-XPS which would be created on the Bolton instance

And if they have been modified?

Thats good he said, but what about if the jobs have been modified?
Well one thing you could do is to compare the jobs DateLastModified property by using the -Property parameter and the passthru switch
This is going to return the jobs which are the same but were modified at a different time
sortedjobcompare.png
so that you can examine when they were changed. Of course the problem with that is that the DateLastModified is a very precise time so it is pretty much always going to be different. We can fix that but now it is a little more complex.

Just the Date please

We need to gather the jobs in the same way but create an array of custom objects with a calculated property like this
and then we can compare on the Date field. The full code is

This will look like this

datecompare.png
Which is much better and hopefully more useful but it only works with 2 instances

I have more than 2 instances

So if we have more than 2 instances it gets a little more complicated as Compare-Object only supports two arrays. I threw together a quick function to compare each instance with the main instance. This is very rough and will work for now but I have also created a feature request issue on the dbatools repository so someone (maybe you ?? ) could go and help create those commands

which looks like this. It’s not perfect but it will do for now until the proper commands are created

compare agent jobs.png

Using PowerShell to set Extended Events Sessions to AutoStart

When you look after more than a few SQL Servers you will need to perform the same actions against a number of  them and that is where PowerShell will be of great benefit. Recently I needed to ensure that all SQL Servers had a certain Extended Event Session set to auto-start and that it was running. I have used the Always On health session in the example below but you could use the same code below and do this for any Extended Event session. Just note that the code below checks for the existence of an Availability Group which may not be what you require.

As always when I started to look at Powershell for a solution I turned to MSDN and found this page and also a quick search found Mike Fals blogpost which showed me how to get going.

I used my DBA Database as described in my previous posts and created a query to check for all of the servers that were active and contactable

SELECT

IL.ServerName

FROM [dbo].[InstanceList] IL

WHERE NotContactable = 0

AND Inactive = 0

and used Invoke-SQLCMD to gather the Server Names


$Results = (Invoke-Sqlcmd -ServerInstance $DBADatabaseServer -Database DBADatabase -Query $query -ErrorAction Stop).ServerName

Then it was a case of looping through the servers and connecting to the XEvent Store and checking if the required extended evetn was started and set to auto-start and if not altering those settings

## Can we connect to the XEStore?
if(Test-Path SQLSERVER:\XEvent\$Server)
{
$XEStore = get-childitem -path SQLSERVER:\XEvent\$Server -ErrorAction SilentlyContinue  | where {$_.DisplayName -ieq 'default'}
$AutoStart = $XEStore.Sessions[$XEName].AutoStart
$Running = $XEStore.Sessions[$XEName].IsRunning
Write-Output "$server for $AGNames --- $XEName -- $AutoStart -- $Running"
if($AutoStart -eq $false)

{
$XEStore.Sessions[$XEName].AutoStart = $true
$XEStore.Sessions[$XEName].Alter()
}

if($Running -eq $false)
{
$XEStore.Sessions[$XEName].Start()
}
}

Very quick and simple and hopefully of use to people, this could easily be turned into a function. The full script is below and also available here on the Powershell gallery or by running  Save-Script -Name Set-ExtendedEventsSessionstoAutoStart -Path <path>

<#
.Synopsis
   Connects to the servers in the DBA Database and for Servers above 2012 sets alwayson_health Extended Events Sessions to Auto-Start and starts it if it is not running
.DESCRIPTION
   Sets Extended Events Sessions to Auto-Start and starts it if it is not running
.EXAMPLE
   Alter the XEvent name and DBADatabase name or add own server list and run
.NOTES
   AUTHOR - Rob Sewell
   BLOG - http://sqldbawithabeard.com
   DATE - 20/03/2016
#>
$DBADatabaseServer 
$XEName = 'AlwaysOn_health'
## Query to gather the servers required
$Query = @"

SELECT 

IL.ServerName

FROM [dbo].[InstanceList] IL

WHERE NotContactable = 0

AND Inactive = 0

"@

Try 
{
$Results = (Invoke-Sqlcmd -ServerInstance $DBADatabaseServer -Database DBADatabase -Query $query -ErrorAction Stop).ServerName
}

catch 
{
Write-Error "Unable to Connect to the DBADatabase - Please Check"
}

foreach($Server in $Results)

    {
        try
            {
            $srv = New-Object ('Microsoft.SqlServer.Management.Smo.Server') $Server
            }
        catch
            {
            Write-Output " Failed to connect to $Server"
            continue
            }
            # To ensure we have a connection to the server
            if (!( $srv.version)){
            Write-Output " Failed to Connect to $Server"
            continue
            }
        if($srv.versionmajor -ge '11')
            {
            ## NOTE this checks if there are Availability Groups - you may need to change this
            if ($srv.AvailabilityGroups.Name)
                {
                $AGNames = $srv.AvailabilityGroups.Name   
                ## Can we connect to the XEStore?                             
                if(Test-Path SQLSERVER:\XEvent\$Server)
                    {
                    $XEStore = get-childitem -path SQLSERVER:\XEvent\$Server -ErrorAction SilentlyContinue  | where {$_.DisplayName -ieq 'default'} 
                    $AutoStart = $XEStore.Sessions[$XEName].AutoStart
                    $Running = $XEStore.Sessions[$XEName].IsRunning
                    Write-Output "$server for $AGNames --- $XEName -- $AutoStart -- $Running"
                    if($AutoStart -eq $false)
                    
                        {
                        $XEStore.Sessions[$XEName].AutoStart = $true
                        $XEStore.Sessions[$XEName].Alter()
                        }
                    
                      if($Running -eq $false)
                        {
                        $XEStore.Sessions[$XEName].Start()
                        } 
                    }
                else
                    {
                    Write-Output "Failed to connect to XEvent on $Server"
                    }
                }

            else
                {
                ## Write-Output "No AGs on $Server"
                }
            }
        else
            {
            ##  Write-Output "$server not 2012 or above"
            }
} 

 

Refreshing Availability Group Database with PowerShell

Following last weeks post on Refreshing A Mirrored Database with PowerShell I thought I would write the script to refresh an Availability Group Database.

An availability group supports a failover environment for a discrete set of user databases, known as availability databases, that fail over together. An availability group supports a set of primary databases and one to eight sets of corresponding secondary databases.You can read more about Availability groups here

There are situations where you may need to refresh these databases. Disaster Recovery is an obvious one but also during development to provide testing or development environments to test your High Availability implementations, run through disaster scenarios, create run books or ensure that the code changes still work with AG. There are other scenarios but this post covers the automation of restoring an Availability Group Database from a backup.

The steps that you need to take to restore an Availability Group Database are

Remove Database from the Availability Group
Restore the Primary Replica Database
Backup the Primary Replica Database Transaction Log
Restore the Secondary and Tertiary Replica Databases with no recovery
Add the Database back into the Availability Group
Resolve Orphaned Users – Not covered in this script
Check the status

Here is my set up for this post

image

I have 3 servers SQL2012SER08AG1, SQL2012SER08AG2 and SQL2012SER08AG3 with 3 databases in an Availability Group called AG_THEBEARD1. SQL2012SER08AG2 is set up as a secondary replica using Synchronous-Commit Mode SQL2012SER08AG3 is set up as a read only replica using Asynchronous-Commit Mode. I have three databases in my Availability Group and today I shall use the database called TestDatabase (I have no imagination today!) to demonstrate the refresh

The script requires some variables to be set up at the beginning. You can easily change this and make the script into a function and call it if you desire, but for this post I shall consider the script as a standalone. The reasoning for this is that I imagine that it will be placed into a run book or stored for use in a repository for specific use and therefore reduces any pre-requisites for using it.

First we will remove the database from the Availability Group. This is achieved using the Remove-SqlAvailabilityDatabase CMDLet

 

Next  Restore the Primary Replica Database, Backup the Primary Replica Database Transaction Log
and Restore the Secondary and Tertiary Replica Databases with no recovery using Restore-SqlDatabase and Backup-SqlDatabase (You can also use the SMO method in the previous post if you wish)

 

Then add the database back to the Availability Group

Finally test the status of the Availability Group

I also like to add some output to show the progress of the script. This can be logged using Out-File or displayed on the screen using Out-Host.

Here are the results of my script

image

Here is the script