Running Jupyter Notebooks as Agent Jobs

Azure Data Studio is a great tool for connecting with your data platform whether it is in Azure or on your hardware. Jupyter Notebooks are fantastic, you can have words, pictures, code and code results all saved in one document.

I have created a repository in my Github https://beard.media/Notebooks where I have stored a number of Jupyter notebooks both for Azure Data Studio and the new .NET interactive notebooks.

Another thing that you can do with notebooks is run them as Agent Jobs and save the results of the run.

Notebooks running T-SQL

This works easily for T-SQL notebooks. I am going to use this one that I created that uses T-SQL to gather permissions using old code that was in a share somewhere. We can run the notebook and get the permissions and save the notebook and the results will be available for all time (unless you delete the notebook!)

SQL Agent Extension in Azure Data Studio

In Azure Data Studio, if you press CTRL + SHIFT + X it will open the Extensions tab

You can add extra functionality to Azure Data Studio. Search in the top bar for Agent and press the install button to install the extension. You can connect to and instance in the connections tab (CTRL + SHIFT + D) and right click on it and click Manage. This will open up the server dashboard (why isn’t it instance dashboard?)

and you will also have the SQL Agent dashboard available

Its pretty neat, it has green and red bars against the jobs showing success or failure and the larger the bar the longer the run time. On the left you will see a book. Click that

Notebooks in Agent Jobs

You can create an Agent Job to run a notebook. As a notebook is just a json file, it can be stored in a database table. This interface will create two tables one to store the templates and one for the results. Click New Notebook Job

Then navigate to the notebook and select it.

Choose a database for the storage of the template and the results and one for the execution context.

The name of the job will be the file name of the notebook. You can change this but there is a bug where you can only enter one character at a time in the name before it changes focus so beware!

Once the job is created, you will see two tables in the storage database notebooks.nb_materialized and notebooks.nb_template

The materialised table is empty right now

but the template table has a row for the job which includes the notebook in json format.

If you click on the jobs in the Notebook Jobs window in the SQL Agent extension, you can see more information about the job run

You can also run the job from here. It doesn’t have to be run from here, it is just a normal agent job which you can run or schedule in any normal manner. Running it from here gives a pop-up

You have to refresh to see when the job is finished and it will be red if the job failed, green if it succeeded or orange if some cells failed like this!

But this is the good bit. Clicking on that icon will open the notebook that was created by that agent job run. Lets see what we get

You can see that we have the results of the queries that we wrote in the notebook alongside the documentation (or maybe explanation of the expected results)
If we scroll down a little (and change the theme colour so that you can see the error)

Msg , Level , State , Line 
Duplicate column names are not permitted in SQL PowerShell. To repeat a column, use a column alias for the duplicate column in the format Column_Name AS New_Name.

We have got an error from running the code via SQL PowerShell which is how the job is run. This error is also inserted into the notebooks.nb_template table

I edited the notebook locally to remove that block of code

Then edited the job and selected the updated notebook

and re-ran the job and got a green tick.

Now I can open the notebook from the latest run, but notice that from this view I can also open the previous notebook.

If I look in the nb_template table, the last_run_notebook_error has cleared

and if I look in the nb materialized table I can see two rows, one for each job run. The error from the first run is also stored in this table. The notebook column has the json for the notebook if you wish to access it in a different manner.

Tomorrow, we will see what the job steps look like and how to make this run on an instance which does not and cannot have the required PowerShell.

Spoiler Alert – May contain dbatools ๐Ÿ™‚

New .NET Notebooks are here – PowerShell 7 notebooks are here.

Data Science folk used Notebooks for documentation and to show re-runnable research. Azure Data Studio included this notebook functionality and added SQL kernel where with a little bit of faffing you could run PowerShell and then a Python kernel that enabled PowerShell. It seems that notebooks are so cool that everyone is creating them these days! I was browsing twitter when I saw this tweet.

PowerShell 7 Notebooks ๐Ÿ™‚

A notebook experience for PowerShell 7 that sounds amazing. This will enable a true cross-platform PowerShell Notebook experience which is lacking from the Python version as it uses Windows PowerShell on Windows and PowerShell Core on other OS’s

The first thing I asked was – Will this come to Azure Data Studio. I got an immediate response from Sydney Smith PowerShell Program Manager saying it is on the roadmap

Install dependencies

To be able to run the notebook, you need to install some dependencies. First install the .NET CORE SDK which you can download from https://dotnet.microsoft.com/download This needs admin permissions to install.

You also need a Python installation – You can use Anaconda, which you can download from here https://www.anaconda.com/distribution/ This does not need admin to install

Add Anaconda to Windows Terminal

I have added the Anaconda prompt to Windows Terminal so that I have one entry point for all my CLIs. Open the settings file and add the code below. (It will also give you an icon and background.

        {
            // Make changes here to the Anaconda.exe profile
            "guid": "{0caa0dad-35be-5f56-a7ff-afceeeaa6101}",
            "name": "Anaconda",
            "commandline": "cmd.exe /K C:\\Users\\mrrob\\Anaconda3\\Scripts\\activate.bat",
            "hidden": false,
            "backgroundImage": "C:\\Users\\mrrob\\Anaconda3\\Menu\\anaconda-navigator.ico",
            "icon": "C:\\Users\\mrrob\\Anaconda3\\Menu\\anaconda-navigator.ico",
            "backgroundImageAlignment": "topRight",
            "backgroundImageStretchMode": "uniform",
            "backgroundImageOpacity": 0.1
        }

and it appears in the drop down

With Anaconda installed, check that that the kernel is available on your path. If like me you have Azure Data Studio installed, you will have additional kernels but the important one line here is

python3 C:\Users\USERNAME\Anaconda3\share\jupyter\kernels\python3

In Windows Terminal move to a PowerShell 7 prompt and install the dotnet interactive tool

dotnet tool install --global Microsoft.dotnet-interactive

Then you can install the .NET kernel in your Anaconda prompt using this command

dotnet interactive jupyter install

Sometimes new things have errors

I had an error when I tried this first time

Could not execute because the specified command or file was not found.
Possible reasons for this include:
* You misspelled a built-in dotnet command.
* You intended to execute a .NET Core program, but dotnet-interactive does not exist.
* You intended to run a global tool, but a dotnet-prefixed executable with this name could not be found on the PATH.

This is easily fixed by adding %USERPROFILE%\.dotnet\tools to my path with set PATH=%PATH%;%USERPROFILE%\.dotnet\tools

Running jupyter kernelspec list shows that the .NET kernel is installed for C Sharp, F Sharp and .NET PowerShell

Lets open a Notebook

Now you want to play with it!
You can run the lab environment using `jupyter lab`

This opens a browser

You can open existing Azure Data Studio PowerShell notebooks (but not SQL ones)

Sometimes new things have errors Part 2

Unfortunately, I get errors when trying to import Pester which means I can not use my dbachecks notebooks in this blog post. I have raised an issue on the repo here.

Create a New Notebook

But it is easy to create a new Notebook

In the launcher page click the .NET PowerShell button


Which will open a new Notebook in the directory that you launched the lab from. You can then add Code or Markdown as I have described before here.

Then you can add code, markdown and images to create your notebook.

Once you have finished using the notebook lab, you can shut it down in the Anaconda prompt with CTRL + C

Here is a video of running a notebook which anyone can use to create a couple of Docker containers running SQL 2019 and query them with dbatools. You can find the notebook further down this post.

Sharing Notebooks

You can create notebooks to run common tasks. Even better, from the lab you can convert the notebook including the results to a variety of formats to share with other none-technical people. I used this functionality this week to export Azure Data Studio Notebooks to HTML and PDF for a Project manager ๐Ÿ™‚

You can find the Export Notebook command in the File menu

Exporting to HTML did not export the images but it does include the results

You can share notebooks via GitHub – Either in a gist like this

or by providing a straight link to the notebook in GitHub https://github.com/SQLDBAWithABeard/Notebooks/blob/master/notebooks/Exploring%20dbatools.ipynb

You can also use Binder https://mybinder.org/

This uses Docker to create an interactive Notebook. Create a Github repo like https://github.com/SQLDBAWithABeard/Notebooks (or just clone it) Copy your notebooks into the notebooks folder and push the changes to Github and then go to https://mybinder.org/ and add your URL to the repository.

You can see what it looks like by clicking the button below which Binder creates for you

Unfortunately the kernel only supports Python for the moment but you can see the possibilities ๐Ÿ™‚

How to fork a GitHub repository and contribute to an open source project

I enjoying maintaining open source GitHub repositories such as dbachecks and ADSNotebook. I absolutely love it when people add more functionality to them.

To collaborate with a repository in GitHub you need to follow these steps

  • Fork the repository into your own GitHub
  • Clone the repository to your local machine
  • Create a new branch for your changes
  • Make some changes and commit them with useful messages
  • Push the changes to your repository
  • Create a Pull Request from your repository back to the original one

You will need to have git.exe available which you can download and install from https://git-scm.com/downloads if required

Fork the repository into your own GitHub

A fork is a copy of the original repository. This allows you to make changes without affecting the original project. It does not get updated when the original project gets updated (We will talk about that in the next post) This enables you to code a new feature or a bug fix, test it locally and make sure it is working.

Let’s take dbachecks as our example. Start by going to the project in GiHub. In this case the URL is https://github.com/sqlcollaborative/dbachecks You will see a Fork button at the top right of the page

When you click the button the repository is copied into your own GitHub account

The page will open at https://github.com/YOURGITHUBUSERNAME/NameOfRepository in this case https://github.com/SQLDBAWithABeard/dbachecks You will be able to see that it is a fork of the original repository at the top of the page

Clone the repository to your local machine

Forking the repository has created a remote repository stored on the GitHub servers. Now that the repository has been forked you need to clone it to your local machine to create a local repository so that you can start coding your amazing fix. When you have finished you can then sync it back to your remote repository ready for a Pull Request back to the original repository.

In your browser, at your remote repository that you just created (https://github.com/YOURGITHUBUSERNAME/NameOfRepository if you have closed the page) click on Clone or Download and then the icon to the right to copy the url

You can clone your repository in VS Code or Azure Data Studio by clicking F1 or CTRL + SHIFT + P in Windows or Linux and โ‡งโŒ˜P or F1 on a Mac

then start typing clone until you see Git:Clone and press enter or click

Paste in the URL that you just copied and click enter. A dialog will open asking you to select a folder. This is the parent directory where your local repository will be created. The clone will create a directory for your repository so you do not need to. I suggest that you use a folder called GitHub or something similar to place all of the repositories that you are going to clone and create.

When it has finished it will ask you if you wish to open the repository

if you click Open it will close anything that you have already got opened and open the folder. If you click Add to Workspace it will add the folder to the workspace and leave everything you already had open as it was and surprisingly clicking Open in New Window will open the folder in a new instance of Visual Studio Code or Azure Data Studio!

and you will also be able to see the local repository files on your computer

You can clone the repository at the command line if you wish by navigating to your local GitHub directory and running git clone TheURLYouCopied

Now your local repository has been created, it’s time to do your magic coding.

Create a new branch for your changes

It is a good idea to create a branch for your amazing new feature This enables you to work on coding for that feature in isolation. It has the added advantage that if you mess it right royally up, you can just delete that branch and start again with a new one!

To create a branch in VS Code or Azure Data Studio you can click on the branch name at the bottom left.

Or open the Command Palette and type Branch until you see Git: Create Branch

You will be prompted for a branch name

I like to choose a name that relates to the code that I am writing like configurable_engine or removeerroringexample You can see the name of the branch in the bottom left so that you always know which branch you are working on.

The icon shows that the branch is only local and hasn’t been pushed (published) to the remote repository yet

Make some changes and commit them with useful messages

Now you can start writing your code for your awesome new feature, bug fix or maybe just documentation improvement. Keep your commits small and give them useful commit messages that explain why you have made the change as the diff tooling will be able to show what change you have made

Write your code or change the documentation, save the file and in Visual Studio Code or Azure Data Studio you will see that the source control icon has a number on it

Clicking on the icon will show the files that have changes ready

You can write your commit message in the box and click CTRL + ENTER to commit your changes with a message

If you want to do this at the command line, you can use git status to see which files have changes

You will need to git add .or git add .\pathtofile to stage your changes ready for committing and then git commit -m 'Commit Message' to commit them

Notice that I did exactly what I just said not to do! A better commit message would have been So that people can find the guide to forking and creating a PR

Push the changes to your repository

You only have the changes that you have made in your local repository on your computer. Now you need to push those changes to Github your remote repository. You can click on the publish icon

You will get a pop-up asking you if you wish to stage your changes. I click Yes and never Always so that I can use this prompt as a sanity check that I am doing the right thing

At the command line you can push the branch, if you do that, you will have to tell git where the branch needs to go. If you just type git push it will helpfully tell you

fatal: The current branch AwesomeNewFeature has no upstream branch.
To push the current branch and set the remote as upstream, use

    git push --set-upstream origin AwesomeNewFeature

So you will need to use that command

You can see in the bottom left that the icon has changed

and if you read the output of the git push command you will see what the next step is also.

Create a Pull Request from your repository back to the original one

You can CTRL click the link in the git push output if you have pushed from the command line or if you visit either you repository or the original repository in your browser you will see that there is a Compare and Pull Request button

You click that and let GitHub do its magic

and it will create a Pull Request for you ready for you to fill in the required information, ask for reviewers and other options. Once you have done that you can click Create pull request and wait for the project maintainer to review it and (hopefully) accept it into their project

You can find the Pull Request that I created here https://github.com/sqlcollaborative/dbachecks/pull/720 and see how the rest of this blog post was created.

If you make more changes to the code in the same branch in your local repository and push them, they will automatically be added to this Pull Request whilst it is open. You can do this if the maintainer or reviewer asks for changes.

Shane has asked for a change

So I can go to my local repository in Azure Data Studio and make the requested change and save the file. If I look in the source control in Azure Data Studio I can again see there is a change waiting to be committed and if I click on the name of the file I can open the diff tool to see what the change was

Once I am happy with my change I can commit it again in the same way as before either in the editor or at the command line. The icon at the bottom will change to show that I have one commit in my local repository waiting to be pushed

To do the same thing at the command line I can type git status and see the same thing.

I can then push my change to my remote repository either in the GUI or by using git push

and it will automatically be added to the Pull Request as you can see

Now that the required changes for the review have been made, the review has been approved by Shane and the pull request is now ready to be merged. (You can also see that dbachecks runs some checks against the code when a Pull Request is made)

Many, many thanks to Shane b | t who helped with the writing of this post even whilst on a “no tech” holiday.

Go Ahead – Contribute to an Open Source Project

Hopefully you can now see how easy it is to create a fork of a GitHub repository, clone it to your own machine and contribute. There are many open source projects that you can contribute to.

You can use this process to contribute to the Microsoft Docs for example by clicking on the edit button on any page.

You can contribute other open source projects like

or go and find the the ones that you use and can help with.

Fixing the ‘Failed to generate the compressed file for module ‘C:\Program Files\dotnet\dotnet.exe’ error when deploying to the PowerShell Gallery using Azure DevOps

The PowerShell module for validating your SQL Server estate dbachecks is deployed via Azure DevOps, you can see how it is working (or not) via this link

Grrr Automation for the Lose!

Until recently, this had worked successfully. In the last few weeks I have been receiving errors

 Exception : Microsoft.PowerShell.Commands.WriteErrorException: Failed to generate the compressed file for module 'C:\Program Files\dotnet\dotnet.exe failed to pack: error 
 C:\Program Files\dotnet\sdk\3.0.100\Sdks\NuGet.Build.Tasks.Pack\build\NuGet.Build.Tasks.Pack.targets(198,5): error : 
2 Index was outside the bounds of the array. 
 [C:\Users\VssAdministrator\AppData\Local\Temp\cbc14ba6-5832-46fd-be89-04bb552a83ac\Temp.csproj]
'.
At C:\Program Files\WindowsPowerShell\Modules\PowerShellGet\2.2.1\PSModule.psm1:10944 char:17
20       Publish-PSArtifactUtility @PublishPSArtifactUtility_Param ...
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo          : InvalidOperation: (:) [Write-Error], WriteErrorException
2019-11-25T22:44:46.8459493Z     + FullyQualifiedErrorId : FailedToCreateCompressedModule,Publish-PSArtifactUtility

You can see these errors in the release pipeline logs here

Confusion

This was very frustrating as it was stopping the continuous delivery to the PowerShell Gallery. It was even more confusing as I was successfully deploying the ADSNotebook module to the gallery using the same method as you can see here.

Raise an Issue on GitHub

I went and looked at the PowerShellGet GitHub repository and opened an issue I also found another issue regarding Required Modules

But this doesnt help to get dbachecks released.

Just Try to Make it Work

I asked the wonderful folk in the PowerShell Slack channel – Through the magic of automation, you can also interact with them via the powershellhelp channel in the SQL Server Slack as well but there were no answers that could assist.

So I had to go searching for an answer. PowerShellGet uses nuget for package management. I found that if I downloaded an earlier version and placed it in my user profile (in the right location) I could publish the module.

I found this out by removing the nuget.exe from anywhere useful on the machine and trying to publish the module. The error message says

NuGet.exe upgrade is required to continue
This version of PowerShellGet requires minimum version '4.1.0' of NuGet.exe to publish an item to the NuGet-based repositories. NuGet.exe must be available in 
'C:\ProgramData\Microsoft\Windows\PowerShell\PowerShellGet\' or 'C:\Users\BeardyMcBeardFace\AppData\Local\Microsoft\Windows\PowerShell\PowerShellGet\', or under 
one of the paths specified in PATH environment variable value. NuGet.exe can be downloaded from https://aka.ms/psget-nugetexe. For more information, see 
https://aka.ms/installing-powershellget . Do you want PowerShellGet to upgrade to the latest version of NuGet.exe now?

If I said yes then I got the latest version and the error continued.

However, on my laptop I can go to the nuget downloads page and download an earlier version and place it in one of those paths then I could publish the module.

Can I Automate it?

I would rather not have to deploy manually though, and as I use hosted agents my access to the operating system is limited so I wondered if I could place the nuget.exe in the user profile and it would get used or if it would look for the the latest one. Turns out it uses the one in the user profile ๐Ÿ™‚

So now I have this code as a step in my Azure DevOps Release pipeline before calling Publish-Module and we have automated the releases again.

and now deployments to the PowerShell Gallery are just triggered by the build and the pipeline is green again ๐Ÿ™‚

Create a PowerShell Notebook for Azure Data Studio with PowerShell

The latest update to the ADSNotebook PowerShell module I blogged about here now enables the creation of PowerShell notebooks with PowerShell.

You can install the module with

Install-Module ADSNotebook

or if you have already installed it you can use

Update-Module ADSNotebook

In the latest release, there is an extra parameter for New-AdsWorkBook of -Type which will accept either SQL or PowerShell

Create a PowerShell Notebook with PowerShell Rob

OK!

Here is some code to create a PowerShell Notebook. First we will create some cells using New-AdsWorkBookCell including all the markdown to add images and links. You can find my notebooks which explain how to write the markdown for your notebooks in my GitHub Presentations Repository

$introCelltext = "# Welcome to my Auto Generated PowerShell Notebook

## dbatools
![image](https://user-images.githubusercontent.com/6729780/68845538-7afcd200-06c3-11ea-952e-e4fe72a68fc8.png)  

dbatools is an open-source PowerShell Module for administering SQL Servers.
You can read more about dbatools and find the documentation at [dbatools.io](dbatools.io)
"
$SecondCelltext = "### Installation
You can install dbatools from the PowerShell Gallery using `Install-Module dbatools`
"

$thirdcelltext = "Install-Module dbatools"

$fourthCelltext = "### Getting Help
You should always use `Get-Help` to fins out how to use dbatools (and any PowerShell) commands"

$fifthcelltext = "Get-Help Get-DbaDatabase"
$sixthCelltext = "Try a command now. get the name, owner and collation of the user databases on the local instance"
$seventhCellText = "Get-DbaDatabase -SqlInstance localhost -ExcludeSystem | Select Name, Owner, Collation"

$Intro = New-ADSWorkBookCell -Type Text -Text $introCelltext
$second = New-ADSWorkBookCell -Type Text -Text $SecondCelltext
$third = New-ADSWorkBookCell -Type Code -Text $thirdcelltext
$fourth = New-ADSWorkBookCell -Type Text -Text $fourthCelltext
$fifth = New-ADSWorkBookCell -Type Code -Text $fifthcelltext
$sixth = New-ADSWorkBookCell -Type Text -Text $sixthCelltext
$seventh = New-ADSWorkBookCell -Type Code -Text $seventhCellText

Then we will create a new workbook using those cells

$path = 'C:\temp\dbatools.ipynb'
New-ADSWorkBook -Path $path -cells $Intro,$second,$third,$fourth,$fifth,$sixth,$Seventh -Type PowerShell

Then, when that code is run we can open the Notebook and ta-da

And it is super quick to run as well

UPDATE – Tyler Leonhardt t from the PowerShell team asked

Challenge accepted, with extra meta, here is the PowerShell to create a PowerShell Notebook which will create a PowerShell Notebook!!

Create Azure Data Studio SQL Notebooks with PowerShell

At PASS Summit today I gave a presentation about SQL Notebooks in Azure Data Studio for the DBA. I demo’d the PowerShell module ADSSQLNotebook.

which you can also find on GitHub (where I will be glad to take PR’s to improve it ๐Ÿ™‚ )

This module has 3 functions

This module contains only 3 commands at present

  • Convert-ADSPowerShellForMarkdown

This will create the markdown link for embedding PowerShell code in a Text Cell for a SQL Notebook as described in this blog post

  • New-ADSWorkBookCell

This command will create a workbook text cell or a code cell for adding to the New-ADSWorkBook command

  • New-ADSWorkBook

This will create a new SQL Notebook using the cell objects created by New-ADSWorkBookCell

Usage

Convert-ADSPowerShellForMarkdown

Convert-ADSPowerShellForMarkdown -InputText "Get-ChildItem" -LinkText 'This will list the files' -ToClipBoard

Converts the PowerShell so that it works with MarkDown and sets it to the clipboard for pasting into a workbook cell

New-ADSWorkBookCell

$introCelltext = "# Welcome to my Auto Generated Notebook

## Automation
Using this we can automate the creation of notebooks for our use
"
$Intro = New-ADSWorkBookCell -Type Text -Text $introCelltext

Creates an Azure Data Studio Text cell and sets it to a variable for passing to  New-AdsWorkBook

New-ADSWorkBook

$introCelltext = "# Welcome to my Auto     Generated Notebook

## Automation
Using this we can automate the creation of notebooks for our use
"
$SecondCelltext = "## Running code
The next cell will have some code in it for running

## Server Principals
Below is the code to run against your     instance to find the server principals that are enabled"

$thirdcelltext = "SELECT Name
FROM sys.server_principals
WHERE is_disabled = 0"
$Intro = New-ADSWorkBookCell -Type Text -Text $introCelltext
$second = New-ADSWorkBookCell -Type Text  -Text $SecondCelltext
$third = New-ADSWorkBookCell -Type Code -Text $thirdcelltext

$path = 'C:\temp\AutoGenerated.ipynb'
New-ADSWorkBook -Path $path -cells $Intro,$second,$third

Creates 3 cells with New-AdsWorkBookCells to add to the workbook,
two text ones and a code one, then creates a SQL Notebook with
those cells and saves it as     C:\temp\AutoGenerated.ipynb

Installation

You can install this Module from the PowerShell Gallery using

Install-Module ADSNotebook

Compatability

This module has been tested on Windows PowerShell 5.1, PowerShell Core 6 and PowerShell 7 on Windows 10 and Ubuntu

Demo