Author: seddryck

Creating and querying a database using the Graph API on the Azure Cosmos DB emulator

When developing on Azure Cosmos DB, Microsoft’s globally distributed, horizontally partitioned, multi-model database service, it’s useful to work with the local emulator. At the moment, the web interface of the Data Explorer is oriented towards the SQL API and it looks impossible to create a graph query on the emulator. This is false: the UI is not aware of these features, but the underlying engine fully supports the Graph API.

You can access the emulator engine by means of the .NET libraries. Create a new console project in Visual Studio and add the package Microsoft.Azure.Graphs. This package is in preview, so you’ll need to tell the package manager that you accept prerelease packages.

Install-Package Microsoft.Azure.Graphs -pre

This package will come with many friends, notably a reference to Microsoft.Azure.Documents.Client, the core library for everything related to Azure Cosmos DB, and a reference to Newtonsoft.Json for everything related to JSON serialization.

cosmosdb-console-001

For the Newtonsoft.Json library, you’ll run into some version conflicts within this bunch of dependencies. These issues are fixed by applying a few binding redirections in the app.config file.

cosmosdb-console-002
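
If you prefer to write the redirection by hand, a sketch of what it looks like in app.config is shown below; the version numbers are assumptions and must match the Newtonsoft.Json version actually installed by the package manager.

<configuration>
  <runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <!-- Redirect every older Newtonsoft.Json version to the one deployed with the project -->
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-10.0.0.0" newVersion="10.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
  </runtime>
</configuration>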

To create the database, you won’t need any method specific to the Graph API. A database is agnostic of the API that you’ll use for querying. If the Azure portal asks you to specify an API when creating a database online, it’s just to display a UI that is easier to use with the selected API, nothing else.

The endpoint and authentication key for the emulator are fixed and well known:

var endpoint = new Uri("https://localhost:8081");
var authKey = "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";

The method CreateDatabaseIfNotExistsAsync is useful for creating a database while skipping this step if the database already exists.

using (var client = new DocumentClient(endpoint, authKey))
{
  var databaseResponse = client.CreateDatabaseIfNotExistsAsync(
      new Database() { Id = databaseId }
  ).Result;
}

It could be interesting to check the status code of this response. If the status code is OK, it means that the database already existed. For a newly created database, the status code is Created.

switch (databaseResponse.StatusCode)
{
  case System.Net.HttpStatusCode.OK:
    Console.WriteLine($"Database {databaseId} already exists.");
    break;
  case System.Net.HttpStatusCode.Created:
    Console.WriteLine($"Database {databaseId} created.");
    break;
  default:
    throw new ArgumentException(
        $"Can't create database {databaseId}: {databaseResponse.StatusCode}"
    );
}

Once the database is created, you’ll need to create the collection. The same remark applies: a collection is not tied to a specific API.

var databaseUri = UriFactory.CreateDatabaseUri(databaseId);
var collectionResponse = client.CreateDocumentCollectionIfNotExistsAsync(
     databaseUri
     , new DocumentCollection() { Id = collectionId }
).Result; 
switch (collectionResponse.StatusCode)
{
    case System.Net.HttpStatusCode.OK:
        Console.WriteLine($"Collection {collectionId} already exists.");
        break;
    case System.Net.HttpStatusCode.Created:
        Console.WriteLine($"Collection {collectionId} created.");
        break;
    default:
        throw new ArgumentException(
           $"Can't create collection {collectionId}:
           {collectionResponse.StatusCode}"
        );
}

You can check in the web UI of the emulator and you’ll see the newly created database and collection. If they don’t appear, try opening the page in another tab.

Now that the database and the collection exist, we can start to query the graph. At this step, I’ll need to use a few methods specific to the Graph API. The most notable one is the CreateGremlinQuery extension method. This method accepts two parameters: the collection to query and the Gremlin statement.

var queryCount = client.CreateGremlinQuery<dynamic>(collectionResponse,
     "g.V().Count()");

This just instantiates the query, it doesn’t execute it. To execute the query, you’ll need to call the method ExecuteNextAsync. This method returns the first set of results. In this case, we know that we’re expecting exactly one result and can go with this shortcut.

var count = queryCount.ExecuteNextAsync().Result.First();

With the help of the method CreateGremlinQuery, you can also create statements that add nodes or edges. The node or edge is created when the method ExecuteNextAsync is called.

var queryNode = client.CreateGremlinQuery<dynamic>(collectionResponse,
    "g.addV('person').property('id', 'thomas').property('firstName', 'Thomas').property('age', 44)");
var resultNode = queryNode.ExecuteNextAsync().Result;
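
When a traversal returns more than one result, you can’t rely on the single-result shortcut used above. A minimal sketch of the paging loop, assuming the same client and collection as in the previous snippets, could look like this:

// List every vertex of the graph, page by page (sketch reusing the
// 'client' and 'collectionResponse' variables created earlier).
var queryVertices = client.CreateGremlinQuery<dynamic>(collectionResponse, "g.V()");
while (queryVertices.HasMoreResults)
{
    foreach (var vertex in queryVertices.ExecuteNextAsync().Result)
    {
        // Each element is a dynamic object deserialized from the GraphSON response.
        Console.WriteLine(vertex.ToString());
    }
}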

Counting items with the SQL API on Azure Cosmos DB

Azure Cosmos DB is Microsoft’s globally distributed, horizontally partitioned, multi-model database service. The term “multi-model database service” means that you can seamlessly access the content of your database from different APIs. One of these APIs is named SQL and is a kind of mix between JSON and SQL. If you’re used to ANSI SQL, it’s probably the easiest API to start with.

This SQL API supports the keyword COUNT, but it’s not necessarily the most useful keyword when you want to know how many elements are available in an array.

Before going further, to illustrate this article, we’ll need to set up a database and a collection (named families); you can achieve this with the UI of the portal. Then load a few items into your collection. To achieve this task, click on “New document” on the Documents tab of your collection, paste the JSON corresponding to the Andersen family from this page and click on “Save”. Then click again on “New document” and paste the content of the JSON related to the Wakefield family from the same page. Don’t forget to click on “Save”. Your two families are now part of the database’s content and we’ll be able to query them.

The first example is about how to count the children in each family. In ANSI SQL, the approach would probably be to select the familyId and then perform a count of distinct children grouped by familyId. With the SQL API of Cosmos DB, you’ll need to take another approach and use the keyword ARRAY_LENGTH. This keyword returns the number of elements of the specified array expression. In our case, the array expression is just the array containing the children of a family. Our select becomes:

SELECT
    f.id AS FamilyId,
    ARRAY_LENGTH(f.children) AS ChildrenCount
FROM
    families f

The result is effectively a list of FamilyId and ChildrenCount.

cosmosdb-count-001

We can do something a bit more complex and check how many pets each child has. To succeed, we’ll need to start from the children level and then take the length of the sub-array pets. This can be done using the keywords JOIN and IN to force an iteration over the existing arrays. In this case, we set the granularity of the query to a child and, for each of them, we retrieve the familyId and givenName and count the pets.

SELECT
    f.id AS FamilyId,
    c.givenName,
    ARRAY_LENGTH(c.pets) AS PetsCount
FROM
    families f
JOIN c IN f.children

If you want to improve the rendering of the result, you can use the coalesce operator ??. If a value is not returned by the query, the result can be replaced by another value. In this case, if the givenName is not returned we’ll use the firstName, and if the array of pets doesn’t exist we’ll set the count of pets to zero.

SELECT
    f.id AS FamilyId,
    c.givenName ?? c.firstName,
    ARRAY_LENGTH(c.pets) ?? 0 AS PetsCount
FROM
    families f
JOIN c IN f.children

cosmosdb-count-002

Last question to answer: how many pets have we got in our database? To answer this kind of question, you’ll need to perform an aggregation. The idea is to know how many pets each child has and then sum the results obtained for each child. Translated into SQL:

SELECT
    SUM(ARRAY_LENGTH(c.pets)) AS PetsCount
FROM
    families f
JOIN c IN f.children

cosmosdb-count-003

Another way to achieve the same result is to change the granularity of the query and start from the pets. In this case, we’ll need to count the distinct pets and we can finally use the keyword COUNT.

SELECT
    COUNT(p) AS PetsCount
FROM
    families f
JOIN c IN f.children
JOIN p IN c.pets

cosmosdb-count-003
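
These queries run as-is in the Data Explorer of the portal, but nothing prevents you from executing them from .NET as well. A minimal sketch, reusing the DocumentClient and the identifiers introduced in the emulator article above (endpoint, authKey, databaseId and collectionId are assumptions to adapt to your environment), could look like this:

// Execute the aggregation with the SQL API from .NET (sketch).
// AsEnumerable() requires a using System.Linq directive.
using (var client = new DocumentClient(endpoint, authKey))
{
    var collectionUri = UriFactory.CreateDocumentCollectionUri(databaseId, collectionId);
    var query = client.CreateDocumentQuery<int>(
        collectionUri,
        "SELECT VALUE SUM(ARRAY_LENGTH(c.pets)) FROM families f JOIN c IN f.children",
        new FeedOptions { EnableCrossPartitionQuery = true });
    // SELECT VALUE maps the single scalar result directly to an int.
    Console.WriteLine($"Pets in the database: {query.AsEnumerable().First()}");
}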

Deploying the Azure Cosmos DB emulator when building with AppVeyor

Azure Cosmos DB is Microsoft’s globally distributed, horizontally partitioned, multi-model database service. AppVeyor is a continuous integration solution for Windows. The key question is: how can I test my solution, built on AppVeyor, with an Azure Cosmos DB back-end? Before going further, you need to know that Azure Cosmos DB has an emulator that you can run on any Windows machine. You can use it to develop but also to test your solution. I’d not recommend load tests, but unit or integration tests are supported.

The first step is to download the emulator (only 26MB) on your AppVeyor build machine. You can do this during the install step with the help of this one-line PowerShell script:

Start-FileDownload 'https://aka.ms/cosmosdb-emulator' -FileName 'c:\projects\cosmos-db.msi'

When the file is downloaded, you’ll need to execute it to install the emulator on the machine. Using a few parameters of the installer, this can be achieved without the need to click on any buttons.

cmd /c start /wait msiexec /i "c:\projects\cosmos-db.msi" /qn /quiet /norestart /log install.log

Then, once the software is installed, you’ll need to run it. As it will run in the background, you won’t need the UI or the Data Explorer. These options can be turned off by means of the parameters /NoUI and /NoExplorer.

"C:\Program Files\Azure Cosmos DB Emulator\CosmosDB.Emulator.exe"  /NoExplorer /NoUI

To avoid running it interactively in the build execution context, you’ll need to start it in another process. It can be done with the PowerShell cmdlet Start-Process.
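
A minimal sketch of that call, reusing the default installation path from the previous command, could be:

# Start the emulator in a separate, non-blocking process (sketch).
Start-Process -FilePath 'C:\Program Files\Azure Cosmos DB Emulator\CosmosDB.Emulator.exe' -ArgumentList '/NoExplorer', '/NoUI'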

Finally, you cannot expect this kind of software to start in a millisecond. You’ll have to poll and check whether a connection is possible or not. That’s the goal of this PowerShell script:

$max = 5
$attempt = 0
$client = New-Object System.Net.Sockets.TcpClient([System.Net.Sockets.AddressFamily]::InterNetwork)

do {
	$attempt++
	try {    
		$client.Connect("127.0.0.1", 8081)
		write-host "CosmosDB started"
	}
	catch {
		$client.Close()
		if($attempt -eq $max) {
			write-host "CosmosDB was not started"
		} else {
			[int]$sleepTime = 3*$attempt
			write-host "CosmosDB is not started. Retry after $sleepTime seconds..."
			sleep $sleepTime;
		}
	}
}while(!$client.Connected -and $attempt -lt $max)

If the emulator has not started after the maximum number of attempts, something is going wrong and you step out of this loop.

The full listing with all the actions taken during the install step of your build process is visible here:

appveyor-cosmosdb-emulator

If you run into trouble, it can be useful to check whether port 8081 is listening: this can easily be done with the help of the command netstat -ab, which you can add as a fifth action in your install step.

Automating the testing of an Azure Cosmos DB instance with NBi

For five years now, you’ve been able to run automated tests on your SQL Server databases and SSAS cubes with the help of the open-source framework NBi. This framework was extended to support other relational databases (using the OleDb or ODBC drivers) in 2017. To give a really good start to the year 2018, NBi now supports Azure Cosmos DB databases using the Graph API (Gremlin queries)!

If you’re used to NBi, you can directly jump to the next paragraph. For newcomers, NBi is an open-source framework dedicated to the testing of BI solutions: it supports complex comparisons of query results, asserting conditions on each row of a result-set, running ETL (SSIS), validating queries from reports (SSRS) and many other possibilities. On top of this, it has an advanced feature to automate the process of creating test-suites.

In order to set up an environment for testing Azure Cosmos DB instances, you must download version 1.18.0-RC1 of NBi and its extension for Azure Cosmos DB in version 1.0.0-RC1. Unzip the content of the NBi download and then partially override it with the content of the extension (in the RC1, all the dlls should be in the same directory). If NUnit 2.6.4 is not available on your computer, don’t forget to download it. When done, create your test-suite, config and project file as you’d normally do for any NBi test-suite (or, if you’re in a hurry, download the files with the examples here, but don’t forget to update the connection-string before trying to run them).

For this example, I created a database on Azure Cosmos DB (named Friends) containing a graph (named FoF). I created some vertexes and edges with:

g.V().Drop()
g.addV('person').property('id', 'thomas').property('firstName', 'Thomas').property('age', 44)
g.addV('person').property('id', 'mary').property('firstName', 'Mary').property('lastName', 'Andersen').property('age', 39)
g.addV('person').property('id', 'ben').property('firstName', 'Ben').property('lastName', 'Miller')
g.addV('person').property('id', 'robin').property('firstName', 'Robin').property('lastName', 'Wakefield')
g.V('thomas').addE('knows').to(g.V('mary'))
g.V('thomas').addE('knows').to(g.V('ben'))
g.V('ben').addE('knows').to(g.V('robin'))
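
If you prefer to script this setup instead of typing the statements in the Data Explorer, a minimal C# sketch reusing the CreateGremlinQuery extension from the first article could look like the following; the endpoint, authKey and resource names are assumptions to adapt to your own instance.

// Replay the Gremlin setup statements one by one (sketch).
var statements = new[]
{
    "g.V().Drop()",
    "g.addV('person').property('id', 'thomas').property('firstName', 'Thomas').property('age', 44)",
    "g.V('thomas').addE('knows').to(g.V('mary'))"
    // ... and so on for the other vertexes and edges
};

using (var client = new DocumentClient(new Uri("https://gremlin-demo.documents.azure.com:443"), authKey))
{
    var graph = client.ReadDocumentCollectionAsync(
        UriFactory.CreateDocumentCollectionUri("Friends", "FoF")).Result.Resource;
    foreach (var statement in statements)
        client.CreateGremlinQuery<dynamic>(graph, statement).ExecuteNextAsync().Wait();
}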

When the environment is correctly configured, you must edit your config file to reference the extension NBi.Core.CosmosDb.

<configuration>
  <configSections>
    <section name="nbi" type="NBi.NUnit.Runtime.NBiSection, NBi.NUnit.Runtime"/>
  </configSections>
  <nbi testSuite="...">
    <extensions>
      <add assembly="NBi.Core.CosmosDb"/>
    </extensions>
  </nbi>
</configuration>

You also need to edit your config file to add all the binding redirections that Azure Cosmos DB comes with. To achieve this, copy/paste the content of the runtime element of NBi.Core.CosmosDb.dll.config into a new runtime element in the file FoF.config.

The concept of a connection-string is not really familiar to Azure Cosmos DB, but to match with relational databases we use a concatenation of the usual parameters defined to connect to an Azure Cosmos DB instance. You must provide the endpoint, the authkey, the database id and the collection id, concatenated into named tokens separated by semi-colons. For the collection id, you must use the name graph and not collection, in anticipation of NBi’s compatibility with the other APIs supported by Cosmos DB.

Endpoint=https://gremlin-demo.documents.azure.com:443;AuthKey=F...==;database=Friends;graph=FoF

In your test-suite, querying your Azure Cosmos DB database with the Graph API is no different from querying a relational database. You’re still using the result-set and query elements. The Gremlin query is specified within the query element. It doesn’t support parameters at the moment, but you’re still free to use template-tokens.

Graph queries can return complex results, such as paths, that are not easily mapped to a table (result-set). Currently, NBi’s support for Cosmos DB is limited to queries returning a list of vertexes, a list of edges or a list of values.

In this first test, NBi is asserting that the query returning Thomas’ friends of friends effectively returns the correct result: in this case, a unique vertex representing a person named Robin.

FoF-Thomas-Robin
<test name="Thomas' friends of friends" uid="0001">
  <system-under-test>
    <resultSet>
      <query>
        g.V('thomas').outE('knows').inV().hasLabel('person').outE('knows').inV().hasLabel('person')
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <equalTo values-default-type="text">
      <resultSet>
        <row>
          <cell>robin</cell>
          <cell>person</cell>
          <cell>vertex</cell>
          <cell>Robin</cell>
          <cell>Wakefield</cell>
        </row>
      </resultSet>
    </equalTo>
  </assert>
</test>

In this second test, we’re asserting that all the vertexes contain a correctly capitalized first name:

<test name="All of them are older than 20" uid="0002">
  <system-under-test>
    <resultSet>
      <query>
        g.V()
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <all-rows>
      <predicate operand="firstName" type="text">
        <matches-regex>^([A-Z][a-z]+)$</matches-regex>
      </predicate>
    </all-rows>
  </assert>
</test>

And finally, in this last test, we’re checking that the result-set returned by a projection matches the expected result.

<test name="Count of person's relationships" uid="0003">
  <system-under-test>
    <resultSet>
      <query>
        g.V().project('FirstName','KnowsCount').by('firstName').by(out().Count())
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <equalTo>
      <resultSet>
        <row>
          <cell>Mary</cell>
          <cell>0</cell>
        </row>
        <row>
          <cell>Robin</cell>
          <cell>0</cell>
        </row>
        <row>
          <cell>Ben</cell>
          <cell>1</cell>
        </row>
        <row>
          <cell>Thomas</cell>
          <cell>2</cell>
        </row>
      </resultSet>
    </equalTo>
  </assert>
</test>
FoF-running-test-suite

Our test-suite is green … time to write new tests!

To know more about NBi, check the website at www.nbi.io and if you have any question or suggestion, feel free to ask on Twitter or on GitHub. The code of this extension is also available on GitHub.

Encoding basic authentication in an url with Power Query M language

More than a year ago, I published a blog post about how to get data from Neo4j with Power BI and, yesterday, Chris Skardon published a cover enriched with more details (useful if you’re new to Power BI). But Chris did more than a cover and also changed the way to authenticate. In my version, I circumnavigated the authentication obstacle by adjusting the server settings to let me run anonymous queries, but Chris overcame it by providing a header to the web request (see below).

let
  Source = Web.Contents("http://localhost:7474/db/data/transaction/commit",
    [
      Headers = [#"Authorization" = "Basic bmVvNGo6bmVv"],
      Content = Text.ToBinary("{
        ""statements"" : [ {
          ""statement"" : ""...""} ]
        }")
    ]
  )
in
  Source

Chris explained that he encoded his user/password combination with the help of C# or PowerShell. Then he added this encoded string in the header for the Authorization tag, after the Basic keyword.
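
For reference, a minimal C# sketch of that external encoding step (using the sample neo4j/neo credentials that produce the header value shown above) could be:

using System;
using System.Text;

class EncodeBasicAuth
{
    static void Main()
    {
        // Encode "user:password" to Base64 for a Basic authentication header.
        var credentials = Convert.ToBase64String(Encoding.UTF8.GetBytes("neo4j:neo"));
        Console.WriteLine($"Basic {credentials}");   // prints: Basic bmVvNGo6bmVv
    }
}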

This is a good idea but I thought it was also a good idea to explain how to execute this base64-encoding without relying on an external process … just with Power Query M Language!

The first step is to correctly format the string to encode. The expected format is user:password. I can create two parameters, User and Password, and then create a new query named Credentials that applies the format.

let
    Source = User & ":" & Password,

After this, you need to convert your text to bytes with the help of the function Text.ToBinary.

    Bytes = Text.ToBinary(Source),

When you have the bytes, you need to encode them to a string with the help of the Base64 binary encoding.

    TextForm = Binary.ToText(Bytes, BinaryEncoding.Base64)
in
    TextForm

You can now edit the main query and replace Headers = [#"Authorization" = "Basic bmVvNGo6bmVv"], by Headers = [#"Authorization" = "Basic " & Credentials],.

That’s it, the encoding is performed in Power Query M language.

Full code:

let
    Source = User & ":" & Password,
    Bytes = Text.ToBinary(Source),
    TextForm = Binary.ToText(Bytes, BinaryEncoding.Base64)
in
    TextForm

Running a native query within an M project

A few weeks ago, Microsoft published the Power Query SDK, an extension for Visual Studio, on the Visual Studio Marketplace. It is available for free for VS2015 and VS2017. This extension is really welcome in the world of “Power Query M language” aficionados. By using it, you can save a lot of time when trying to debug a query or when you want to manually test your implementation. Naturally, it’s just a version 1.0.1 and there is room for improvement (having more than one file in a .mproj, support for debugging, …), but it’s much better than relying on the “advanced editor” of Power BI (the name “advanced editor” is somewhat funny for what is just a textbox).

I had only been using it for a few days when I found an interesting case. My query contained a native query:

Sql.Database(
   "server",
   "db",
   [Query = "select * from myTable where field=" & value]
)

When I tried to execute it, I received this message from the Power Query SDK:

The evaluation requires a permission that has not been provided. Data source kind: ‘SQL’. Permission kind: ‘NativeQuery’.

evaluation-requires-permission-not-provided

No luck with my google-fu, so I checked the GitHub repository linked to data connectors and documenting this SDK. Nothing was available in the documentation, but an attribute in the project definition caught my attention: AllowNativeQuery. This attribute is false by default; I changed it to true.

<AllowNativeQuery>True</AllowNativeQuery>

I reloaded the project and, no surprise, that’s where the magic happened. I can now run my native queries within Visual Studio for this project!

Sorting a table based on your own criteria with Power Query (M language)

Earlier, I published a blog post about A case insensitive approach to sort a table with Power Query (M language) and I received a few pertinent comments from “Nick”. I agree with most of his comments, and some remarks about the syntax were fully valid, but the last part of one of his comments was not true (or at least is not true anymore). To quote him:

I wanted to use this method to sort on two fields, I guess I’ll have to go back to converting them to upper-case first, sorting on the upper-case fields and then removing them again

I’m a bit overloaded and it took me a very long time to check. But in the end, it’s really possible to define multiple criteria to sort a table.

To illustrate the how-to, I’m starting from my previous example. I’ll now sort the dataset based on the label (column Item) without taking the case into account [first criterion – no change] and, in case of equality, I’ll then sort the rows based on the price (descending) [second criterion to be introduced].

To test this, I’m introducing a few changes to my dataset. I’m adding a tenth record with the item Fishing Rod and updating the price for all the Fishing Rod and Bait items.

let
  Source = Table.FromRecords(
    {
      [OrderID=1, CustomerID=1, Item = "Fishing rod", Price = 100.0],
      [OrderID=2, CustomerID=1, Item = "1 lb. worms", Price = 5.0],
      [OrderID=3, CustomerID=2, Item = "Fishing net", Price = 25.0],
      [OrderID=4, CustomerID=3, Item = "Fish tazer", Price = 200.0],
      [OrderID=5, CustomerID=3, Item = "Bandaids", Price = 2.0],
      [OrderID=6, CustomerID=1, Item = "Tackle box", Price = 20.0],
      [OrderID=7, CustomerID=5, Item = "Bait", Price = 4.25],
      [OrderID=8, CustomerID=5, Item = "Fishing Rod", Price = 120.0],
      [OrderID=9, CustomerID=6, Item = "Bait", Price = 3.25],
      [OrderID=10, CustomerID=5, Item = "Fishing Rod", Price = 80.0]
    }
  ),
#"Changed Type" =
  Table.TransformColumnTypes(
    Source,
    {
      {"Item", type text},
      {"Price", type number}
    }
  ),

The basic principle for sorting based on your own criteria has not changed: you need to specify a list of criteria as the second argument of the Table.Sort function.

#"Sorted Rows" =
  Table.Sort(
    #"Changed Type",
    {
       each Text.Upper([Item]),
       each [Price]*-1
    }
  )

and this method definitely gives the expected result:
sort-table-multiple-criteria

Presenting before the summer ’17

I’ll be really busy presenting before this summer. With four sessions (including three completely new ones) to deliver in less than 30 days, I’ll have less time for my open-source projects.

busy.jpg

The first session will be at Charleroi (Hainaut, Belgium) on the 2nd of June for the Journée Agile 2017. This is a bit special for me: I’ve never talked so close to my hometown and it will be my first time in Wallonia in 10 years. No one is a prophet in his own country, even if it is a small one! It will also be different because the main topic of the conference is not SQL Server or databases but agility, a concept that I’ve been using for 10 years; I’m really not a newcomer. Agility and BI are two concepts with little traction, let’s break the walls! I’ll speak (in French) about testing automation for BI solutions; this is the session that I’m presenting across Europe, but I’ll adapt it a bit to the audience and stress the agility topic (and spend a bit less time on the demos). If you’re interested in the agile topic (and if you’re not, you should really care about this … the world has changed, hurry up!), a few seats are still available.

The next stop will be at SQL Saturday Rheinland (Sankt Augustin, North Rhine-Westphalia, Germany – the university between Bonn and Köln). The session will be about the M language (the language used in Power Query/Power BI to get and transform data) and how much fun it is to code with a functional language. It’ll be the first time that I present this session as-is, but I’ve already included large parts of it in some blog posts or show-cases, so not a lot of stress to be ready on time. About M, the main topic of this session: I’m really impressed by the power of this language and I’ll try to share a bit of my knowledge and help you make your first steps in the right direction. Don’t be confused, it’s not a session about Power Query/Get & Transform, I won’t use the UI … I’ll show you the code and how to resolve more advanced cases that you won’t be able to implement with just the user interface.

SQLSaturday #605 - Rheinland 2017

Then, at the end of the month, I’ll be presenting at the 24 Hours of PASS (French edition – 2017). This year, I accepted to present two sessions (webcasts in French). Can you really say “no” twice to Isabelle and Jean-Pierre? The first session is about going high in normal forms for data warehouses. Don’t panic, I won’t bore you by explaining in theory what a normal form is. I just want to explain why Data Vault, Anchor Modeling (and Tibre Modeling, my own “methodology” to model time-based relations in a data warehouse) are really powerful approaches for some specific cases. I’ll also show you why SQL Server is suited to host and serve this kind of model. I’m really excited about this session, it’ll be a worldwide première and I hope to have the opportunity to present it again later, fingers crossed.

The second session at the 24 HOP French edition is about graph processing with SQL Server 2017 (aka SQL Graph). I’ve been following Neo4j for several years now and I appreciate how simple it is to resolve some use-cases with Cypher and this engine. I still need to develop large parts of my session, but the plan is surely not to make an in-depth comparison of the two engines. I’m expecting to explain why graph processing is better suited for some specific cases and to show the difference in terms of expressiveness and performance with a standard SQL approach.

A case insensitive approach to sort a table with Power Query (M language)

Have you already tried to sort a table based on a text field? The result is usually a surprise for most people. The M language has a specific implementation of the sort engine for text where upper case letters are always ordered before lower case letters. It means that Z is always before a. In the example below, Fishing Rod is sorted before Fishing net.

sort-001

The classical trick to escape from this weird behaviour is to create a new column containing the upper case version of the text that will be used to sort your table, then configure the sort operation on this newly created column. This is a two-step approach (three steps, if you take into account the need to remove the new column afterwards). Nothing wrong with this, except that it obfuscates the code and I hate that.

sort-002

The documentation of Table.Sort is really sparse, especially about the last parameter, comparisonCriteria. I never found any documentation on this parameter and most of the available examples just use the name of one of the fields of the table. Hereby, I’ll give you a trick to apply a case insensitive sort to a table in one step.

The most important thing to understand is that comparisonCriteria doesn’t need to be a column, it can be any function. Based on this, it’s a safe move to change the code {"columnSort"} into {each Text.Upper(_[columnSort])} … and that’s it!

sort-003
let
  Source = Table.FromRecords(  
    {  
      [OrderID=1, CustomerID=1, Item = "Fishing rod", Price = 100.0],  
      [OrderID=2, CustomerID=1, Item = "1 lb. worms", Price = 5.0],  
      [OrderID=3, CustomerID=2, Item = "Fishing net", Price = 25.0],  
      [OrderID=4, CustomerID=3, Item = "Fish tazer", Price = 200.0],  
      [OrderID=5, CustomerID=3, Item = "Bandaids", Price = 2.0],  
      [OrderID=6, CustomerID=1, Item = "Tackle box", Price = 20.0],  
      [OrderID=7, CustomerID=5, Item = "Bait", Price = 3.25],  
      [OrderID=8, CustomerID=5, Item = "Fishing Rod", Price = 100.0],  
      [OrderID=9, CustomerID=6, Item = "Bait", Price = 3.25]  
    }  
  ),
  #"Changed Type" = 
    Table.TransformColumnTypes(Source,{{"Item", type text}}),
  #"Sorted Rows" = 
    Table.Sort(#"Changed Type",{each Text.Upper([Item])})
in
  #"Sorted Rows"

Auditing login attempts with SQL Server

Today’s quest is similar to the blog post that I wrote two months ago about auditing security changes: we’re admins of our non-production SQL Server environments and we’d like to put some auditing in place. In this case, we’d like to store all the login attempts on our database/server.

The core issue is that we have many users and logins on our databases and we have huge doubts about their respective needs. The root cause is identified: sometimes, for a short period of time, we make exceptions to our own rules and let a few other friendly projects access our DEV database. In some other cases, we connect our own solution in the DEV environment to the QA environment of another solution. Why? Planning, data quality issues, … we have valid reasons to do it … but these exceptions should be removed as soon as possible. And you know what? People forget. Nowadays, on our largest solution, we have 20 users but only 7 of them are expected and documented … the others should be removed. But before executing this cleanup, we’d like to be sure that these users are not effectively used by other solutions. If they are, we’ll first need to update the configuration of the corresponding solution.

To know if a user is effectively in use on a database, you need to capture the connection attempts (successes and failures). Technically, you have several ways to implement this in SQL Server, but I decided to use the audit feature to accomplish this task.

who-is-attempting-to-log-to-your-database

The first step is always to create an audit on the server or database (here, I’d like to track the whole server).

create server audit
    [Audit Login Informations]
to file
(   FILEPATH = N'N:\Temp\Trace\'
    , MAXSIZE = 1024 MB
    , MAX_FILES = 10
    , RESERVE_DISK_SPACE = OFF
)
with
(   QUEUE_DELAY = 1000
    , ON_FAILURE = CONTINUE
    , AUDIT_GUID = '33e055a9-1924-47c3-9777-e8875a5bcd37'
);

This audit must be activated:

alter server audit
    [Audit Login Informations]
with (state=on);
go

Then the specification of this audit must be described. In our case, we’ll intercept five groups of events related to login attempts (successful or not), whether they target a contained database or not.

create server audit specification
    [Audit Login Informations specification]
for server audit [Audit Login Informations]
    add (FAILED_DATABASE_AUTHENTICATION_GROUP)
    , add (FAILED_LOGIN_GROUP)
    , add (SUCCESSFUL_DATABASE_AUTHENTICATION_GROUP)
    , add (SUCCESSFUL_LOGIN_GROUP)
    , add (DATABASE_LOGOUT_GROUP)
with (state=on);
go

In my experience, this kind of script is more useful and less fragile if you add a cleanup phase before creating your audit and its specification.

Let’s start by testing if the audit already exists. The view sys.dm_server_audit_status will let you check if the audit exists and if it’s running. If that’s the case, you need to stop it and drop it (and its specification).

if exists(
    select
         *
    from
         sys.dm_server_audit_status
    where
         name='Audit Login Informations'
)
begin
    alter server audit specification
        [Audit Login Informations specification]
    with (state=off);

    drop server audit specification
        [Audit Login Informations specification];

    alter server audit
        [Audit Login Informations]
    with (state=off);

    drop server audit
        [Audit Login Informations];
end

That’s it! You can now query your audit to get all the connection attempts on your server by means of this simple query:

declare  @LogFilePath varchar(255);
set @LogFilePath = N'N:\TEMP\Trace\';

declare  @AuditName varchar(255);
set @AuditName = N'Audit Login Informations';

select
    @LogFilePath=log_file_path
from
    sys.server_file_audits
where
    name=@AuditName;

set @LogFilePath=@LogFilePath
    + REPLACE(@AuditName, ' ', '%5')
    + '*.sqlaudit';

select
    session_server_principal_name
    ,database_name
    ,object_name
    ,m.class_type_desc
    ,statement
    ,convert(datetime,
        switchoffset(
            convert(datetimeoffset, event_time)
            , datename(TzOffset, sysdatetimeoffset())
        )
    )
from
    sys.fn_get_audit_file (@LogFilePath,default,default) f
inner join
    sys.dm_audit_class_type_map m
    on m.class_type=f.class_type
order by
    event_time desc;

Illustration performed with the help of icons from the Noun Project: Database by Dmitry Mirolyubov, Sightview by Andy Ivandikov, Lock by Maxim Kulikov. Creative Commons licence.