Azure Cosmos DB

Automating the testing of gremlin queries with NBi

Since five years, you can run automated tests on your SQL server databases and SSAS cubes with the help of the open-source framework NBi. This framework was extended to support other relational databases (using the OleDb or ODBC drivers) in 2017. In 2018, I’m adding support for graph databases. Earlier this year, I released an extension for the Neo4j bolt driver supporting Cypher queries and another compatible with some of the Azure Cosmos DB API and notably the Graph API. This new extension is supporting all the Apache Tinkerpop-enabled databases and is tested on Azure Cosmos Db and Tinkerpop Server but should also work other graph databases supporting gremlin such as JanusGraph, Titan and OrientDB. The code for this extension is hosted on GitHub.Note that Azure Cosmos DB is now recommending to use the Gremlin driver in place of the Graph API dlls, so this extension should also be your first target if you want to connect to Azure Cosmos DB database and you want to run gremlin queries.

If you’re used to NBi, you can directly jump to the next paragraph. For newcomers, NBi is an open-source framework dedicated to the testing of BI solutions: it supports complex comparison of queries results, asserting conditions on each row of a result-set, running ETL (SSIS), validating queries from reports (SSRS) or many other possibilities. On top of this, it has an advanced feature to automate the process of creating test-suites.

In order to setup an environment for testing some gremlin queries, you must download the version 1.18 of NBi and its extension for Gremlin in version 1.0. Unzip the content of the NBi download and then partially override it with the content of the extension in the NBi.Gremlin (dlls from both packages must be deployed in a unique directory). If NUnit 2.6.4 is not available on your computer don’t forget to download it. If you need more info about how to setup a NBi’s extension, read the documentation.

When NBi and its extension are installed, create your test-suite with its config and NUnit project file as you’d normally do it for any NBi test-suite (or, if you’re in the hurry, download the files with the examples here, but don’t forget to update the connection-string when trying to run it). If you need more info about how to setup a test-suite, read the here.

For this example, I started a Tinkerpop Server and created some vertexes and edges with:

g.V().drop()
g.addV('person').property('id', 'thomas')
  .property('firstName', 'Thomas').property('age', 44)
g.addV('person').property('id', 'mary')
  .property('firstName', 'Mary')
  .property('lastName', 'Andersen')
  .property('age', 39)
g.addV('person').property('id', 'ben')
  .property('firstName', 'Ben')
  .property('lastName', 'Miller')
g.addV('person').property('id', 'robin')
.property('firstName', 'Robin')
  .property('lastName', 'Wakefield')
g.V().has('firstName','Thomas').addE('knows')
  .to(g.V().has('firstName','Mary'))
g.V().has('firstName','Thomas').addE('knows')
  .to(g.V().has('firstName','Ben'))
g.V().has('firstName','Ben').addE('knows')
  .to(g.V().has('firstName','Robin'))

When the environment is correctly configured, you must edit your config file to reference the extension NBi.Core.Gremlin.

<configuration>
  <configSections>
    <section name="nbi" type="NBi.NUnit.Runtime.NBiSection, NBi.NUnit.Runtime"/>
  </configSections>
  <nbi testSuite="...">
    <extensions>
      <add assembly="NBi.Core.Gremlin"/>
    </extensions>
  </nbi>
</configuration>

You also need to edit your config file. Add all the binding redirections that the Azure Cosmos DB is coming with. To achieve this, copy/paste the content of the runtime element of NBi.Core.CosmosDb.dll.config into a new runtime element in the config file of your test-suite FoF.config.

<runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="System.Diagnostics.DiagnosticSource" publicKeyToken="cc7b13ffcd2ddd51" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.0.1.0" newVersion="4.0.1.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="System.Net.Http" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.2.0.0" newVersion="4.2.0.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="System.Net.WebSockets" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.0.2.0" newVersion="4.0.1.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="System.Net.WebSockets.Client" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.0.2.0" newVersion="4.0.1.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-11.0.0.0" newVersion="11.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
</runtime>

The concept of connection-string is not familiar to Tinkerpop or Gremlin but to match with relational databases, we use a concatenation of the usual parameters defined to connect to an instance supporting gremlin queries. You must provide the hostname, the port, the enableSsl, the username and password. You must also add a tip for NBi to identify the correct type of connection-string by adding an api tag with the value gremlin.If your database is a Azure Cosmos DB instance, then your username will be /dbs/{db}/colls/{coll} where {db} stands for the name of the database and {coll} for the name of the collection. Also, note that the hostname is a url part of the subdomain .gremlin.cosmosdb.azure.com.

hostname=https://gremlin-demo.gremlin.azure.com;
port=443;
enableSsl=true;
username=database=/dbs/Friends/colls/FoF;
password=F...==;
api=gremlin;

The same connection-string for a Tinkerpop Server, running on your computer, will be

hostname=localhost;
port=8182;
enableSsl=false;
username=database=Anything;
password=Anything;
api=gremlin;

Note that username and password are not expected by a Tinerpop Server. Anyway, they can’t be omited but any value will make it.

In your test-suite, querying a relational database with SQL or a Tinkerpop-enabled database with a gremlin query is not different. You’re still using the result-set and query elements. The gremlin query is specified within the query. It doesn’t support parameters at the moment but you’re still free to use template-tokens.

Gremlin queries can return complex results such as paths that are not easily mapped to a table (result-set). Currently, NBi support for Gremlin is limited to queries returning a list of Vertexes or a list of Edges or a list of values.

In this first test, NBi is asserting that the query to return Thomas’ friends of friends is effectively returning the correct result. In this case a unique vertex representing a person named Robin.

FoF-Thomas-Robin
<test name="Thomas' friends of friends" uid="0001">
    <system-under-test>
      <resultSet>
        <query>
          g.V().has('firstName', 'Thomas').outE('knows').inV().hasLabel('person').outE('knows').inV().hasLabel('person')
        </query>
      </resultSet>
    </system-under-test>
    <assert>
      <equalTo values-default-type="text">
        <resultSet>
          <row>
            <cell>robin</cell>
            <cell>person</cell>
            <cell>vertex</cell>
            <cell>Robin</cell>
            <cell>Wakefield</cell>
          </row>
        </resultSet>
      </equalTo>
    </assert>
  </test>

In this second test, we’re asserting that the all the vertexes contain a property age greater than 20.

<test name="All of them are older than 20 (or unknown)" uid="0002">
    <system-under-test>
      <resultSet>
        <query>
          g.V()
        </query>
      </resultSet>
    </system-under-test>
    <assert>
      <all-rows>
        <combination operator="or">
          <predicate operand="age">
            <null/>
          </predicate>
          <predicate operand="age">
            <more-than>20</more-than>
          </predicate>
        </combination>
      </all-rows>
    </assert>
  </test>

And finally, in this last test, we’re checking that the result-set returned by a projection& matches the expected result.

  <test name="Count of person's relationships" uid="0003">
    <system-under-test>
      <resultSet>
        <query>
          g.V().project('FirstName','KnowsCount').by('firstName').by(out().count())
        </query>
      </resultSet>
    </system-under-test>
    <assert>
      <equalTo>
        <resultSet>
          <row>
            <cell>Mary</cell>
            <cell>0</cell>
          </row>
          <row>
            <cell>Robin</cell>
            <cell>0</cell>
          </row>
          <row>
            <cell>Ben</cell>
            <cell>1</cell>
          </row>
          <row>
            <cell>Thomas</cell>
            <cell>2</cell>
          </row>
        </resultSet>
      </equalTo>
    </assert>
  </test>
FoF-running-test-suite

Our test-suite is green … time to write some additional tests!

To know more about NBi, check the website at www.nbi.io and if you’ve any question or suggestion, feel free to ask on Twitter or on the Github repository (section “issues”). The code of this extension is also available on Github.

Advertisements

Creating and querying a database using the Graph API on the Azure Cosmos DB emulator

When developing on Azure Cosmos DB, Microsoft’s globally distributed, horizontally partitioned, multi-model database service, it’s useful to use the local emulator. At the moment the Web Interface of the data explorer is oriented for the SQL API and it sounds impossible to create a graph query on the emulator. This is false, the UI is not aware of these features but the engine supporting the Graph API.

You can access the emulator engine by the means of the .Net librairies. Create a new console project in Visual Studio and add the package Microsoft.Azure.Graphs. This package is in preview, you’ll need to specify to the package manager that you’re accepting the previews.

Install-Package Microsoft.Azure.Graphs -pre

This package will come with many friends, notably the reference to Microsoft.Azure.Documents.Client, the core library for everything related to Azure Cosmos DB and the reference to Newtonsoft.Json for everything related to Json serialization.

cosmosdb-console-001

For the Newtonsoft.Json library, you have some conflicts of version in the bunch of dependencies. These issues are fixed by applying some binding redirection in the app.config file.

cosmosdb-console-002

To create the database, you won’t need to use methods specific to the Graph API. A database is agnostic of the API that you’ll use for querying. If the Azure Portal is asking you to specify an API when creating a database online, it’s just to display a UI that would be easier to use with the selected API, nothing else.

The endpoint and authkey for the emulator are usually set to:

var endpoint= new Uri("https://localhost:8081");
var authkey = "C2y6yDjf5/R+ob0N8A7Cgv30VRDJIWEHLM+4QDU5DE2nQ9nDuVTqobD4b8mGGyPMbIZnqyMsEcaGQy67XIw/Jw==";

the method CreateDatabaseIfNotExistsAsync is useful for creating a database and skipping this step if the database is already existing.

using (var client = new DocumentClient(endpoint, authKey))
{
  var databaseResponse = client.CreateDatabaseIfNotExistsAsync(
      new Database() { Id = databaseId }
).Result;
}

It could be interesting to check the status code of this request. If the status code is OK, it means that the database was already existing. For a newly created database the status code would be Created.

switch (databaseResponse.StatusCode)
{
  case System.Net.HttpStatusCode.OK:
    Console.WriteLine($"Database {databaseId} already exists.");
    break;
  case System.Net.HttpStatusCode.Created:
    Console.WriteLine($"Database {databaseId} created.");
    break;
  default:
    throw new ArgumentException(
        $"Can't create database {databaseId}: {databaseResponse.StatusCode}"
    );
}

Once the database is created, you’ll need to create the collection. Same remark, a collection is not dependent of a specific API.

var databaseUri = UriFactory.CreateDatabaseUri(databaseId);
var collectionResponse = client.CreateDocumentCollectionIfNotExistsAsync(
     databaseUri
     , new DocumentCollection() { Id = collectionId }
).Result; 
switch (collectionResponse.StatusCode)
{
    case System.Net.HttpStatusCode.OK:
        Console.WriteLine($"Collection {collectionId} already exists.");
        break;
    case System.Net.HttpStatusCode.Created:
        Console.WriteLine($"Collection {collectionId} created.");
        break;
    default:
        throw new ArgumentException(
           $"Can't create collection {collectionId}:
           {collectionResponse.StatusCode}"
        );
}

You can check in the Web UI of the emulator and you’ll see the newly created database and collection. If it doesn’t appear, try to open the page in another tab.

Now that the database and the collection are existing, we can start to query the graph. At this step, I’ll need to use some specific methods to the Graph API. The most notable method is the CreateGremlinQuery extension. This method accepts two parameters: the collection to query and the statement.

var queryCount = client.CreateGremlinQuery<dynamic>(collectionResponse,
     "g.V().Count()");

This will just instantiate a query, not executing it. To execute the query, you’ll need to call the method ExecuteNextAsync. This method will return the first set of results. In this case, we know that we’re expecting exactly one result and can go with this shortcut.

var count = queryCount.ExecuteNextAsync().Result.First();

With the help of the method CreateGremlinQuery, you can also create some statements to create nodes or edges. The node or edge will be created when the method ExecuteNextAsync is executed.

var queryNode = client.CreateGremlinQuery<dynamic>(collectionResponse,
    "g.addV('person').property('id', 'thomas').property
    ('firstName', 'Thomas').property('age', 44)");
query.ExecuteNextAsync().Result;

Counting items with the SQL API on Azure Cosmos DB

Azure Cosmos DB is Microsoft’s globally distributed, horizontally partitioned, multi-model database service. The terms “Multi-model database service” mean that you can seamlessly access the content of your database from different APIs. One of these API is named SQL and is a kind of mix between JSON and SQL. If you’re used to ANSI-SQL, it’s probably the easiest API to start with.

This SQL API supports the keyword COUNT but it’s not necessary the keyword that will be the more useful when you want to know how many elements are available in an array.

Before going further, to illustrate this article, we’ll need to setup a database and a collection (named families), you can achieve this with the UI of the portal. Then load a few items in your collection. To achieve this task, click on “New document” on the tab documents of your collection, paste the json corresponding to the Andersen family from this page, click on “Save”. Then click again on “New document” and paste the content of the json related to the Wakefield family from the same page. Don’t forget to click on “Save”. Your two families are now part of the database’s content and we’ll be able to query them.

The first example is about how counting the children by family. In ANSI-SQL, the approach would probably be to select the familyId then perform a count distinct child grouping by familyId. With the SQL API of CosmosDB, you’ll need to take another approach and use the keyword ARRAY_LENGTH. This keyword returns the number of elements of the specified array expression. In our case the array expression is just the array containing the children of a family. Our select will become:

SELECT
f.id AS FamilyId,
ARRAY_LENGTH(f.children) AS ChildrenCount
FROM
families f

The result is effectively a list of FamilyId and ChidrenCountcosmosdb-count-001

We can do something a bit more complex and check how many pets have each child. To succeed we’ll to start from the children level and then take the length of the sub-array pets. This can be done, using the keywords JOIN and IN to force an iteration on existing arrays. In this case we set the set the granularity of the query to be a child and for each of them we’re retrieving her familyId, givenName and count the pets.

SELECT
f.id as FamilyId,
c.givenName,
ARRAY_LENGTH(c.pets) as PetsCount
FROM
families f
JOIN c IN f.children

If you want to improve the rendering of the result, you can use the coalesce operator ??. If a value is not returned by the query then the result can be replaced by another value. In this case, if the givenName is not returned we’ll use the firstName and if the array of pets is not existing then we’ll set the count of pets to zero.

SELECT
f.id as FamilyId,
c.givenName ?? c.firstName,
ARRAY_LENGTH(c.pets) ?? 0 as PetsCount
FROM
families f
JOIN c IN f.children

cosmosdb-count-002

Last question to answer, how many pets have we got in our database? To answer this kind of question, you’ll need to perform an aggregation. The idea will be to know how many pets do each child have and then sum the result that we obtained for each child. Translated in SQL:

SELECT
SUM(ARRAY_LENGTH(c.pets)) as PetsCount
FROM
families f
JOIN c IN f.children

cosmosdb-count-003

Another way to achieve the same result is to change the granularity of the query and start from the pets. In this case, we’ll need to count distinct pets and we can finally use the keyword COUNT.

SELECT
COUNT(p) as PetsCount
FROM
families f
JOIN c IN f.children
JOIN p IN c.pets

cosmosdb-count-003

Deploying the Azure Cosmos DB emulator when building with AppVeyor

Azure Cosmos DB is Microsoft’s globally distributed, horizontally partitioned, multi-model database service. AppVeyor is a Continuous Integration solution for Windows. The key question is how can I test my solution, built on appveyor, with an Azure Cosmos DB back-end? Before going further, you need to know that Azure Cosmos DB has an emulator that you can run on any Windows machine. You can use to develop but also to test your solution. I’d not recommend load tests but unit or integration tests are supported.

The first step is to download the emulator (only 26MB) on your Appveyor build machine. You can do this during the install step by the help of this one line script in PowerShell:

Start-FileDownload 'https://aka.ms/cosmosdb-emulator' -FileName 'c:\projects\cosmos-db.msi'

When the file is downloaded, you will need to execute it to install the emulator on the machine. Using a few parameters of the installer this can be acheived without the need to click on buttons.

cmd /c start /wait msiexec /i "c:\projects\cosmos-db.msi" /qn /quiet /norestart /log install.log

Then, once the software is installed, you’ll need to run it. As it will run in background you won’t need UI or Data Explorer. These options can be turned off by the means of the parameters /NoUI and /No-Explorer.

"C:\Program Files\Azure Cosmos DB Emulator\CosmosDB.Emulator.exe"  /NoExplorer /NoUI

To avoid to run it interactively in the build execution context, you’ll need to start it in another process. It can be done with PowerShell Start-Process

Finally, you cannot expect that this kind of software is starting in a milli-second. You’ll have to pool and check if a connection is possible or not. that’s the goal of this PowerShell script:

$attempt = 0
$client = New-Object System.Net.Sockets.TcpClient([System.Net.Sockets.AddressFamily]::InterNetwork)

do {
	$attempt++
	try {    
		$client.Connect("127.0.0.1", 8081)
		write-host "CosmosDB started"
	}
	catch {
		$client.Close()
		if($attempt -eq 5) {
			write-host "CosmosDB was not started"
		} else {
			[int]$sleepTime = 3*$attempt
			write-host "CosmosDB is not started. Retry after $sleepTime seconds..."
			sleep $sleepTime;
		}
	}
}while(!$client.Connected -and $attempt -lt $max)

If the emulator has not started after 15 seconds, something is going wrong and you should step out of this loop.

The full listing with all actions taken during the install step of your build process is visible there:

appveyor-cosmosdb-emulator

If you’ve troubles, it can be useful to check if the port 8081 is listening: this can be easily done with the help of the command netstat -ab that you can add as a fifth action in your install step.

Automating the testing of an Azure Cosmos DB instance with NBi

Since five years, you can run automated tests on your SQL server databases and SSAS cubes with the help of the open-source framework NBi. This framework was extended to support other relational databases (using the OleDb or ODBC drivers) in 2017. To give a really good start to the year 2018, NBi is now supporting Azure Cosmos DB databases using the Graph API (gremlin queries)!

If you’re used to NBi, you can directly jump to the next paragraph. For newcomers, NBi is an open-source framework dedicated to the testing of BI solutions: it supports complex comparison of queries results, asserting conditions on each row of a result-set, running ETL (SSIS), validating queries from reports (SSRS) or many other possibilities. On top of this, it has an advanced feature to automate the process of creating test-suites.

In order to setup an environment for testing some gremlin queries through the Graph API of Azure Cosmos DB, you must download the version 1.18 of NBi and its extension for Gremlin in version 1.0. Unzip the content of the NBi download and then partially override it with the content of the extension in the NBi.CosmosDb (dlls from both packages must be deployed in a unique directory). If NUnit 2.6.4 is not available on your computer don’t forget to download it. If you need more info about how to setup a NBi’s extension, read the documentation.

When NBi and its extension are installed, create your test-suite with its config and NUnit project file as you’d normally do it for any NBi test-suite (or, if you’re in the hurry, download the files with the examples here, but don’t forget to update the connection-string when trying to run it). If you need more info about how to setup a test-suite, read the here.

For this example, I created a database on Azure Cosmos DB (named Friends) containing a graph (named FoF). I created some vertexes and edges with:

g.V().Drop()
g.addV('person').property('id', 'thomas')
  .property('firstName', 'Thomas')
  .property('age', 44)
g.addV('person').property('id', 'mary')
  .property('firstName', 'Mary')
  .property('lastName', 'Andersen')
  .property('age', 39)
g.addV('person').property('id', 'ben')
  .property('firstName', 'Ben')
  .property('lastName', 'Miller')
g.addV('person').property('id', 'robin')
  .property('firstName', 'Robin')
  .property('lastName', 'Wakefield')
g.V('thomas').addE('knows').to(g.V('mary'))
g.V('thomas').addE('knows').to(g.V('ben'))
g.V('ben').addE('knows').to(g.V('robin'))

When the environment is correctly configured, you must edit your config file to reference the extension NBi.Core.CosmosDB.

<configuration>
  <configSections>
    <section name="nbi" type="NBi.NUnit.Runtime.NBiSection, NBi.NUnit.Runtime"/>
  </configSections>
  <nbi testSuite="...">
    <extensions>
      <add assembly="NBi.Core.CosmosDb"/>
    </extensions>
  </nbi>
</configuration>

You also need to edit your config file. Add all the binding redirections that the Azure Cosmos DB is coming with. To achieve this, copy/paste the content of the runtime element of NBi.Core.CosmosDb.dll.config into a new runtime element in the file FoF.config.

The concept of connection-string is not familiar to Azure Cosmos DB but to match with relational databases, we use a concatenation of the usual parameters defined to connect to an Azure Cosmos DB instance. You must provide the endpoint, the authkey, the database Id and the collection Id, concatenated into named-tokens separated by semi-colons. For the collection Id, you must use the name graph and not collection, in prevision of NBi’s compatibility with other APIs supported by Cosmos DB.

Endpoint=https://gremlin-demo.documents.azure.com:443;AuthKey=F...==;database=Friends;graph=FoF

In your test-suite, querying a relational database or your Azure Cosmos DB database with the Graph API is not different. You’re still using the result-set and query elements. The gremlin query is specified within the query. It doesn’t support parameters at the moment but you’re still free to use template-tokens.

Graph queries can return complex results such as paths that are not easily mapped to a table (result-set). Currently, NBi supports for Cosmos DB is limited to queries returning a list of Vertexes or a list of Edges or a list of values.

In this first test, NBi is asserting that the query to return Thomas’ friends of friends is effectively returning the correct result. In this case a unique vertex representing a person named Robin.

FoF-Thomas-Robin
<test name="Thomas' friends of friends" uid="0001">
  <system-under-test>
    <resultSet>
      <query>
        g.V('thomas').outE('knows').inV().hasLabel('person').outE('knows').inV().hasLabel('person')
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <equalTo values-default-type="text">
      <resultSet>
        <row>
          <cell>robin</cell>
          <cell>person</cell>
          <cell>vertex</cell>
          <cell>Robin</cell>
          <cell>Wakefield</cell>
        </row>
      </resultSet>
    </equalTo>
  </assert>
</test>

In this second test, we’re asserting that the all the vertexes contain a family name correctly capitalized:

<test name="All of them are older than 20" uid="0002">
  <system-under-test>
    <resultSet>
      <query>
        g.V()
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <all-rows>
      <predicate operand="firstName" type="text">
        <matches-regex>^([A-Z][a-z]+)$</matches-regex>
      </predicate>
    </all-rows>
  </assert>
</test>

And finally, in this last test, we’re checking that the result-set returned by a projection matches the expected result.

<test name="Count of person's relationships" uid="0003">
  <system-under-test>
    <resultSet>
      <query>
        g.V().project('FirstName','KnowsCount').by('firstName').by(out().Count())
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <equalTo>
      <resultSet>
        <row>
          <cell>Mary</cell>
          <cell>0</cell>
        </row>
        <row>
          <cell>Robin</cell>
          <cell>0</cell>
        </row>
        <row>
          <cell>Ben</cell>
          <cell>1</cell>
        </row>
        <row>
          <cell>Thomas</cell>
          <cell>2</cell>
        </row>
      </resultSet>
    </equalTo>
  </assert>
</test>
FoF-running-test-suite

Our test-suite is green … time to write new tests!

To know more about NBi, check the website at www.nbi.io and if you’ve any question or suggestion, feel free to ask on Twitter
or on Github. The code of this extension is also available on Github.