NBi

Using NBi to assert calculations based on measures

A few days ago on GitHub, I received a request to explain how to assert the results of some calculations with measures and also how to automate these tests with NBi. This blog post is the complete answer to this question.

To illustrate the case, I’ll use the Adventure Works 2012 multidimensional cube, but you can do the same with any other database. In this cube, you have a measure-group with the measures: Internet Gross Profit, Internet Sales Amount and Internet Gross Profit Margin. As you can guess, the last measure is the result of dividing the first by the second (expressed as a percentage). My test will assert that this is still the case when I’m slicing by customers’ country.

To achieve this, I’m defining a system-under-test with a result-set defined by an MDX query:

<system-under-test>
  <resultSet>
    <query>
      <![CDATA[
      select
      {
        [Measures].[Internet Gross Profit],
        [Measures].[Internet Sales Amount],
        [Measures].[Internet Gross Profit Margin]   
      } on 0,
      {
        [Customer].[Country].members
      } on 1
      from
        [Adventure Works]
      ]]>
    </query>
  </resultSet>
</system-under-test>

Asserting the calculations’ result can be done with an all-rows assertion and a predicate checking that the content of the last measure is equal to the quotient of the first two.

Due to the complex column names returned by MDX queries, it’s recommended to use aliases. With NBi, we can define an alias based on a column’s position (starting at 0). Position 0 corresponds to the column [Customer].[Country].members, so we can skip it. But columns 1 to 3 will receive aliases:

<alias column-index="1">Profit</alias>
<alias column-index="2">Sales</alias>
<alias column-index="3">Margin</alias>

The test will need to compute the quotient between Profit and Sales and check that this value is equal to the value returned by the cube in the column Margin. This can be expressed in the following expression:

<expression name="calculate">
    Round(Profit/Sales, 4) = Round(Margin, 4)
</expression>

The Round function is provided by NCalc, the underlying library used to evaluate these expressions. In this case, it’s extremely useful to avoid rounding issues during computations. Because the measure is expressed as a percentage, I rounded to 4 digits after the decimal separator rather than 2.
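
If you prefer a tolerance rather than rounding both sides, an alternative formulation (a sketch relying on NCalc’s standard Abs function, not taken from the original test) compares the absolute difference to a small threshold; note that inside the XML expression element the lower-than sign would have to be escaped as &lt;:

Abs(Profit/Sales - Margin) < 0.0001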

Based on the expression defined above, we can use the predicate true. Each row not validating the equality defined above will return false and won’t validate the predicate. The test is an all-rows, so if any row doesn’t validate the predicate, the test fails.

Some developers will perhaps try to use the predicate equal in place of true but, unfortunately, at this moment NBi can only handle a fixed value for the equal predicate and it’s not possible to specify a column. This limitation invalidates the approach with equal.
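
For reference, an equal predicate with a fixed value would look roughly like the snippet below (a hypothetical sketch with an arbitrary constant); since the reference can only be a constant, it can’t point to the quotient of the other two columns:

<predicate operand="Margin">
  <!-- the reference must be a constant; it cannot be the Profit/Sales quotient -->
  <equal>0.41</equal>
</predicate>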

The full code for this test is:

<test name="Profit, Sales and Margin">
  <system-under-test>
    <resultSet>
      <query>
        <![CDATA[
      select
      {
          [Measures].[Internet Gross Profit],
          [Measures].[Internet Sales Amount],
          [Measures].[Internet Gross Profit Margin] 
      } on 0,
      {
          [Customer].[Country].members
      } on 1
      from
          [Adventure Works]
      ]]>
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <all-rows>
      <alias column-index="1">Profit</alias>
      <alias column-index="2">Sales</alias>
      <alias column-index="3">Margin</alias>
      <expression name="calculate">
          Round(Profit/Sales, 4) = Round(Margin, 4)
      </expression>
      <predicate operand="calculate" type="boolean">
        <true />
      </predicate>
    </all-rows>
  </assert>
</test>

The second part of this blog post is about automating the generation of this kind of test. Let’s start by defining our test-cases. To define them, I’ll put the aliases that I’d like to use (column alias), the expected formula (column expression), the exact names of the measures (column measure) and how many digits are needed for rounded comparisons (column round). In this example, I created two test-cases. The first test-case will check that Profit is equal to Sales minus Cost and the second one asserts that Margin is equal to Profit divided by Sales.

alias;expression;measure;round
Sales,Cost,Profit;Sales-Cost;[Measures].[Internet Sales Amount],[Measures].[Internet Total Product Cost],[Measures].[Internet Gross Profit];2
Profit,Sales,Margin;Profit/Sales;[Measures].[Internet Gross Profit],[Measures].[Internet Sales Amount],[Measures].[Internet Gross Profit Margin];4

To load this CSV file in genbiL, I’ll need to use the load file function:

case load file 'calculations.csv';

Note that in this CSV file each column is separated by a semicolon (;). The columns alias and measure contain more than one value and each of these values is separated by a comma (,).

To parse this CSV file with genbiL, I’ll need to use the function split. This function transforms a single value into an array of values using a separator.

case split column 'alias' with value ',';
case split column 'measure' with value ',';

After these two lines of code, the test-cases’ scope consists of two cases, each with an array of three elements in the columns alias and measure.
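
Conceptually, the two test-cases now look like this (a schematic view of the in-memory scope, not actual genbiL output):

alias:      [Sales, Cost, Profit]
measure:    [[Measures].[Internet Sales Amount], [Measures].[Internet Total Product Cost], [Measures].[Internet Gross Profit]]
expression: Sales-Cost
round:      2

alias:      [Profit, Sales, Margin]
measure:    [[Measures].[Internet Gross Profit], [Measures].[Internet Sales Amount], [Measures].[Internet Gross Profit Margin]]
expression: Profit/Sales
round:      4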

The next steps are straightforward: loading the template (which we’ll compose just after this), generating the tests and saving them.

template load file 'calculations.nbitt';
suite generate;
suite save 'calculations.nbits';

Once we have the CSV file and the genbiL script, we still need to build a template. There are two tricky parts in this template. The first one is generating the list of measures in the MDX queries. Elements of this list must be separated by a comma; if you have three elements, you’ll need two commas.

select
   {
      [Measures].[Internet Gross Profit],
      [Measures].[Internet Sales Amount],
      [Measures].[Internet Gross Profit Margin]
   } on 0,

To achieve that with the StringTemplate engine, you’ll need to use the separator keyword (doc) for any variable containing an array.

select
   {
      $Measure; separator=", 
"$
   } on 0,

The second tricky part sounds similar but is really different: defining the list of aliases. In this case there is no separator; if you have three elements, you’ll need exactly the same XML element three times. An additional detail is that we’ll need to add some figures starting at 1 and increasing with each new element. Fortunately, StringTemplate has the concept of anonymous templates, which is really helpful here. In this case, we’ll state that for each value contained in the array of column alias, we have to create an XML element alias and add an attribute column-index with an incremental value. Note that the incremental value is automatically assigned to $i$ by StringTemplate and starts at 1. Each element contained in the array of alias will be assigned to the variable x in the anonymous template. The anonymous template is the part between the pipe and the right curly brace.

$alias:{ x | <alias column-index="$i$">$x$</alias>}$

The template will also use the StringTemplate functions trunc, returning all the elements of an array except the last one, and last, returning the last element of an array.
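
Applied to the first test-case (alias array Sales, Cost, Profit), the test-name line of the template shown below should render roughly as:

<test name="Sales and Cost to calculate Profit">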

The full listing for the template is shown below:

  <test name="$trunc(alias); separator=" and "$ to calculate $last(alias)$">
    <system-under-test>
      <resultSet>
        <query>
          <![CDATA[
          select
          {
         $measure; separator=", 
"$
          } on 0,
          {
          [Customer].[Country].members
          } on 1
          from
          [Adventure Works]
          ]]>
        </query>
      </resultSet>
    </system-under-test>
    <assert>
      <all-rows>
        $alias:{ x | <alias column-index="$i$">$x$</alias>}$
        <expression name="calculate">
           Round($expression$, $round$) = Round($last(alias)$, $round$)
        </expression>
        <predicate operand="calculate" type="boolean">
          <true />
        </predicate>
      </all-rows>
    </assert>
  </test>

And voilà, now you just need to run your genbiL script and you’ll get your two tests generated. Feel free to add new cases!


Automating the testing of Elasticsearch queries with NBi

For five years now, you’ve been able to run automated tests on your SQL Server databases and SSAS cubes with the help of the open-source framework NBi. This framework was extended to support other relational databases (using the OleDb or ODBC drivers) in 2017. In 2018, I’m adding support for NoSQL databases. Earlier this year, I released some extensions for graph databases; this new extension targets the Elasticsearch RESTful search and analytics engine. The code for this extension to the NBi framework is hosted on GitHub.

If you’re used to NBi, you can directly jump to the next paragraph. For newcomers, NBi is an open-source framework dedicated to the testing of BI and data-centric solutions: it supports complex comparison of query results, asserting conditions on each row of a result-set, running ETL (SSIS), validating queries from reports (SSRS) and many other possibilities. On top of this, it has an advanced feature to automate the process of creating test-suites.

In order to set up an environment for testing some queries on an Elasticsearch instance, you must download version 1.18 of NBi and its extension for Elasticsearch in version 1.0. Unzip the content of the NBi download and then partially overwrite it with the content of the extension NBi.Elasticsearch (dlls from both packages must be deployed in a unique directory). If NUnit 2.6.4 is not available on your computer, don’t forget to download it. If you need more info about how to set up an NBi extension, read the documentation.

When NBi and its extension are installed, create your test-suite with its config and NUnit project files as you’d normally do for any NBi test-suite (or, if you’re in a hurry, download the files with the examples here, but don’t forget to update the connection-string when trying to run them). If you need more info about how to set up a test-suite, read the documentation.
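
For reference, the NUnit project file is typically just a small XML file pointing to the NBi runtime. A minimal sketch could look like the snippet below (the config name is an assumption, adapt it to your own setup):

<NUnitProject>
  <Settings activeconfig="Default" />
  <Config name="Default">
    <!-- the NBi runtime assembly reads the test-suite referenced in the .config file -->
    <assembly path="NBi.NUnit.Runtime.dll" />
  </Config>
</NUnitProject>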

For this example, I started an Elasticsearch instance and loaded the sample data from the Bank example.

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"

When the environment is correctly configured, you must edit your config file to reference the extension NBi.Core.Elasticsearch.

<configuration>
  <configSections>
    <section name="nbi" type="NBi.NUnit.Runtime.NBiSection, NBi.NUnit.Runtime"/>
  </configSections>
  <nbi testSuite="...">
    <extensions>
      <add assembly="NBi.Core.Elasticsearch"/>
    </extensions>
  </nbi>
</configuration>

The concept of a connection-string is not familiar to Elasticsearch but, to match with relational databases, we use a concatenation of the usual parameters defined to connect to an Elasticsearch instance with the low-level client available in C#. You must provide a URL with the hostname, the port, the username and the password. The protocol should be elasticsearch at the moment, to make a distinction with other databases relying on the http protocol.

elasticsearch://user:password@localhost:9200
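
This connection-string can then be referenced like any other NBi connection-string, for instance directly on the query element, as the relational examples later in this post do (a sketch with placeholder credentials; you could also move it to the settings or the config file):

<query
   connectionString="elasticsearch://user:password@localhost:9200">
  <![CDATA[
  GET bank/_search
  { "query": {"match_all": { }}, "size": 1 }
  ]]>
</query>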

In your test-suite, querying a relational database with SQL or an Elasticsearch cluster with a query written in the Query DSL is not different. You’re still using the result-set and query elements. The query is specified within the query element.

Elasticsearch queries can return complex results such as hits, aggregations or counts. Currently, NBi support for Elasticsearch is limited to queries returning hits or aggregations.

This first test asserts that the query searching for the 5 persons with the highest balance effectively returns 5 rows.

<test name="5 highest balances returns 5 rows" uid="0001">
  <system-under-test>
    <resultSet>
      <query>
        <![CDATA[
        GET bank/_search
        {
           "query": {"match_all": { }}
           , "size": 5
           , "_source": ["gender", "age", "balance"]
           , "sort": [ { "balance" : {"order" : "desc"}}]
        }
        ]]>
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <row-count>
      <equal>5</equal>
    </row-count>
  </assert>
</test>

In the second test, the same query as in the previous example should return rows with an age between 20 and 40 and a balance greater than $48,000.

<test name="5 highest balance" uid="0001">
  <system-under-test>
    <resultSet>
      <query>
        <![CDATA[
        GET bank/_search
        {
           "query": {"match_all": { }}
           , "size": 5
           , "_source": ["gender", "age", "balance"]
           , "sort": [ { "balance" : {"order" : "desc"}}]
        }
        ]]>
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <all-rows>
      <combination operator="and">
        <predicate operand="age">
          <within-range>[20;40]</within-range>
        </predicate>
        <predicate operand="balance">
          <more-than>48000</more-than>
        </predicate>
      </combination>
    </all-rows>
  </assert>
</test>

The next test checks the result of an aggregation. More specifically, that the three states with the highest average balance are Washington, Alabama and Rhode Island, but also that the count of documents and the average balance are more or less in the expected range.

<test name="Top 3 of average balance by state" uid="0002">
  <system-under-test>
    <resultSet>
      <query>
        <![CDATA[
        GET /bank/_search
        {
          "size": 0,
          "aggs": {
            "group_by_state": {
              "terms": {
                "field": "state.keyword",
                "size" : 3,
                "order": {
                  "average_balance": "desc"
                }
              },
              "aggs": {
                "average_balance": {
                  "avg": {
                    "field": "balance"
                  }
                }
              }
            }
          }
        }
        ]]>
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <equalTo tolerance="1">
      <resultSet>
        <row>
          <cell column-name="key">WA</cell>
          <cell column-name="doc_count">(+)</cell>
          <cell column-name="average_balance">43265</cell>
        </row>
        <row>
          <cell>AL</cell>
          <cell>6</cell>
          <cell>41418</cell>
        </row>
        <row>
          <cell>RI</cell>
          <cell>[5;10]</cell>
          <cell>40040</cell>
        </row>
      </resultSet>
    </equalTo>
  </assert>
</test>

Our test-suite is green … time to write some additional tests! Feel free to explore the other kinds of tests that NBi supports and report your ideas for improvement on the GitHub issues page for NBi.Elasticsearch.

To know more about NBi, check the website at www.nbi.io and if you have any question or suggestion, feel free to ask on Twitter or on the GitHub repository (section “issues”). The code of this extension is also available on GitHub.

Automating the testing of Neo4j cypher queries with NBi

For five years now, you’ve been able to run automated tests on your SQL Server databases and SSAS cubes with the help of the open-source framework NBi. This framework was extended to support other relational databases (using the OleDb or ODBC drivers) in 2017. In 2018, I’m adding support for graph databases. Earlier this year, I released an extension for the Azure Cosmos DB API (notably the Graph API) and another for Tinkerpop-enabled databases supporting Gremlin queries. This new extension supports Neo4j databases and the Cypher query language. The code for this extension is hosted on GitHub.

If you’re used to NBi, you can directly jump to the next paragraph. For newcomers, NBi is an open-source framework dedicated to the testing of BI solutions: it supports complex comparison of query results, asserting conditions on each row of a result-set, running ETL (SSIS), validating queries from reports (SSRS) and many other possibilities. On top of this, it has an advanced feature to automate the process of creating test-suites.

In order to set up an environment for testing some Cypher queries, you must download version 1.18 of NBi and its extension for Neo4j in version 1.0. Unzip the content of the NBi download and then partially overwrite it with the content of the extension NBi.Neo4j (dlls from both packages must be deployed in a unique directory). If NUnit 2.6.4 is not available on your computer, don’t forget to download it. If you need more info about how to set up an NBi extension, read the documentation.

When NBi and its extension are installed, create your test-suite with its config and NUnit project file as you’d normally do for any NBi test-suite (or, if you’re in a hurry, download the files with the examples here, but don’t forget to update the connection-string when trying to run them). If you need more info about how to set up a test-suite, read the documentation.

For this example, I started a Neo4j instance and created some vertexes and edges based on the classical movies database shipped with Neo4j:

:play movies graph

When the environment is correctly configured, you must edit your config file to reference the extension NBi.Core.Neo4j.

<configuration>
  <configSections>
    <section name="nbi" type="NBi.NUnit.Runtime.NBiSection, NBi.NUnit.Runtime"/>
  </configSections>
  <nbi testSuite="...">
    <extensions>
      <add assembly="NBi.Core.Neo4j"/>
    </extensions>
  </nbi>
</configuration>

The concept of a connection-string is not familiar to Neo4j but, to match with relational databases, we use a concatenation of the usual parameters defined to connect to a Neo4j instance with the bolt driver. You must provide a URL with the hostname, the port, the username and the password.

bolt://user:password@localhost:7474

In your test-suite, querying a relational database with SQL or a Neo4j instance with a Cypher query is not different. You’re still using the result-set and query elements. The Cypher query is specified within the query element. It supports parameters and template-tokens.

Cypher queries can return complex results such as paths that are not easily mapped to a table (result-set). Currently, NBi support for Cypher is limited to queries returning a list of values.

In this test, NBi asserts that the query returning Tom Hanks’ movies released after 2000 effectively returns what we’re expecting.

<test name="Tom Hanks'movies after 2000" uid="0001">
  <system-under-test>
    <resultSet>
      <query>
        MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies)
        WHERE tomHanksMovies.released>2000
        RETURN tomHanksMovies.title, tomHanksMovies.released
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <equalTo>
      <resultSet>
        <row>
          <cell>Charlie Wilson's War</cell>
          <cell>2007</cell>
        </row>
        <row>
          <cell>The Polar Express</cell>
          <cell>2004</cell>
        </row>
        <row>
          <cell>The Da Vinci Code</cell>
          <cell>2006</cell>
        </row>
        <row>
          <cell>Cloud Atlas</cell>
          <cell>2012</cell>
        </row>
      </resultSet>
    </equalTo>
  </assert>
</test>

Our test-suite is green … time to write some additional tests!

To know more about NBi, check the website at www.nbi.io and if you have any question or suggestion, feel free to ask on Twitter or on the GitHub repository (section “issues”). The code of this extension is also available on GitHub.

Automating the testing of gremlin queries with NBi

For five years now, you’ve been able to run automated tests on your SQL Server databases and SSAS cubes with the help of the open-source framework NBi. This framework was extended to support other relational databases (using the OleDb or ODBC drivers) in 2017. In 2018, I’m adding support for graph databases. Earlier this year, I released an extension for the Neo4j bolt driver supporting Cypher queries and another compatible with some of the Azure Cosmos DB APIs, notably the Graph API. This new extension supports all the Apache Tinkerpop-enabled databases; it is tested on Azure Cosmos DB and Tinkerpop Server but should also work with other graph databases supporting Gremlin such as JanusGraph, Titan and OrientDB. The code for this extension is hosted on GitHub. Note that Azure Cosmos DB now recommends using the Gremlin driver in place of the Graph API dlls, so this extension should also be your first target if you want to connect to an Azure Cosmos DB database and run Gremlin queries.

If you’re used to NBi, you can directly jump to the next paragraph. For newcomers, NBi is an open-source framework dedicated to the testing of BI solutions: it supports complex comparison of query results, asserting conditions on each row of a result-set, running ETL (SSIS), validating queries from reports (SSRS) and many other possibilities. On top of this, it has an advanced feature to automate the process of creating test-suites.

In order to set up an environment for testing some Gremlin queries, you must download version 1.18 of NBi and its extension for Gremlin in version 1.0. Unzip the content of the NBi download and then partially overwrite it with the content of the extension NBi.Gremlin (dlls from both packages must be deployed in a unique directory). If NUnit 2.6.4 is not available on your computer, don’t forget to download it. If you need more info about how to set up an NBi extension, read the documentation.

When NBi and its extension are installed, create your test-suite with its config and NUnit project file as you’d normally do for any NBi test-suite (or, if you’re in a hurry, download the files with the examples here, but don’t forget to update the connection-string when trying to run them). If you need more info about how to set up a test-suite, read the documentation.

For this example, I started a Tinkerpop Server and created some vertexes and edges with:

g.V().drop()
g.addV('person').property('id', 'thomas')
  .property('firstName', 'Thomas').property('age', 44)
g.addV('person').property('id', 'mary')
  .property('firstName', 'Mary')
  .property('lastName', 'Andersen')
  .property('age', 39)
g.addV('person').property('id', 'ben')
  .property('firstName', 'Ben')
  .property('lastName', 'Miller')
g.addV('person').property('id', 'robin')
  .property('firstName', 'Robin')
  .property('lastName', 'Wakefield')
g.V().has('firstName','Thomas').addE('knows')
  .to(g.V().has('firstName','Mary'))
g.V().has('firstName','Thomas').addE('knows')
  .to(g.V().has('firstName','Ben'))
g.V().has('firstName','Ben').addE('knows')
  .to(g.V().has('firstName','Robin'))

When the environment is correctly configured, you must edit your config file to reference the extension NBi.Core.Gremlin.

<configuration>
  <configSections>
    <section name="nbi" type="NBi.NUnit.Runtime.NBiSection, NBi.NUnit.Runtime"/>
  </configSections>
  <nbi testSuite="...">
    <extensions>
      <add assembly="NBi.Core.Gremlin"/>
    </extensions>
  </nbi>
</configuration>

You also need to edit your config file to add all the binding redirections that Azure Cosmos DB comes with. To achieve this, copy/paste the content of the runtime element of NBi.Core.CosmosDb.dll.config into a new runtime element in the config file of your test-suite, FoF.config.

<runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="System.Diagnostics.DiagnosticSource" publicKeyToken="cc7b13ffcd2ddd51" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.0.1.0" newVersion="4.0.1.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="System.Net.Http" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.2.0.0" newVersion="4.2.0.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="System.Net.WebSockets" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.0.2.0" newVersion="4.0.1.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="System.Net.WebSockets.Client" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.0.2.0" newVersion="4.0.1.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-11.0.0.0" newVersion="11.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
</runtime>

The concept of a connection-string is not familiar to Tinkerpop or Gremlin but, to match with relational databases, we use a concatenation of the usual parameters defined to connect to an instance supporting Gremlin queries. You must provide the hostname, the port, the enableSsl flag, the username and the password. You must also add a hint for NBi to identify the correct type of connection-string by adding an api token with the value gremlin. If your database is an Azure Cosmos DB instance, then your username will be /dbs/{db}/colls/{coll} where {db} stands for the name of the database and {coll} for the name of the collection. Also, note that the hostname is a URL in the subdomain .gremlin.cosmosdb.azure.com.

hostname=https://gremlin-demo.gremlin.azure.com;
port=443;
enableSsl=true;
username=database=/dbs/Friends/colls/FoF;
password=F...==;
api=gremlin;

The same connection-string for a Tinkerpop Server, running on your computer, will be

hostname=localhost;
port=8182;
enableSsl=false;
username=database=Anything;
password=Anything;
api=gremlin;

Note that username and password are not expected by a Tinkerpop Server. They can’t be omitted, but any value will do.

In your test-suite, querying a relational database with SQL or a Tinkerpop-enabled database with a Gremlin query is not different. You’re still using the result-set and query elements. The Gremlin query is specified within the query element. It doesn’t support parameters at the moment, but you’re still free to use template-tokens.

Gremlin queries can return complex results such as paths that are not easily mapped to a table (result-set). Currently, NBi support for Gremlin is limited to queries returning a list of Vertexes or a list of Edges or a list of values.

In this first test, NBi asserts that the query returning Thomas’ friends of friends effectively returns the correct result: in this case, a unique vertex representing a person named Robin.

<test name="Thomas' friends of friends" uid="0001">
    <system-under-test>
      <resultSet>
        <query>
          g.V().has('firstName', 'Thomas').outE('knows').inV().hasLabel('person').outE('knows').inV().hasLabel('person')
        </query>
      </resultSet>
    </system-under-test>
    <assert>
      <equalTo values-default-type="text">
        <resultSet>
          <row>
            <cell>robin</cell>
            <cell>person</cell>
            <cell>vertex</cell>
            <cell>Robin</cell>
            <cell>Wakefield</cell>
          </row>
        </resultSet>
      </equalTo>
    </assert>
  </test>

In this second test, we’re asserting that all the vertexes have an age property that is either missing or greater than 20.

<test name="All of them are older than 20 (or unknown)" uid="0002">
    <system-under-test>
      <resultSet>
        <query>
          g.V()
        </query>
      </resultSet>
    </system-under-test>
    <assert>
      <all-rows>
        <combination operator="or">
          <predicate operand="age">
            <null/>
          </predicate>
          <predicate operand="age">
            <more-than>20</more-than>
          </predicate>
        </combination>
      </all-rows>
    </assert>
  </test>

And finally, in this last test, we’re checking that the result-set returned by a projection matches the expected result.

  <test name="Count of person's relationships" uid="0003">
    <system-under-test>
      <resultSet>
        <query>
          g.V().project('FirstName','KnowsCount').by('firstName').by(out().count())
        </query>
      </resultSet>
    </system-under-test>
    <assert>
      <equalTo>
        <resultSet>
          <row>
            <cell>Mary</cell>
            <cell>0</cell>
          </row>
          <row>
            <cell>Robin</cell>
            <cell>0</cell>
          </row>
          <row>
            <cell>Ben</cell>
            <cell>1</cell>
          </row>
          <row>
            <cell>Thomas</cell>
            <cell>2</cell>
          </row>
        </resultSet>
      </equalTo>
    </assert>
  </test>

Our test-suite is green … time to write some additional tests!

To know more about NBi, check the website at www.nbi.io and if you have any question or suggestion, feel free to ask on Twitter or on the GitHub repository (section “issues”). The code of this extension is also available on GitHub.

Automating the testing of an Azure Cosmos DB instance with NBi

For five years now, you’ve been able to run automated tests on your SQL Server databases and SSAS cubes with the help of the open-source framework NBi. This framework was extended to support other relational databases (using the OleDb or ODBC drivers) in 2017. To give a really good start to the year 2018, NBi is now supporting Azure Cosmos DB databases using the Graph API (Gremlin queries)!

If you’re used to NBi, you can directly jump to the next paragraph. For newcomers, NBi is an open-source framework dedicated to the testing of BI solutions: it supports complex comparison of query results, asserting conditions on each row of a result-set, running ETL (SSIS), validating queries from reports (SSRS) and many other possibilities. On top of this, it has an advanced feature to automate the process of creating test-suites.

In order to set up an environment for testing some Gremlin queries through the Graph API of Azure Cosmos DB, you must download version 1.18 of NBi and its extension for Gremlin in version 1.0. Unzip the content of the NBi download and then partially overwrite it with the content of the extension NBi.CosmosDb (dlls from both packages must be deployed in a unique directory). If NUnit 2.6.4 is not available on your computer, don’t forget to download it. If you need more info about how to set up an NBi extension, read the documentation.

When NBi and its extension are installed, create your test-suite with its config and NUnit project file as you’d normally do for any NBi test-suite (or, if you’re in a hurry, download the files with the examples here, but don’t forget to update the connection-string when trying to run them). If you need more info about how to set up a test-suite, read the documentation.

For this example, I created a database on Azure Cosmos DB (named Friends) containing a graph (named FoF). I created some vertexes and edges with:

g.V().Drop()
g.addV('person').property('id', 'thomas')
  .property('firstName', 'Thomas')
  .property('age', 44)
g.addV('person').property('id', 'mary')
  .property('firstName', 'Mary')
  .property('lastName', 'Andersen')
  .property('age', 39)
g.addV('person').property('id', 'ben')
  .property('firstName', 'Ben')
  .property('lastName', 'Miller')
g.addV('person').property('id', 'robin')
  .property('firstName', 'Robin')
  .property('lastName', 'Wakefield')
g.V('thomas').addE('knows').to(g.V('mary'))
g.V('thomas').addE('knows').to(g.V('ben'))
g.V('ben').addE('knows').to(g.V('robin'))

When the environment is correctly configured, you must edit your config file to reference the extension NBi.Core.CosmosDb.

<configuration>
  <configSections>
    <section name="nbi" type="NBi.NUnit.Runtime.NBiSection, NBi.NUnit.Runtime"/>
  </configSections>
  <nbi testSuite="...">
    <extensions>
      <add assembly="NBi.Core.CosmosDb"/>
    </extensions>
  </nbi>
</configuration>

You also need to edit your config file to add all the binding redirections that Azure Cosmos DB comes with. To achieve this, copy/paste the content of the runtime element of NBi.Core.CosmosDb.dll.config into a new runtime element in the file FoF.config.

The concept of a connection-string is not familiar to Azure Cosmos DB but, to match with relational databases, we use a concatenation of the usual parameters defined to connect to an Azure Cosmos DB instance. You must provide the endpoint, the authkey, the database id and the collection id, concatenated into named tokens separated by semicolons. For the collection id, you must use the name graph and not collection, in anticipation of NBi’s compatibility with the other APIs supported by Cosmos DB.

Endpoint=https://gremlin-demo.documents.azure.com:443;AuthKey=F...==;database=Friends;graph=FoF

In your test-suite, querying a relational database or your Azure Cosmos DB database with the Graph API is not different. You’re still using the result-set and query elements. The Gremlin query is specified within the query element. It doesn’t support parameters at the moment, but you’re still free to use template-tokens.

Graph queries can return complex results such as paths that are not easily mapped to a table (result-set). Currently, NBi support for Cosmos DB is limited to queries returning a list of Vertexes, a list of Edges or a list of values.

In this first test, NBi asserts that the query returning Thomas’ friends of friends effectively returns the correct result: in this case, a unique vertex representing a person named Robin.

<test name="Thomas' friends of friends" uid="0001">
  <system-under-test>
    <resultSet>
      <query>
        g.V('thomas').outE('knows').inV().hasLabel('person').outE('knows').inV().hasLabel('person')
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <equalTo values-default-type="text">
      <resultSet>
        <row>
          <cell>robin</cell>
          <cell>person</cell>
          <cell>vertex</cell>
          <cell>Robin</cell>
          <cell>Wakefield</cell>
        </row>
      </resultSet>
    </equalTo>
  </assert>
</test>

In this second test, we’re asserting that all the vertexes contain a first name that is correctly capitalized:

<test name="All of them are older than 20" uid="0002">
  <system-under-test>
    <resultSet>
      <query>
        g.V()
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <all-rows>
      <predicate operand="firstName" type="text">
        <matches-regex>^([A-Z][a-z]+)$</matches-regex>
      </predicate>
    </all-rows>
  </assert>
</test>

And finally, in this last test, we’re checking that the result-set returned by a projection matches the expected result.

<test name="Count of person's relationships" uid="0003">
  <system-under-test>
    <resultSet>
      <query>
        g.V().project('FirstName','KnowsCount').by('firstName').by(out().Count())
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <equalTo>
      <resultSet>
        <row>
          <cell>Mary</cell>
          <cell>0</cell>
        </row>
        <row>
          <cell>Robin</cell>
          <cell>0</cell>
        </row>
        <row>
          <cell>Ben</cell>
          <cell>1</cell>
        </row>
        <row>
          <cell>Thomas</cell>
          <cell>2</cell>
        </row>
      </resultSet>
    </equalTo>
  </assert>
</test>

Our test-suite is green … time to write new tests!

To know more about NBi, check the website at www.nbi.io and if you have any question or suggestion, feel free to ask on Twitter or on GitHub. The code of this extension is also available on GitHub.

Presenting before the summer ’17

I’ll be really busy presenting before this summer. With four sessions (including three completely new ones) to deliver in less than 30 days, I’ll have less time for my open-source projects.

The first session will be at Charleroi (Hainaut, Belgium) on the 2nd of June for the Journée Agile 2017. This is a bit special for me: I’ve never talked so close to my hometown and it will be my first time speaking in Wallonia in 10 years. No one is a prophet in his own country, even if it is a small one! It will also be different because the main topic of the conference is not SQL Server or databases but agility, a concept that I’ve been using for 10 years; I’m really not a newcomer. Agility and BI are two concepts with little traction between them, let’s break the walls! I’ll speak (in French) about testing automation for BI solutions; this is the session that I’m presenting across Europe, but I’ll adapt it a bit to the audience and stress the agility topic (and spend a bit less time on the demos). If you’re interested in the agile topic (and if you’re not, you should really care about this … the world has changed, hurry up!), a few seats are still available.

The next stop will be at SQL Saturday Rheinland (Sankt Augustin, North Rhine-Westphalia, Germany – a university between Bonn and Köln). The session will be about the M language (the language used by Power Query/Power BI to get and transform data) and how much fun it is to code with a functional language. It’ll be the first time that I present this session as-is, but I’ve already included large parts of it in some blog posts or show-cases, so not a lot of stress to be ready on time. About M, the main topic of this session: I’m really impressed by the power of this language and I’ll try to share a bit of my knowledge and help you make your first steps in the right direction. Don’t be confused, it’s not a session about Power Query/Get & Transform, I won’t use the UI … I’ll show you the code and how to resolve more advanced cases that you won’t be able to implement with just the user interface.

Then, at the end of the month, I’ll be presenting at the 24 Hours of PASS (French edition – 2017). This year, I accepted to present two sessions (webcasts in French). Can you really say “no” twice to Isabelle and Jean-Pierre? The first session is about going high in normal forms for data warehouses. Don’t panic, I won’t bore you by explaining in theory what a normal form is. I just want to explain why Data Vault, Anchor Modeling (and Tibre Modeling, my own “methodology” to model time-based relations in a data warehouse) are really powerful approaches for some specific cases. I’ll also show you why SQL Server is suited to host and serve this kind of model. I’m really excited about this session, it’ll be a worldwide première and I hope to have the opportunity to present it again later, let’s cross fingers.

The second session at the 24 HOP French Edition is about graph processing with SQL Server 2017 (aka SQL Graph). I’ve been following Neo4j for several years now and I appreciate how simple it is to resolve some use-cases with Cypher and this engine. I still need to develop large parts of my session, but the plan is surely not to make an in-depth comparison of the two engines. I’m expecting to explain why graph processing is better suited for some specific cases and to show the difference in terms of expressiveness and performance with a standard SQL approach.

How to connect to PostgreSQL with NBi?

I often receive questions such as “Does NBi work with PostgreSQL?” or “Is it possible to use another database than SQL Server?”. The answer is definitively: YES! NBi is working with all databases supporting OleDB or ODBC connections.

Yesterday, I published a blog post about connecting to MySQL; today we’ll see how to connect to PostgreSQL from NBi by using both the ODBC and the OleDB connectors. Unlike MySQL, PostgreSQL has at least one ODBC driver and at least one OleDB provider still maintained (and even actively developed).

I’ll start with the ODBC driver for PostgreSQL. You can find the binaries on the official PostgreSQL website. Download them and install this driver. You can check that the ODBC driver has been correctly installed in the “Programs and Features” panel.

Another great way to check that the ODBC drivers are installed is to use PowerShell and the cmdlet

Get-OdbcDriver

This cmdlet returns the list of installed drivers, where you should find something related to PostgreSQL.

It will also give you the exact name of the driver, something my google-fu usually fails at miserably. In this case, I have two drivers named PostgreSQL ANSI(x64) and PostgreSQL Unicode(x64). Be sure to use these exact names in your connection strings!
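
If the list is long, you can narrow it down to the PostgreSQL entries and their platforms with standard PowerShell filtering (a quick sketch; adapt the wildcard to your installation):

Get-OdbcDriver -Name "PostgreSQL*" | Select-Object Name, Platform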

When the driver is successfully installed, edit your test-suite and define the connection string to your PostgreSQL server as:

<query
   connectionString=
      "Driver={PostgreSQL ANSI(x64)};
       Server=127.17.0.2;
       Database=postgres;
       UId=CI-Build;
       Pwd=Xyz;" 
>
  <![CDATA[
     select 'Hello World'
  ]]>
</query>

That’s it!

For the OleDB provider, I’ll use the commercial version provided by Intellisoft and available on this page. This product comes with a free trial. Once downloaded and installed, you can check that it has correctly been registered by running the following PowerShell cmdlet:

(New-Object system.data.oledb.oledbenumerator).GetElements()

and the output should contain the PostgreSQL OleDB provider.

Unfortunately, this method doesn’t return the short name of the provider that we must use in our connection string. In this specific case, this name is PGNP.1.

Because this OleDB provider is not pre-registered by NBi, we need to map it to the OleDb namespace as explained in the documentation of NBi. The config file must be updated with:

<nbi testSuite="PostgreSQL-TestSuite.nbits">
  <providers>
    <add id="PGNP.1" invariant-name="System.Data.OleDb"/>
  </providers>
</nbi>

Once it’s done, you can safely edit the connection string in your test-suite.

<query
  connectionString=
    "Provider=PGNP.1;
     Data Source=127.17.0.2;
     Initial Catalog=postgres;
     User ID=CI-Build;
     Password=Xyz;"
>
  <![CDATA[
     select 'Hello World'
  ]]>
</query>

We could move this connection string to the settings (references or defaults) or to the config file.
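
For instance, moving it to a default in the settings element would look roughly like the snippet below (a sketch based on the usual NBi settings syntax; double-check the element names against the NBi documentation):

<settings>
  <default apply-to="system-under-test">
    <connectionString>
      Provider=PGNP.1;Data Source=127.17.0.2;Initial Catalog=postgres;User ID=CI-Build;Password=Xyz;
    </connectionString>
  </default>
</settings>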

You can now configure a connection to a PostgreSQL instance without any problem!

How to connect to MySQL with NBi?

I often receive questions such as “Does NBi work with MySQL/PostgreSQL?” or “Is it possible to use another database than SQL Server?”. The answer is definitively: YES! NBi is working with all databases supporting OleDB or ODBC connections.

Today, I’ll specifically answer the question about MySQL; tomorrow I’ll publish another blog post about PostgreSQL. For MySQL, I’ll connect from NBi by using an ODBC connector. Indeed, MySQL has no support for an OleDB connector. To be 100% transparent, some old versions of OleDB connectors exist, but there doesn’t seem to be any active development on these projects.

Start by downloading the latest version of the ODBC driver for MySQL from the MySQL website, then install it. You can check that the ODBC driver has been correctly installed in the “Programs and Features” panel.

Another great way to check that the ODBC drivers are installed is to use PowerShell and the cmdlet

Get-OdbcDriver

This cmdlet returns the list of installed drivers, where you should find something related to MySQL.

Do you need the 32-bit (x86) or the 64-bit (x64) driver? It depends on the application using the ODBC driver … both need to match. When using the nunit-console or the GUI of NUnit, you’ll need the 64-bit version. But if you’re running your tests from Visual Studio, then you’ll need the 32-bit version. You can install both on the same computer, but it’s apparently recommended to install the 64-bit version first and then the 32-bit one. I haven’t checked this personally.
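
To check which platforms the MySQL driver is installed for, you can again rely on Get-OdbcDriver (a quick sketch; the driver name is the one from my setup):

Get-OdbcDriver -Name "MySQL ODBC*" -Platform All | Select-Object Name, Platform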

When the connector is successfully installed, edit your test-suite and define the connection string to your MySQL server as:

<query
   connectionString=
      "Driver={MySQL ODBC 5.3 UNICODE Driver};
       Server=127.17.0.3;
       Database=adventureworks;
       User=CI-Build;
       Password=Xyz;" 
>
  <![CDATA[
     select count(*) from adventureworks.address
   ]]>
</query>

We could also use this connection string in our settings (defaults or references) or in our config file.

Now, you can use your MySQL instance exactly the same way you’d use a SQL Server instance.

Regression test-suite for BI solution – Part 2

In the previous blog post of this series, I explained how to query a DMV and how to slightly modify the set of test-cases generated by the query. In this post, we’ll see how to join two sets of test-cases to obtain a unique set.

Now that you’ve loaded the content of a first DMV, we could apply the same recipe to the second DMV. Well, it’s not so easy: if we just do this, the second call to case load will override the result of the first query. To keep two sets of test-cases in memory, in genbiL, you’ll have to give them a name. This is done through the keyword scope (more info). This action must be executed when you want to work on a different set of test-cases.

case scope 'dimensions';
case load query
{
 ...
}
on '...';

case scope 'hierarchies';
case load query
{
 ...
}
on '...';

To apply actions (filter, load, hold, …) to a given set, you first need to execute the scope action with the name of the set on which you want to work. It’s possible to switch the scope at any moment and you can go back to a scope previously loaded.

To join the two sets, you’ll need to use the command cross (more info) with a join. But before crossing the two existing datasets, you need to switch to a third, empty scope that will contain the result of the cross.

case scope 'dimensions and hierarchies';
case cross 'dimensions' with 'hierarchies' on 'dimension_unique_name';

Now that you’ve understood how to cross (join) two datasets, you can safely load some other DMVs and combine them to build a global picture of your cube. First step: load measures and measure-groups, then combine them. Second step: use the DMV returning the intersections between dimensions and measure-groups. If you combine this with the two already combined datasets, you’ll have the interesting interactions between hierarchies and measures, probably the best starting point to automatically generate queries to run on your new cube and your old cube and compare the result-sets. Let’s do this:

Create a new template to support the query-to-query comparison (save it as regression\my-template.nbitt):

<test name="Compare '$Measure_Caption$' by '$hierarchy_caption$'">
  <description>Assert that the same query, 
  executed on two instances of the cube, 
  returns the same result-set. Specifically, 
  for measure '$measure_caption$' on hierarchy 
  '$hierarchy_caption$'</description>
  <edition author="$username$" created="$now$"/>
  <category>Measuregroup '$measuregroup_name$'</category>
  <trait name="Template">my-template.nbitt</trait>
  <system-under-test>
    <execution>
      <query>
        <![CDATA[
        select 
          $measure_unique_name$ on 0
          , non empty($hierarchy_unique_name$.children) on 1 
        from 
          [$cube_name$] 				
        ]]>
      </query>
    </execution>
  </system-under-test>
  <assert>
    <equalTo keys="all-except-last" tolerance="$tolerance$">
      <query>
        <![CDATA[
        select 
          $measure_unique_name$ on 0
          , non empty($hierarchy_unique_name$.children) on 1 
        from 
          [$cube_name$]
        ]]>
      </query>
    </equalTo>
  </assert>
</test>

Add the following genbiL code to load all the DMVs:

case scope 'dimensions';
case load query
{
	select
		[Dimension_Unique_Name]
		, [Dimension_Caption]
	from
		[$system].[MDSchema_dimensions]
	where
		[Cube_Name]='your-cube-name'
		and [Catalog_Name]='your-initial-catalog'
		and Dimension_Is_Visible
}
on 'Provider=MSOLAP.4;Data Source=your-data-source;Initial Catalog=your-initial-catalog';

case scope 'hierarchies';
case load query
{
	select
		[Dimension_Unique_Name]
		, [Hierarchy_Unique_Name]
		, [Hierarchy_Caption]
	from
		[$system].[MDSchema_hierarchies]
	where
		[Cube_Name]='your-cube-name'
		and [Catalog_Name]='your-initial-catalog'
		and Hierarchy_Is_Visible
}
on 'Provider=MSOLAP.4;Data Source=your-data-source;Initial Catalog=your-initial-catalog';

case scope 'levels';
case load query
{
	select
		[Hierarchy_Unique_Name]
		, [Level_Unique_Name]
		, [Level_Caption]
	from
		[$system].[MDSchema_levels]
	where
		[Cube_Name]='your-cube-name'
		and [Catalog_Name]='your-initial-catalog'
		and Level_Is_Visible
		and [Level_Number]<>'0'
}
on 'Provider=MSOLAP.4;Data Source=your-data-source;Initial Catalog=your-initial-catalog';

case scope 'measureGroups';
case load query
{
	select
		[MeasureGroup_Name]
	from
		[$system].[MDSchema_measureGroups]
	where
		[Cube_Name]='your-cube-name'
		and [Catalog_Name]='your-initial-catalog'
}
on 'Provider=MSOLAP.4;Data Source=your-data-source;Initial Catalog=your-initial-catalog';
case filter distinct;

case scope 'measures';
case load query
{
	select
		[Cube_Name]
		, [Measure_Unique_Name]
		, [Measure_Caption]
		, [MeasureGroup_Name]
		, [Measure_Display_Folder]
	from
		[$system].[MDSchema_measures]
	where
		[Cube_Name]='your-cube-name'
		and [Catalog_Name]='your-initial-catalog'
		and Measure_Is_Visible
}
on 'Provider=MSOLAP.4;Data Source=your-data-source;Initial Catalog=your-initial-catalog';

case scope 'links';
case load query
{
	select
		[MeasureGroup_Name]
		, [Dimension_Unique_Name]
	from
		[$system].[MDSCHEMA_MeasureGroup_Dimensions]
	where
		[Cube_Name]='your-cube-name'
		and [Catalog_Name]='your-initial-catalog'
		and Dimension_Is_Visible
}
on 'Provider=MSOLAP.4;Data Source=your-data-source;Initial Catalog=your-initial-catalog';

Once we have the separate DMV results, we can cross them together to obtain our model:

case scope 'dimensions-measures';
case cross 'measures' with 'links' on 'MeasureGroup_Name';
case cross 'dimensions-measures' with 'dimensions' on 'Dimension_Unique_Name';
case cross 'dimensions-measures' with 'hierarchies' on 'Dimension_Unique_Name';

Add a tolerance field with a value of 0.001:

case add column 'tolerance' values '0.001';

Then, generate the tests based on the test-cases loaded in the newly created scope and the template built above.

template load file 'Regression\my-template.nbitt';
suite generate;

And that’s it.

Now it’s up to you to build other queries and filter out the irrelevant test-cases.

SQL Saturday in Vienna and MS Cloud Summit in Paris to start in 2017

I’m sitting on my couch, watching a movie with my daughters, reminiscing about the holidays, and getting ready for my first speaking engagements of 2017.

I’m completely thrilled to be invited to Vienna on January 20th to speak at SQL Saturday #579. I’ll be tackling a rather difficult subject … testing BI solutions. During this session, we’ll look at the features of the open-source framework named NBi. This framework provides support for automated tests in the fields of databases, cubes, reports and ETLs, without the need for .Net skills. The demos will show the best approaches to quickly and effectively assert the quality of BI developments. We’ll go a step further, generating the tests through an interesting system of templates and test-case sources.

I’ll also be speaking (in French) in Paris at the MS Cloud Summit Paris 2017. The topic will be Azure Stream Analytics. This is a completely new session and it’s taking a lot of patience, perseverance and hard work to get ready (and, to be honest, it’s not 100% finished). Building the slide deck is the quick and easy part. But designing the demo scenarios, scripting and testing the demos is the time-consuming part (then will come the practicing part).

For both conferences, if Business Intelligence is not your cup of tea, there are plenty of other topics to choose from, with sessions geared toward Database Development, Database Administration/Security and even, in Paris, Office 365, Power Flow, Power Apps, Azure Functions, … Both conferences offer a wide variety of content for everyone.

See you there?