Author: seddryck

Using NBi with hierarchy’s level in MDX queries

NBi is a toolset dedicated to testing BI and data-centric applications. One of its key features is the comparison of the results returned by two queries. This can be used to compare the result of a query on the relational data warehouse with the result of a query on the OLAP cube. A pitfall that you could face when using NBi with MDX queries is the count of columns returned by this kind of query when a hierarchy’s level is defined on an axis.

When executing the following query (on the Adventure Works 2012 sample database/cube), you’ll see two columns in the result displayed by SSMS. It’s probably what you’re expecting: you’re only selecting one specific level of the hierarchy [Date].[Calendar] and one measure.

hierarchy_ssms

You’re probably expecting that NBi will also consider two columns. Unfortunately, it’s not the case: NBi will consider four columns! What are the additional, unexpected columns? The levels [Date].[Calendar].[Calendar Year] and [Date].[Calendar].[Calendar Semester] are also returned. In reality, this is not something specific to NBi; it’s just the standard behaviour of the ADOMD library, and SSMS is cheating when it displays only one column for the level!

If we ignore this specificity of NBi and set up a test where the MDX query is the system-under-test, then we’ll face the following exception:

NBi.NUnit.Runtime.CustomStackTraceErrorException :
The column with index '[Date].[Calendar].[Calendar Semester].[MEMBER_CAPTION]'
 is expecting a numeric value but the first row of your result set contains a value 'H1 CY 2008'
 not recognized as a valid numeric value or a valid interval.
Hierarchy_UnexpectedColumns

When comparing the second column of your assertion (the orders’ count, i.e. the value, by default a numeric value) to the second column of your system-under-test (what you’re expecting to be the orders’ count but is in reality the [Date].[Calendar].[Calendar Semester]) … NBi is not able to convert “H1 CY 2008” to a numeric value.

To overcome this issue, the easiest approach is to fake, in the SQL query, the additional columns returned by the MDX query. Instead of having the following SQL query in your assertion:

select
	'Q' + cast([CalendarQuarter] as char(1)) + ' CY 2008' as QY
	, count(distinct SalesOrderNumber)
from
	dbo.factInternetSales f
inner join
	Dimdate d
	on d.DateKey = f.OrderDateKey
where
	f.OrderDateKey between 20080000 and 20089999
group by
	[CalendarQuarter]

Just fake the two additional columns, filling them with NULL:

select
	null as [Year]
	, null as [HY]
	, 'Q' + cast([CalendarQuarter] as char(1)) + ' CY 2008' as QY
	, count(distinct SalesOrderNumber)
from
	...

But it’s not the end! You’ll have to inform NBi that it shouldn’t care about these two additional columns. This can be done in the equalTo element, where you can specify columns to ignore. Ignoring means that these columns are neither keys nor values … they don’t interfere in the comparison process. To implement that, you’ll define the role as ignore for these columns.

<equalTo keys="all-except-last">
   <column index="0" role="ignore"/>
   <column index="1" role="ignore"/>
   <query>
      select
        null as [Year]
        , null as [HY]
        , 'Q' + cast([CalendarQuarter] as char(1)) + ' CY 2008' as QY
        , count(distinct SalesOrderNumber)
      from
        ...
   </query>
</equalTo>

Now that you’ve explicitly told NBi that it should expect four columns and not care about the first two, you’ll be able to run your test and find real issues or successes!

NBi.NUnit.Runtime.TestSuite.Order's count by Quarter Year for CY 2008:
NBi.NUnit.Runtime.CustomStackTraceAssertionException : Execution of the query doesn't match the expected result 

  Expected: 
Result-set with 3 rows

     #0 (Column1) | #1 (Column2) | #2 (Column3) | #3 (Column4)   
     (Text)       | (Text)       | KEY (Text)   | VALUE (Numeric)
     ------------ | ------------ | ------------ | --------------- 
     (null)       | (null)       | Q1 CY 2008   | 6080           
     (null)       | (null)       | Q2 CY 2008   | 6886           
     (null)       | (null)       | Q3 CY 2008   | 976            


  But was:  
Result-set with 3 rows

     #0 ([Date].[Calendar].[Calendar Year].[MEMBER_CAPTION]) | #1 ([Date].[Calendar].[Calendar Semester].[MEMBER_CAPTION]) | #2 ([Date].[Calendar].[Calendar Quarter].[MEMBER_CAPTION]) | #3 ([Measures].[Order Count])
     (Text)                                                  | (Text)                                                      | KEY (Text)                                                 | VALUE (Numeric)              
     ------------------------------------------------------- | ----------------------------------------------------------- | ---------------------------------------------------------- | ----------------------------- 
     CY 2008                                                 | H1 CY 2008                                                  | Q1 CY 2008                                                 | 6082                         
     CY 2008                                                 | H1 CY 2008                                                  | Q2 CY 2008                                                 | 6886                         
     CY 2008                                                 | H2 CY 2008                                                  | Q3 CY 2008                                                 | 976                          





Non matching value rows:
------------------------

Result-set with 1 row

     #0 ([Date].[Calendar].[Calendar Year].[MEMBER_CAPTION]) | #1 ([Date].[Calendar].[Calendar Semester].[MEMBER_CAPTION]) | #2 ([Date].[Calendar].[Calendar Quarter].[MEMBER_CAPTION]) | #3 ([Measures].[Order Count])
     (Text)                                                  | (Text)                                                      | KEY (Text)                                                 | VALUE (Numeric)              
     ------------------------------------------------------- | ----------------------------------------------------------- | ---------------------------------------------------------- | ----------------------------- 
     CY 2008                                                 | H1 CY 2008                                                  | Q1 CY 2008                                                 | 6082 <> 6080                 

Using NBi to assert calculations based on measures

A few days ago on GitHub, I received a request to explain how to assert the results of some calculations with measures and also how to automate these tests with NBi. This blog post is the complete answer to this question.

To illustrate the case, I’ll use the Adventure Works 2012 multidimensional cube, but you can do the same with any other database. In this cube, you have a measure-group with the measures Internet Gross Profit, Internet Sales Amount and Internet Gross Profit Margin. As you can guess, the last measure is the result of the division of the first two (expressed as a percentage). My test will assert that this is still the case when slicing by customers’ country.

To achieve this, I’m defining a system-under-test with a result-set defined in a MDX query:

<system-under-test>
  <resultSet>
    <query>
      <![CDATA[
      select
      {
        [Measures].[Internet Gross Profit],
        [Measures].[Internet Sales Amount],
        [Measures].[Internet Gross Profit Margin]   
      } on 0,
      {
        [Customer].[Country].members
      } on 1
      from
        [Adventure Works]
      ]]>
    </query>
  </resultSet>
</system-under-test>

Asserting the calculations’ result can be done with an all-rows assertion and a predicate checking that the content of the last measure is equal to the quotient of the first two.

Due to the complex column names returned by MDX queries, it’s recommended to use aliases. With NBi, we can define an alias based on the column’s position (starting at 0). Position 0 corresponds to the column [Customer].[Country].members, so we can skip it; columns 1 to 3 will receive aliases:

<alias column-index="1">Profit</alias>
<alias column-index="2">Sales</alias>
<alias column-index="3">Margin</alias>

The test will need to compute the quotient between Profit and Sales and check that this value is equal to the value returned by the cube in the column Margin. This can be expressed in the following expression:

<expression name="calculate">
    Round(Profit/Sales, 4) = Round(Margin, 4)
</expression>

The Round function is provided by NCalc, the underlying library used for the evaluation of these expressions. In this case, it’s extremely useful to avoid rounding issues during computations. Because the margin is expressed as a percentage, I didn’t round to 2 digits after the decimal separator but to 4.

Based on the expression defined above, we can use the predicate true. Each row not satisfying the equality defined above will return false and not validate the predicate. The test is an all-rows, so if any row doesn’t validate the predicate, the test will fail.

Some developers will perhaps try to use the predicate equal in place of true but, unfortunately, at this moment NBi can only handle a fixed value for the equal predicate and it’s not possible to specify a column. This limitation rules out the approach with equal.

The full code for this test is

<test name="Profit, Sales and Margin">
  <system-under-test>
    <resultSet>
      <query>
        <![CDATA[
      select
      {
          [Measures].[Internet Gross Profit],
          [Measures].[Internet Sales Amount],
          [Measures].[Internet Gross Profit Margin] 
      } on 0,
      {
          [Customer].[Country].members
      } on 1
      from
          [Adventure Works]
      ]]>
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <all-rows>
      <alias column-index="1">Profit</alias>
      <alias column-index="2">Sales</alias>
      <alias column-index="3">Margin</alias>
      <expression name="calculate">
          Round(Profit/Sales, 4) = Round(Margin, 4)
      </expression>
      <predicate operand="calculate" type="boolean">
        <true />
      </predicate>
    </all-rows>
  </assert>
</test>

The second part of this blog post is about automating the generation of this kind of test. Let’s start by defining our test-cases. To define them, I’ll put the aliases that I’d like to use (column alias), the expected formula (column expression), the exact names of the measures (column measure) and how many digits are needed for rounded comparisons (column round). In this example, I created two test-cases. The first test-case checks that Profit is equal to Sales minus Cost and the second one asserts that Margin is equal to Profit divided by Sales.

alias;expression;measure;round
Sales,Cost,Profit;Sales-Cost;[Measures].[Internet Sales Amount],[Measures].[Internet Total Product Cost],[Measures].[Internet Gross Profit];2
Profit,Sales,Margin;Profit/Sales;[Measures].[Internet Gross Profit],[Measures].[Internet Sales Amount],[Measures].[Internet Gross Profit Margin];4

To load this CSV file in genbiL, I’ll need to use the load file function:

case load file 'calculations.csv';

Note that in this CSV file each column is separated by a semicolon (;). In the columns alias and measure, I have more than one value, and these values are separated by a comma (,).

To parse this CSV file with genbiL, I’ll need to use the function split. This function transforms a single value into an array of values using a separator.

case split column 'alias' with value ',';
case split column 'measure' with value ',';

After these two lines of code, the test-cases’ scope consists of two cases, each of them having an array of three elements in the columns alias and measure.

The next steps are straightforward: load the template (that we’ll compose just after this), generate the tests and save them.

template load file 'calculations.nbitt';
suite generate;
suite save 'calculations.nbits';

Once we have the CSV file and the genbiL script, we still need to build a template. There are two tricky parts in this template. The first one is to generate the list of measures in the MDX queries. Elements of this list must be separated by a comma: if you have three elements, you’ll need two commas.

select
   {
      [Measures].[Internet Gross Profit],
      [Measures].[Internet Sales Amount],
      [Measures].[Internet Gross Profit Margin]
   } on 0,

To achieve that with the StringTemplate engine, you’ll need to use the separator keyword (doc) for any variable containing an array.

select
   {
      $Measure; separator=", 
"$
   } on 0,

The second tricky part sounds similar but is really different: defining the list of aliases. In this case there is no separator; if you have three elements, you’ll need exactly the same XML element three times. An additional detail is that we’ll need to add an index starting at 1 and increasing for each new element. Fortunately, StringTemplate has the concept of anonymous templates, which is really helpful here. In this case, we’ll state that for each value contained in the array of the column alias, we have to create an XML element alias and add an attribute column-index with an incremental value. Note that the incremental value is automatically assigned to $i$ by StringTemplate and starts at 1. Each element contained in the array of alias will be assigned to the variable x in the anonymous template. The anonymous template is delimited by the pipe and the right curly brace.

$alias:{ x | <alias column-index="$i$">$x$</alias>}$

The template also uses the StringTemplate function trunc, returning all the elements of an array except the last one, and the function last, returning the last element of an array.

The full listing for the template is available below:

  <test name="$trunc(alias); separator=" and "$ to calculate $last(alias)$">
    <system-under-test>
      <resultSet>
        <query>
          <![CDATA[
          select
          {
         $measure; separator=", 
"$
          } on 0,
          {
          [Customer].[Country].members
          } on 1
          from
          [Adventure Works]
          ]]>
        </query>
      </resultSet>
    </system-under-test>
    <assert>
      <all-rows>
        $alias:{ x | <alias column-index="$i$">$x$</alias>}$
        <expression name="calculate">
           Round($expression$, $round$) = Round($last(alias)$, $round$)
        </expression>
        <predicate operand="calculate" type="boolean">
          <true />
        </predicate>
      </all-rows>
    </assert>
  </test>

And voilà, now you just need to run your genbiL script and you’ll get your two tests generated. Feel free to add new cases!

Automating the testing of Elasticsearch queries with NBi

For five years now, you’ve been able to run automated tests on your SQL Server databases and SSAS cubes with the help of the open-source framework NBi. This framework was extended to support other relational databases (using the OleDb or ODBC drivers) in 2017. In 2018, I’m adding support for NoSQL databases. Earlier this year, I released some extensions for graph databases, but this new extension targets the Elasticsearch RESTful search and analytics engine. The code for this extension to the NBi framework is hosted on GitHub.

If you’re used to NBi, you can directly jump to the next paragraph. For newcomers, NBi is an open-source framework dedicated to the testing of BI and data-centric solutions: it supports complex comparisons of query results, asserting conditions on each row of a result-set, running ETL (SSIS), validating queries from reports (SSRS) and many other possibilities. On top of this, it has an advanced feature to automate the process of creating test-suites.

In order to set up an environment for testing some queries on an Elasticsearch instance, you must download version 1.18 of NBi and its extension for Elasticsearch in version 1.0. Unzip the content of the NBi download and then partially override it with the content of the NBi.Elasticsearch extension (dlls from both packages must be deployed in a single directory). If NUnit 2.6.4 is not available on your computer, don’t forget to download it. If you need more info about how to set up an NBi extension, read the documentation.

When NBi and its extension are installed, create your test-suite with its config and NUnit project files as you’d normally do for any NBi test-suite (or, if you’re in a hurry, download the files with the examples here, but don’t forget to update the connection-string when trying to run it). If you need more info about how to set up a test-suite, read the documentation.

For this example, I started an Elasticsearch instance and loaded the sample data from the Bank example:

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"

When the environment is correctly configured, you must edit your config file to reference the extension NBi.Core.Elasticsearch.

<configuration>
  <configSections>
    <section name="nbi" type="NBi.NUnit.Runtime.NBiSection, NBi.NUnit.Runtime"/>
  </configSections>
  <nbi testSuite="...">
    <extensions>
      <add assembly="NBi.Core.Elasticsearch"/>
    </extensions>
  </nbi>
</configuration>

The concept of connection-string is not familiar to Elasticsearch but, to match with relational databases, we use a concatenation of the usual parameters defined to connect to an Elasticsearch instance with the low-level client available in C#. You must provide a URL with the hostname, the port, the username and the password. The protocol should be elasticsearch at the moment, to make a distinction with other databases relying on the http protocol.

elasticsearch://user:password@localhost:9200

In your test-suite, querying a relational database with SQL or an Elasticsearch cluster with a query written in the Query DSL is no different. You’re still using the result-set and query elements. The query is specified within the query element.

Elasticsearch queries can return complex results such as hits, aggregations or counts. Currently, NBi’s support for Elasticsearch is limited to queries returning hits or aggregations.

This first test asserts that the query searching for the 5 persons with the highest balances effectively returns 5 rows.

<test name="5 highest balances returns 5 rows" uid="0001">
  <system-under-test>
    <resultSet>
      <query>
        <![CDATA[
        GET bank/_search
        {
           "query": {"match_all": { }}
           , "size": 5
           , "_source": ["gender", "age", "balance"]
           , "sort": [ { "balance" : {"order" : "desc"}}]
        }
        ]]>
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <row-count>
      <equal>5</equal>
    </row-count>
  </assert>
</test>

In the second test, the same query as in the previous example should return rows with an age between 20 and 40 and a balance greater than $48,000.

<test name="5 highest balance" uid="0001">
  <system-under-test>
    <resultSet>
      <query>
        <![CDATA[
        GET bank/_search
        {
           "query": {"match_all": { }}
           , "size": 5
           , "_source": ["gender", "age", "balance"]
           , "sort": [ { "balance" : {"order" : "desc"}}]
        }
        ]]>
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <all-rows>
      <combination operator="and">
        <predicate operand="age">
          <within-range>[20;40]</within-range>
        </predicate>
        <predicate operand="balance">
          <more-than>48000</more-than>
        </predicate>
      </combination>
    </all-rows>
  </assert>
</test>

The next test checks the result of an aggregation. More specifically, it checks that the three states with the highest average balance are Washington, Alabama and Rhode Island, but also that the count of documents and the average balance are more or less in the expected range.

<test name="Top 3 of average balance by state" uid="0002">
  <system-under-test>
    <resultSet>
      <query>
        <![CDATA[
        GET /bank/_search
        {
          "size": 0,
          "aggs": {
            "group_by_state": {
              "terms": {
                "field": "state.keyword",
                "size" : 3,
                "order": {
                  "average_balance": "desc"
                }
              },
              "aggs": {
                "average_balance": {
                  "avg": {
                    "field": "balance"
                  }
                }
              }
            }
          }
        }
        ]]>
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <equalTo tolerance="1">
      <resultSet>
        <row>
          <cell column-name="key">WA</cell>
          <cell column-name="doc_count">(+)</cell>
          <cell column-name="average_balance">43265</cell>
        </row>
        <row>
          <cell>AL</cell>
          <cell>6</cell>
          <cell>41418</cell>
        </row>
        <row>
          <cell>RI</cell>
          <cell>[5;10]</cell>
          <cell>40040</cell>
        </row>
      </resultSet>
    </equalTo>
  </assert>
</test>
FoF-running-test-suite

Our test-suite is green … time to write some additional tests! Feel free to explore the other kinds of tests that NBi supports and report your ideas for improvement in the GitHub issues for NBi.Elasticsearch.

To know more about NBi, check the website at www.nbi.io and if you’ve any question or suggestion, feel free to ask on Twitter or on the Github repository (section “issues”). The code of this extension is also available on Github.

Automating the testing of Neo4j cypher queries with NBi

For five years now, you’ve been able to run automated tests on your SQL Server databases and SSAS cubes with the help of the open-source framework NBi. This framework was extended to support other relational databases (using the OleDb or ODBC drivers) in 2017. In 2018, I’m adding support for graph databases. Earlier this year, I released an extension for the Azure Cosmos DB API (notably the Graph API) and another for Tinkerpop-enabled databases supporting Gremlin queries. This new extension supports Neo4j databases and the Cypher query language. The code for this extension is hosted on GitHub.

If you’re used to NBi, you can directly jump to the next paragraph. For newcomers, NBi is an open-source framework dedicated to the testing of BI solutions: it supports complex comparison of queries results, asserting conditions on each row of a result-set, running ETL (SSIS), validating queries from reports (SSRS) or many other possibilities. On top of this, it has an advanced feature to automate the process of creating test-suites.

In order to set up an environment for testing some Cypher queries, you must download version 1.18 of NBi and its extension for Neo4j in version 1.0. Unzip the content of the NBi download and then partially override it with the content of the NBi.Neo4j extension (dlls from both packages must be deployed in a single directory). If NUnit 2.6.4 is not available on your computer, don’t forget to download it. If you need more info about how to set up an NBi extension, read the documentation.

When NBi and its extension are installed, create your test-suite with its config and NUnit project file as you’d normally do for any NBi test-suite (or, if you’re in a hurry, download the files with the examples here, but don’t forget to update the connection-string when trying to run it). If you need more info about how to set up a test-suite, read the documentation.

For this example, I started a Neo4j instance and created some vertexes and edges based on the classical movies database shipped with Neo4j:

:play movies graph

When the environment is correctly configured, you must edit your config file to reference the extension NBi.Core.Neo4j.

<configuration>
  <configSections>
    <section name="nbi" type="NBi.NUnit.Runtime.NBiSection, NBi.NUnit.Runtime"/>
  </configSections>
  <nbi testSuite="...">
    <extensions>
      <add assembly="NBi.Core.Neo4j"/>
    </extensions>
  </nbi>
</configuration>

The concept of connection-string is not familiar to Neo4j but, to match with relational databases, we use a concatenation of the usual parameters defined to connect to a Neo4j instance with the Bolt driver. You must provide a URL with the hostname, the port, the username and the password.

bolt://user:password@localhost:7474

In your test-suite, querying a relational database with SQL or a Neo4j instance with a Cypher query is no different. You’re still using the result-set and query elements. The Cypher query is specified within the query element. It supports parameters and template-tokens.

Cypher queries can return complex results such as paths that are not easily mapped to a table (result-set). Currently, NBi’s support for Cypher is limited to queries returning a list of values.

In this test, NBi asserts that the query returning Tom Hanks’ movies released after 2000 effectively returns what we’re expecting.

<test name="Tom Hanks’ movies after 2000" uid="0001">
  <system-under-test>
    <resultSet>
      <query>
        MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies)
        WHERE tomHanksMovies.released>2000
        RETURN tomHanksMovies.title, tomHanksMovies.released
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <equalTo>
      <resultSet>
        <row>
          <cell>Charlie Wilson's War</cell>
          <cell>2007</cell>
        </row>
        <row>
          <cell>The Polar Express</cell>
          <cell>2004</cell>
        </row>
        <row>
          <cell>The Da Vinci Code</cell>
          <cell>2006</cell>
        </row>
        <row>
          <cell>Cloud Atlas</cell>
          <cell>2012</cell>
        </row>
      </resultSet>
    </equalTo>
  </assert>
</test>
FoF-running-test-suite

Our test-suite is green … time to write some additional tests!

To know more about NBi, check the website at www.nbi.io and if you’ve any question or suggestion, feel free to ask on Twitter or on the Github repository (section “issues”). The code of this extension is also available on Github.

Automating the testing of gremlin queries with NBi

For five years now, you’ve been able to run automated tests on your SQL Server databases and SSAS cubes with the help of the open-source framework NBi. This framework was extended to support other relational databases (using the OleDb or ODBC drivers) in 2017. In 2018, I’m adding support for graph databases. Earlier this year, I released an extension for the Neo4j Bolt driver supporting Cypher queries and another compatible with some of the Azure Cosmos DB APIs, notably the Graph API. This new extension supports all the Apache Tinkerpop-enabled databases; it is tested on Azure Cosmos DB and Tinkerpop Server but should also work with other graph databases supporting Gremlin such as JanusGraph, Titan and OrientDB. The code for this extension is hosted on GitHub. Note that Azure Cosmos DB now recommends using the Gremlin driver in place of the Graph API dlls, so this extension should also be your first target if you want to connect to an Azure Cosmos DB database and run Gremlin queries.

If you’re used to NBi, you can directly jump to the next paragraph. For newcomers, NBi is an open-source framework dedicated to the testing of BI solutions: it supports complex comparison of queries results, asserting conditions on each row of a result-set, running ETL (SSIS), validating queries from reports (SSRS) or many other possibilities. On top of this, it has an advanced feature to automate the process of creating test-suites.

In order to set up an environment for testing some Gremlin queries, you must download version 1.18 of NBi and its extension for Gremlin in version 1.0. Unzip the content of the NBi download and then partially override it with the content of the NBi.Gremlin extension (dlls from both packages must be deployed in a single directory). If NUnit 2.6.4 is not available on your computer, don’t forget to download it. If you need more info about how to set up an NBi extension, read the documentation.

When NBi and its extension are installed, create your test-suite with its config and NUnit project file as you’d normally do for any NBi test-suite (or, if you’re in a hurry, download the files with the examples here, but don’t forget to update the connection-string when trying to run it). If you need more info about how to set up a test-suite, read the documentation.

For this example, I started a Tinkerpop Server and created some vertexes and edges with:

g.V().drop()
g.addV('person').property('id', 'thomas')
  .property('firstName', 'Thomas').property('age', 44)
g.addV('person').property('id', 'mary')
  .property('firstName', 'Mary')
  .property('lastName', 'Andersen')
  .property('age', 39)
g.addV('person').property('id', 'ben')
  .property('firstName', 'Ben')
  .property('lastName', 'Miller')
g.addV('person').property('id', 'robin')
.property('firstName', 'Robin')
  .property('lastName', 'Wakefield')
g.V().has('firstName','Thomas').addE('knows')
  .to(g.V().has('firstName','Mary'))
g.V().has('firstName','Thomas').addE('knows')
  .to(g.V().has('firstName','Ben'))
g.V().has('firstName','Ben').addE('knows')
  .to(g.V().has('firstName','Robin'))

When the environment is correctly configured, you must edit your config file to reference the extension NBi.Core.Gremlin.

<configuration>
  <configSections>
    <section name="nbi" type="NBi.NUnit.Runtime.NBiSection, NBi.NUnit.Runtime"/>
  </configSections>
  <nbi testSuite="...">
    <extensions>
      <add assembly="NBi.Core.Gremlin"/>
    </extensions>
  </nbi>
</configuration>

You also need to edit your config file to add all the binding redirections that Azure Cosmos DB comes with. To achieve this, copy/paste the content of the runtime element of NBi.Core.CosmosDb.dll.config into a new runtime element in the config file of your test-suite (FoF.config).

<runtime>
    <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
      <dependentAssembly>
        <assemblyIdentity name="System.Diagnostics.DiagnosticSource" publicKeyToken="cc7b13ffcd2ddd51" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.0.1.0" newVersion="4.0.1.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="System.Net.Http" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.2.0.0" newVersion="4.2.0.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="System.Net.WebSockets" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.0.2.0" newVersion="4.0.1.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="System.Net.WebSockets.Client" publicKeyToken="b03f5f7f11d50a3a" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-4.0.2.0" newVersion="4.0.1.0" />
      </dependentAssembly>
      <dependentAssembly>
        <assemblyIdentity name="Newtonsoft.Json" publicKeyToken="30ad4fe6b2a6aeed" culture="neutral" />
        <bindingRedirect oldVersion="0.0.0.0-11.0.0.0" newVersion="11.0.0.0" />
      </dependentAssembly>
    </assemblyBinding>
</runtime>

The concept of connection-string is not familiar to Tinkerpop or Gremlin but, to match with relational databases, we use a concatenation of the usual parameters defined to connect to an instance supporting Gremlin queries. You must provide the hostname, the port, the enableSsl flag, the username and the password. You must also add a hint for NBi to identify the correct type of connection-string, by adding an api tag with the value gremlin. If your database is an Azure Cosmos DB instance, then your username will be /dbs/{db}/colls/{coll}, where {db} stands for the name of the database and {coll} for the name of the collection. Also note that the hostname is a URL in the subdomain .gremlin.cosmosdb.azure.com.

hostname=https://gremlin-demo.gremlin.azure.com;
port=443;
enableSsl=true;
username=database=/dbs/Friends/colls/FoF;
password=F...==;
api=gremlin;

The same connection-string for a Tinkerpop Server, running on your computer, will be

hostname=localhost;
port=8182;
enableSsl=false;
username=database=Anything;
password=Anything;
api=gremlin;

Note that username and password are not expected by a Tinkerpop Server. They can’t be omitted, but any value will do.

In your test-suite, querying a relational database with SQL or a Tinkerpop-enabled database with a Gremlin query is no different. You’re still using the result-set and query elements. The Gremlin query is specified within the query element. It doesn’t support parameters at the moment, but you’re still free to use template-tokens.

Gremlin queries can return complex results such as paths that are not easily mapped to a table (result-set). Currently, NBi’s support for Gremlin is limited to queries returning a list of vertexes, a list of edges or a list of values.

In this first test, NBi asserts that the query returning Thomas’ friends of friends effectively returns the correct result: in this case, a unique vertex representing a person named Robin.

FoF-Thomas-Robin
<test name="Thomas' friends of friends" uid="0001">
    <system-under-test>
      <resultSet>
        <query>
          g.V().has('firstName', 'Thomas').outE('knows').inV().hasLabel('person').outE('knows').inV().hasLabel('person')
        </query>
      </resultSet>
    </system-under-test>
    <assert>
      <equalTo values-default-type="text">
        <resultSet>
          <row>
            <cell>robin</cell>
            <cell>person</cell>
            <cell>vertex</cell>
            <cell>Robin</cell>
            <cell>Wakefield</cell>
          </row>
        </resultSet>
      </equalTo>
    </assert>
  </test>

In this second test, we’re asserting that all the vertexes contain a property age that is either greater than 20 or missing.

<test name="All of them are older than 20 (or unknown)" uid="0002">
    <system-under-test>
      <resultSet>
        <query>
          g.V()
        </query>
      </resultSet>
    </system-under-test>
    <assert>
      <all-rows>
        <combination operator="or">
          <predicate operand="age">
            <null/>
          </predicate>
          <predicate operand="age">
            <more-than>20</more-than>
          </predicate>
        </combination>
      </all-rows>
    </assert>
  </test>

And finally, in this last test, we’re checking that the result-set returned by a projection matches the expected result.

  <test name="Count of person's relationships" uid="0003">
    <system-under-test>
      <resultSet>
        <query>
          g.V().project('FirstName','KnowsCount').by('firstName').by(out().count())
        </query>
      </resultSet>
    </system-under-test>
    <assert>
      <equalTo>
        <resultSet>
          <row>
            <cell>Mary</cell>
            <cell>0</cell>
          </row>
          <row>
            <cell>Robin</cell>
            <cell>0</cell>
          </row>
          <row>
            <cell>Ben</cell>
            <cell>1</cell>
          </row>
          <row>
            <cell>Thomas</cell>
            <cell>2</cell>
          </row>
        </resultSet>
      </equalTo>
    </assert>
  </test>
FoF-running-test-suite

Our test-suite is green … time to write some additional tests!

To know more about NBi, check the website at www.nbi.io and if you’ve any question or suggestion, feel free to ask on Twitter or on the Github repository (section “issues”). The code of this extension is also available on Github.

Table.Distinct and case pitfalls with Power Query/M language

In a previous post, I already blogged about case pitfalls in Power Query/M language. By default, comparisons are case-sensitive and that could be an issue when sorting lists or tables, but also when using the List.Distinct and Table.Distinct functions.

To explain the issue, I had the following table:

distinct-case

Note that at the top of the table, the value ADO.NET appears twice, once with uppercase and once with lowercase for the “Net” part. If I apply a Table.Distinct, the two values will be considered distinct and neither of them will be removed.

distinct-case-sensitive

Will it be an issue? If this is your key and it’s on the one side of a one-to-many relationship in your model, then it will be an issue.

duplicate-error-model

The question on the table is: how do we manage this case? The MSDN documentation informs us that a second parameter can be defined for Table.Distinct:

Removes duplicate rows from a table, ensuring that all remaining rows are distinct.

Table.Distinct(
   table as table, 
   optional equationCriteria as any
   ) as table  

But this page gives no additional info about what the equationCriteria is. You’ll have more luck if you read the documentation for the function List.Distinct. There, you have a few samples and you can understand that you can use any function, but also the Comparer set of functions. The comparers are also documented on MSDN.

In that case, the most obvious choice is to use the comparer named Comparer.OrdinalIgnoreCase, to be sure that the comparison of the different items won’t be impacted by differences in casing.
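In practice, the call looks like the short sketch below. It’s only a sketch: the column name Dependency is an assumption (the real column name isn’t visible in the screenshots), but the second argument of Table.Distinct is where the comparer goes.

let
    // Hypothetical source table reproducing the issue: "ADO.NET" appears twice with different casing.
    Source = Table.FromColumns({{"ADO.NET", "ADO.Net", "OLEDB", "ODBC"}}, {"Dependency"}),

    // Default behaviour: the comparison is case-sensitive, so both spellings of ADO.NET survive.
    CaseSensitive = Table.Distinct(Source),

    // With Comparer.OrdinalIgnoreCase passed as equationCriteria, the two spellings collapse into a single row.
    CaseInsensitive = Table.Distinct(Source, {"Dependency", Comparer.OrdinalIgnoreCase})
in
    CaseInsensitive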

distinct-case-insensitive.

Handling Html entities when parsing web pages with Power Query/M Language

In a previous blog post, I explained how to handle html entities when they are number-based such as &#192; or &#9827;. This post covers the other possibility to define an html entity, by name: &Agrave; or &clubs;.

Before going further, let’s note that, normally, entities defined by their name are case-sensitive. It means that &Agrave;, resulting in “À”, is different from &agrave;, resulting in “à”. Our code will also support mistakes in casing, if and only if the mistake does not result in an ambiguity. Based on that, &AGRAVE; won’t be allowed but &CLUBS; will be.

The code is relatively straightforward: create a list of all the known html entities and the corresponding values.

let
   list =
   {
       [Name="Agrave", Number=192]
       , [Name="Aacute", Number=193]
       , [Name="Acirc", Number=194]
       , [Name="Atilde", Number=195]
...

Then parse the entity’s name and look for the corresponding record in the list. The search must first be executed with the specified casing; if nothing is found, use a case-insensitive approach. If there is no match, or more than one, raise an error.

        },
        token = if Text.StartsWith(entity, "&")
                   and Text.EndsWith(entity, ";")
                then substring
                else entity,
        substring = Text.Range(entity, 1, Text.Length(entity)-2),
        correctCasing = List.Select(
                 list
                 , each [Name]=token),
        anyCasing = List.Select(
                 list
                 , each Text.Upper([Name])=Text.Upper(token)),
        single =  try List.Single(correctCasing)[Number]
                  otherwise List.Single(anyCasing)[Number],
        handleError = Error.Record(
                       "Invalid parameter"
                       , "Unable to translate the html entity &" & token & ";"
                       , if List.Count(anyCasing)=0 then
                            "No match with the list of known entities."
                         else
                            "Multiple matches with the list of known entities. Check your the casing of your entity's name."),
        char = try Character.FromNumber(single)
               otherwise handleError
    in
        char;

This function is available as a gist at https://gist.github.com/Seddryck/36d24e05c233cccf90e6a88a1af39c68

Passing a function to another function to improve maintainability in Power Query/M Language

Developing is relatively easy, but maintaining is harder. Code that is relatively easy to maintain must strike a good balance between expressiveness and compactness. In Power Query/M language, passing a function to another function as a parameter helps to keep your code compact and to reuse large blocks of code where the differences are tiny. Have you ever coded the same function twice, once using Text.Upper and once Text.Lower? Then this blog post could interest you. Surely, abusing this feature could result in obfuscated code, so use it cautiously.

To illustrate this, I’ll use the same use-case as in this blog post, where I was trying to parse the results of my daughters at their gymnastics championship. One of the (many) difficulties in this parsing was to determine the first name, last name and team. These three pieces of information were on the same line, separated by a space. In the most common pattern, it looks like:

CHARLIER Clémence A.R. GymBraine
CHARLIER Alice A.R. GymBraine

The pattern could have been

Lastname FirstName Team

But you’ll quickly find some exceptions and difficulties. First notable pitfall: you can’t determine how many words compose the team’s name. It could be two (A.R. GymBraine), three (ABC Gym Dinant) or four (Royale Garde Saint Jean), optionally with hyphens, quotes, dots and parentheses. Nothing reliable; you should take the hypothesis that everything after the first name is the team’s name.

Some people have a composed last name such as VAN DER PERRE. We can’t really rely on the hypothesis that the last name is just the first word. The pattern is rather that the last name is fully capitalized.

For the first names, we have the same kind of issue: the first names Anne-laure and Lilly-shiva are composed but easily identifiable because there is no space between their parts. Unfortunately, some first names are not in line with the previous assumption: I found a Marie antoinette and another Adina - ioana. Fortunately, you’ll notice that in both cases only the first letter is an uppercase, all remaining letters are lowercase, and the team’s name always starts with an uppercase!

From that point the pattern is the following:

LAST NAME First name Team's name

The delimiter condition for the last name is that the next two characters are an uppercase followed by a lowercase (Fi in the case above). For the first name, the delimiter condition is a space followed by an uppercase (space + T in the example above).

Cooooool … we “just” have to code that.

First step: let’s create a helper function to check if a character is uppercase or lowercase. A naïve implementation could be:

isUppercase = (c as text) => c = Text.Upper(c)

But this code will return true for a space or a hyphen. That’s not what we need: we need to strictly limit ourselves to letters. In a previous blog post, I already explained how we can use Character.FromNumber and Character.ToNumber to use the ASCII table and check if a character is a digit or not. We can apply the same approach with uppercase and lowercase letters.

isUppercase = (c as text) => 
  Character.ToNumber(c) >= Character.ToNumber("A") 
      and Character.ToNumber(c) <= Character.ToNumber("Z") 

I could do the same for lowercase, replacing “A” with “a” and “Z” with “z”, and create a new function. Sure, but it’s probably a good place to use, for the first time, the pattern of passing a function to another function. I’ll rewrite the previous function as:

isCase = (c as text, case as function) => 
  Character.ToNumber(c) >= Character.ToNumber(case("A")) 
      and Character.ToNumber(c) <= Character.ToNumber(case("Z")) 

Then, I could create two additional helpers:

isUpperCase = (c as text) => isCase(c, (x)=>Text.Upper(x)),
isLowerCase = (c as text) => isCase(c, (x)=>Text.Lower(x)),

Did you notice how the function Text.Upper was passed as a parameter?

Now that we can test the casing of a character, we need to tackle another challenge: parsing two consecutive characters in a string and finding the first occurrence matching the corresponding delimiter condition.

Let’s start by building a list of the pairs of consecutive characters. To do that, we’ll use List.Generate and its not-so-well-known fourth parameter, the selector. This parameter lets you define a function that generates the members of the list in place of the iterator.

let
   PositionOf = (message as text) as text => 
    let
       list = List.Generate(
            ()=>0,
            each _ < Text.Length(message)-1,
            each _+1, 
            each [
                i = _, 
                x = Text.Range(message, _, 1), 
                y = Text.Range(message, _+1, 1)]
        ),

This function runs from the first character of the string to the penultimate one and composes a record with, respectively, the position of the pair of characters in the string, the first character of the pair and the second one.

Then, we’ll need to select the first record matching a condition to get the position. The condition changes depending on whether we’re looking for the first name or the last name … so let it be a function!

       first = List.First(
                 List.Select(list, each condition(_))
               )[i], 
        word = Text.Start(message, first)
    in
        word
in
    PositionOf

The conditions for the first name and last name will be expressed as:

conditionLastName = (r as record) => 
    isUpperCase(r[x]) and isLowerCase(r[y]),
conditionFirstName = (r as record) => 
    r[x]=" " and isUpperCase(r[y]),

To test the previous code, we can implement the following test routine:

    list = {
              "ALPHA Beta L'équipe", 
              "ALPHA GOGO Beta Team's name", 
              "ALPHA GOGO Beta - gamma My 'team'"},
    temp = List.Transform(
              list,
              each [ 
                 full = _,
                 lastName = 
                    Text.Trim(PositionOf(_, conditionLastName)),
                 firstName = 
                    PositionOf(Text.Range(_, Text.Length(lastName)+1),
                               conditionFirstName),
                 team = Text.Range(_, 
                           Text.Length(firstName) +
                           Text.Length(lastName) + 1)
           ]),
    result = Table.ExpandRecordColumn(
                Table.FromList(temp, 
                    Splitter.SplitByNothing()),
                "Column1", 
                {"full","lastName","firstName","team"})
in
    result

And we’ll get the following result:

function-end-result

Writing a first data connector for Power BI

Since mid-2017, it’s been possible to write your own custom data connectors. This feature offers the opportunity to encapsulate more difficult or technical code into a data connector and distribute it to business analysts. The key benefit is to let them concentrate on what they are supposed to do: business analysis … and not spend hours putting in place plumbing code to get the data from your data source.

Currently, one of the supported use-cases for data connectors is branding a supported data source (ODBC) or enabling DirectQuery for a given ODBC data source. We won’t focus on that use-case in the next articles, but instead on the idea of creating a user-friendly experience over a REST API.

Before going further, you must download the Power Query SDK. This package is available as an extension for Visual Studio. It brings several benefits such as a powerful editor for the Power Query/M language with autocompletion and IntelliSense, and limited features to debug your code (but at this moment nothing to monitor performance and no breakpoints). This SDK can be frustrating in some conditions, but it’s still a version 1 and when something sounds clumsy in Visual Studio, just try it in Power BI to confirm the correct behavior of your code. There is room for improvement but it’s already useful.

In this first article, I’ll create a basic data connector that just returns a list of numbers from 1 to 50. To achieve that, I’ll create a new project in Visual Studio: in the templates from the tab “Power Query”, I’ll select the “Data Connector Project” and click “OK”.

vs-select-project-power-query

Visual Studio will create a boilerplate project with a file having the extension .pq, another with .query.pq, but also a list of image files and a resources file. The file with the extension “.pq” will contain your code for the connector (at this moment it’s not possible to split your code into several files). The file “.query.pq” will contain the query that you want to run in debug mode (it can reference the functions defined in your “.pq” file).

vs-power-query-solution

If you open the “.pq” file, you’ll find a “hello world” function named PQExtension1.Contents.

vs-power-query-default

You can replace the body of this function by the code creating a series from 1 to 50:

source = {1..50}
vs-power-query-code

A few important remarks about this function. If you’re used to developing custom functions with the Power Query/M language, you’ll notice that I explicitly set the empty parentheses to state that this function is not expecting parameters, something that you can usually skip when writing functions in Power BI. In fact, these empty parentheses imply a lot of consequences, but I won’t deep dive into the Power Query/M language at this moment; just remember that functions that you expose through a data connector require these empty parentheses (or some parameters if you need them).

Second key remark: the semicolon (“;”) at the end of my function. This is the case for all functions that you’ll develop in the “.pq” files.
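Putting these two remarks together, the body of the “.pq” file ends up looking roughly like the sketch below. It’s only a sketch based on the default “Data Connector Project” template: the name PQExtension1 comes from the project name, and the exact boilerplate generated by your version of the SDK may differ slightly.

section PQExtension1;

// The function exposed by the connector: note the empty parentheses and the closing semicolon.
[DataSource.Kind = "PQExtension1"]
shared PQExtension1.Contents = () =>
    let
        source = {1..50}
    in
        source;

// The DataSource.Kind record: Implicit is what shows up as "Anonymous" in the credentials dialog.
PQExtension1 = [
    Authentication = [
        Implicit = []
    ]
];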

If you click on the start button or hit F5, Visual Studio will start your Data connector in debug mode. This first start will just give you the opportunity to specify some credentials. Credentials are managed by DataSource.Kind. Just set the Credential Type to anonymous for this sample and click on “Set Credential”.

vs-power-query-credentials

This screen should only be displayed on your first debugging session, as long as you don’t change your DataSource.Kind. When the credentials have been specified, click again on the start button to see the result of your data connector within Visual Studio:

vs-power-query-output

Now that the connector is working within Visual Studio, you can deploy it to Power BI. To achieve that task, you just need to copy the result of the build executed by Visual Studio and paste it in the folder %userprofile%\documents\Microsoft Power BI Desktop\Custom Connectors\.
The following bat file just executes this action and then starts Power BI:

CD /d %~dp0
SET destination="%userprofile%\documents\Microsoft Power BI Desktop\Custom Connectors\"
XCOPY "PQExtension1\bin\debug\*.mez" %destination% /Y /F
"C:\Program Files\Microsoft Power BI Desktop\bin\PBIDesktop.exe"
PAUSE

Once Power BI has started, click on “Get Data” and select “Other”: you’ll find your data connector in this list. Select it and click OK.

vs-power-query-get-data

It will open the “Power Query editor” and display the result. From this point you can continue to work on your data as usual and add steps if needed.

vs-power-query-get-data-pq-editor

You can also open the “Advanced editor” and check the code that has been generated when you selected the data connector.

vs-power-query-advanced-editor

The code just calls the function that you created in your connector, hiding all the complex code from the business analyst. Great!
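For the sample connector above, the generated script is probably nothing more than a single call to the shared function (a sketch; the exact step name depends on what Power BI generates for your connector):

let
    // Power BI wraps the call to the connector's shared function in a single step.
    Source = PQExtension1.Contents()
in
    Source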

In the next articles of this series, I’ll go further, get data from a REST API and propose a great user interface to the business analyst.

Converting datetime from UTC to Local time with Power Query M language

This article explains how to convert from UTC to Local time with the Power Query M Language. It sounds easy but it’s not so trivial.

The Power Query M language supports a datetimezone type and exposes a function DateTimeZone.ToLocal. These artefacts do the job and you can quickly create the following function, converting a UTC datetime to a local datetime:

let
  UtcToLocal = (utc as datetime) =>
    let
      utcToLocal = 
        DateTimeZone.RemoveZone(
          DateTimeZone.ToLocal(
            DateTime.AddZone(utc,0)
          )
        )
     in
       utcToLocal
in
    UtcToLocal

I’m from Belgium (UTC+1 during winter time and UTC+2 during summer time) and the following conversions are just fine:

UtcToLocal(#datetime(2018, 3, 21, 17, 0, 0)) = 
  #datetime(2018, 3, 21, 18, 0, 0)   #Winter time => +1
UtcToLocal(#datetime(2018, 3, 30, 17, 0, 0)) = 
  #datetime(2018, 3, 30, 19, 0, 0)   #Summer time => +2

When I try to convert the datetimes around the switch from summer time to winter time, I should get the local time 02:00:00 twice, once at 00:00:00 UTC and once at 01:00:00 UTC. That’s indeed the case:

UtcToLocal(#datetime(2017, 10, 29, 0, 0, 0)) = 
  #datetime(2017, 10, 29, 2, 0, 0)   #Summer time => +2
UtcToLocal(#datetime(2017, 10, 29, 1, 0, 0)) = 
  #datetime(2017, 10, 29, 2, 0, 0)   #Winter time => +1

Everything is fine … except if I share my code with someone from another time zone. The function DateTimeZone.ToLocal relies on the machine’s regional settings, whereas in my case the conversion should always be from UTC to “Brussels time”.

I didn’t find any other way to ensure that I’m always converting from UTC to “Brussels time” than implementing the conversion by myself. That’s the goal of the following function:

UtcToLocalTime = (date as date, time as time) =>

In Europe, the two pivotal daylight saving dates are the last Sundays of March and October. Translated into M language, this means the first day of the week, assuming the week starts on a Sunday, containing the last day of the month (which is a 31):

lastSundayOfOctober = 
  Date.StartOfWeek(#date(Date.Year(date), 10, 31), Day.Sunday),
lastSundayOfMarch = 
  Date.StartOfWeek(#date(Date.Year(date), 3, 31), Day.Sunday),

Based on the two pivotal dates, I can check if I’m in winter time or summer time

isSummerTime = 
   (date > lastSundayOfMarch and date < lastSundayOfOctober) 
   or (date = lastSundayOfOctober and time < #time(1,0,0)),

And based on that adjust the time zone

timeZone = 1 + Number.From(isSummerTime),
localTime = 
  DateTime.From(date) 
  + #duration(0,Time.Hour(time),Time.Minute(time),Time.Second(time))  
  + #duration(0, timeZone, 0, 0),

I can also check if the specified date and time correspond to the doubled hour of the last Sunday of October and append a symbol to that date (in its text representation):

isWinterDouble = 
   (date = lastSundayOfOctober) and time = #time(1,0,0),
localString = 
   DateTime.ToText(localTime, "yyyy-MM-dd HH:mm:ss") 
   & Text.Repeat("*", Number.From(isWinterDouble)),

The final result is a record with the local date, time and textual representation:

    record = 
        [
            LocalDate = Date.From(localTime), 
            LocalTime = Time.From(localTime), 
            LocalDateTime = localTime, 
            LocalString = localString
        ]
in
    record;

This function is returning the expected result:

UtcToLocal