Automating the testing of Elasticsearch queries with NBi

For five years, you have been able to run automated tests on your SQL Server databases and SSAS cubes with the help of the open-source framework NBi. In 2017, this framework was extended to support other relational databases (using the OleDb or ODBC drivers). In 2018, I’m adding support for NoSQL databases. Earlier this year, I released some extensions for graph databases, and this new extension targets Elasticsearch, the RESTful search and analytics engine. The code for this extension to the NBi framework is hosted on GitHub.

If you’re used to NBi, you can jump directly to the next paragraph. For newcomers, NBi is an open-source framework dedicated to testing BI and data-centric solutions: it supports complex comparisons of query results, asserting conditions on each row of a result-set, running ETL (SSIS), validating queries from reports (SSRS) and many other possibilities. On top of this, it has an advanced feature to automate the creation of test-suites.

To set up an environment for testing queries on an Elasticsearch instance, you must download version 1.18 of NBi and version 1.0 of its extension for Elasticsearch. Unzip the content of the NBi download and then partially overwrite it with the content of the NBi.Elasticsearch extension (the DLLs from both packages must be deployed in a single directory). If NUnit 2.6.4 is not available on your computer, don’t forget to download it. If you need more information about how to set up an NBi extension, read the documentation.

Once NBi and its extension are installed, create your test-suite with its config and NUnit project files as you would normally do for any NBi test-suite (or, if you’re in a hurry, download the files with the examples here, but don’t forget to update the connection string before running them). If you need more information about how to set up a test-suite, read the documentation.

For this example, I started an Elasticsearch instance and loaded the sample data from the Bank example.

curl -H "Content-Type: application/json" -XPOST "localhost:9200/bank/_doc/_bulk?pretty&refresh" --data-binary "@accounts.json"

When the environment is correctly configured, you must edit your config file to reference the extension NBi.Core.Elasticsearch.

<configuration>
  <configSections>
    <section name="nbi" type="NBi.NUnit.Runtime.NBiSection, NBi.NUnit.Runtime"/>
  </configSections>
  <nbi testSuite="...">
    <extensions>
      <add assembly="NBi.Core.Elasticsearch"/>
    </extensions>
  </nbi>
</configuration>

The concept of a connection string is foreign to Elasticsearch, but to stay consistent with relational databases, we use a concatenation of the usual parameters needed to connect to an Elasticsearch instance with the low-level client available in C#. You must provide a URL with the hostname, the port, the username and the password. For the moment, the protocol must be elasticsearch, to distinguish it from other databases relying on the HTTP protocol.

elasticsearch://user:password@localhost:9200
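
To see what such a connection string carries, here is a small Python sketch that splits the URL into the parts listed above using only the standard library. It is not part of NBi (which does its own parsing in C#), and the function name `parse_connection_string` is an assumption made for this illustration.

```python
from urllib.parse import urlsplit


def parse_connection_string(url):
    """Split an elasticsearch:// URL into its connection parameters.

    Hypothetical helper for illustration only; NBi parses the string itself.
    """
    parts = urlsplit(url)
    if parts.scheme != "elasticsearch":
        raise ValueError("expected the elasticsearch protocol, got %r" % parts.scheme)
    return {
        "hostname": parts.hostname,
        "port": parts.port,
        "username": parts.username,
        "password": parts.password,
    }


settings = parse_connection_string("elasticsearch://user:password@localhost:9200")
print(settings)
```

Any other protocol, such as http, is rejected by this sketch, which mirrors the distinction described above.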

In your test-suite, querying a relational database with SQL or an Elasticsearch cluster with a query written in the Query DSL is not different. You’re still using the result-set and query elements. The query itself is specified within the query element.

Elasticsearch queries can return complex results such as hits, aggregations or counts. Currently, NBi support for Elasticsearch is limited to queries returning hits or aggregations.

This very first test asserts that the query searching for the 5 persons with the highest balance effectively returns 5 rows.

<test name="5 highest balances returns 5 rows" uid="0001">
  <system-under-test>
    <resultSet>
      <query>
        <![CDATA[
        GET bank/_search
        {
           "query": {"match_all": { }}
           , "size": 5
           , "_source": ["gender", "age", "balance"]
           , "sort": [ { "balance" : {"order" : "desc"}}]
        }
        ]]>
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <row-count>
      <equal>5</equal>
    </row-count>
  </assert>
</test>

In the second test, the same query as in the previous example should return rows with an age between 20 and 40 and a balance greater than $48,000.

<test name="5 highest balance" uid="0001">
  <system-under-test>
    <resultSet>
      <query>
        <![CDATA[
        GET bank/_search
        {
           "query": {"match_all": { }}
           , "size": 5
           , "_source": ["gender", "age", "balance"]
           , "sort": [ { "balance" : {"order" : "desc"}}]
        }
        ]]>
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <all-rows>
      <combination operator="and">
        <predicate operand="age">
          <within-range>[20;40]</within-range>
        </predicate>
        <predicate operand="balance">
          <more-than>48000</more-than>
        </predicate>
      </combination>
    </all-rows>
  </assert>
</test>
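
The logic of this assertion can be sketched in a few lines of Python. This only illustrates what an all-rows check with an "and" combination verifies; it is not NBi's implementation, and the rows below are made-up values, not actual results from the bank data set.

```python
def all_rows_match(rows):
    """Return True when every row has an age in [20;40] and a balance above 48000."""
    return all(
        20 <= row["age"] <= 40 and row["balance"] > 48000
        for row in rows
    )


# Made-up rows shaped like the "_source" fields requested by the query.
sample_rows = [
    {"gender": "F", "age": 29, "balance": 49500},
    {"gender": "M", "age": 35, "balance": 48900},
]

# A made-up row that breaks the age predicate.
too_old = {"gender": "M", "age": 52, "balance": 49900}

print(all_rows_match(sample_rows))              # True
print(all_rows_match(sample_rows + [too_old]))  # False
```

A single row failing either predicate is enough to make the whole assertion fail.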

The next test checks the result of an aggregation: more specifically, that the three states with the highest average balance are Washington, Alabama and Rhode Island, and also that the document count and the average balance are more or less in the expected range.

<test name="Top 3 of average balance by state" uid="0002">
  <system-under-test>
    <resultSet>
      <query>
        <![CDATA[
        GET /bank/_search
        {
          "size": 0,
          "aggs": {
            "group_by_state": {
              "terms": {
                "field": "state.keyword",
                "size" : 3,
                "order": {
                  "average_balance": "desc"
                }
              },
              "aggs": {
                "average_balance": {
                  "avg": {
                    "field": "balance"
                  }
                }
              }
            }
          }
        }
        ]]>
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <equalTo tolerance="1">
      <resultSet>
        <row>
          <cell column-name="key">WA</cell>
          <cell column-name="doc_count">(+)</cell>
          <cell column-name="average_balance">43265</cell>
        </row>
        <row>
          <cell>AL</cell>
          <cell>6</cell>
          <cell>41418</cell>
        </row>
        <row>
          <cell>RI</cell>
          <cell>[5;10]</cell>
          <cell>40040</cell>
        </row>
      </resultSet>
    </equalTo>
  </assert>
</test>
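
To make this comparison more concrete, the Python sketch below flattens the buckets of a terms aggregation into rows of key, doc_count and average_balance, then compares numeric cells with an absolute tolerance of 1. It is an illustration under assumptions, not NBi's actual code: the response is hand-written with made-up numbers, the helper names are hypothetical, and the `(+)` and `[5;10]` cells are interpreted here as "any strictly positive value" and "any value between 5 and 10".

```python
def flatten_buckets(response, agg_name, metric_name):
    """Turn the buckets of a terms aggregation into (key, doc_count, metric) rows."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [(b["key"], b["doc_count"], b[metric_name]["value"]) for b in buckets]


def cell_matches(actual, expected, tolerance=1):
    """Compare one cell; expected is a predicate, a string or a number."""
    if callable(expected):
        return expected(actual)
    if isinstance(expected, str):
        return actual == expected
    return abs(actual - expected) <= tolerance


# Hand-written response with made-up numbers, shaped like a terms
# aggregation carrying an "avg" sub-aggregation.
response = {
    "aggregations": {
        "group_by_state": {
            "buckets": [
                {"key": "WA", "doc_count": 7, "average_balance": {"value": 43265.4}},
                {"key": "AL", "doc_count": 6, "average_balance": {"value": 41418.2}},
                {"key": "RI", "doc_count": 8, "average_balance": {"value": 40040.7}},
            ]
        }
    }
}

# Expected rows mirroring the assertion of the test above.
expected = [
    ("WA", lambda n: n > 0, 43265),         # assumed meaning of "(+)"
    ("AL", 6, 41418),
    ("RI", lambda n: 5 <= n <= 10, 40040),  # assumed meaning of "[5;10]"
]

rows = flatten_buckets(response, "group_by_state", "average_balance")
result = len(rows) == len(expected) and all(
    cell_matches(actual, exp)
    for row, exp_row in zip(rows, expected)
    for actual, exp in zip(row, exp_row)
)
print(result)
```

With a tolerance of 1, small rounding differences in the averages do not fail the test, while a different state or an out-of-range count does.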

Our test-suite is green … time to write some additional tests! Feel free to explore the other kinds of tests that NBi supports and report your ideas for improvement in the GitHub issues for NBi.Elasticsearch.

To learn more about NBi, check the website at www.nbi.io, and if you have any questions or suggestions, feel free to ask on Twitter or on the GitHub repository (section “issues”). The code of this extension is also available on GitHub.
