Neo4j

Automating the testing of Neo4j cypher queries with NBi

Since five years, you can run automated tests on your SQL server databases and SSAS cubes with the help of the open-source framework NBi. This framework was extended to support other relational databases (using the OleDb or ODBC drivers) in 2017. In 2018, I’m adding support for graph databases. Earlier this year, I released an extension for the Azure Cosmos DB API and notably the Graph API and another for Tinkerpop-enabled databases supporting Gremlin queries. This new extension is supporting the Neo4j databases and the Cypher query language. The code for this extension is hosted on GitHub.

If you’re used to NBi, you can directly jump to the next paragraph. For newcomers, NBi is an open-source framework dedicated to the testing of BI solutions: it supports complex comparison of queries results, asserting conditions on each row of a result-set, running ETL (SSIS), validating queries from reports (SSRS) or many other possibilities. On top of this, it has an advanced feature to automate the process of creating test-suites.

In order to setup an environment for testing some cypher queries, you must download the version 1.18 of NBi and its extension for Gremlin in version 1.0. Unzip the content of the NBi download and then partially override it with the content of the extension in the NBi.Neo4j (dlls from both packages must be deployed in a unique directory). If NUnit 2.6.4 is not available on your computer don’t forget to download it. If you need more info about how to setup a NBi’s extension, read the documentation.

When NBi and its extension are installed, create your test-suite with its config and NUnit project file as you’d normally do it for any NBi test-suite (or, if you’re in the hurry, download the files with the examples here, but don’t forget to update the connection-string when trying to run it). If you need more info about how to setup a test-suite, read the here.

For this example, I started a Neo4j instance and created some vertexes and edges based on the classical movies database shipped with Neo4j

:play movies graph

When the environment is correctly configured, you must edit your config file to reference the extension NBi.Core.Neo4j.

<configuration>
  <configSections>
    <section name="nbi" type="NBi.NUnit.Runtime.NBiSection, NBi.NUnit.Runtime"/>
  </configSections>
  <nbi testSuite="...">
    <extensions>
      <add assembly="NBi.Core.Neo4j"/>
    </extensions>
  </nbi>
</configuration>

The concept of connection-string is not familiar to Neo4j but to match with relational databases, we use a concatenation of the usual parameters defined to connect to a Neo4j instance with the bolt driver. You must provide an url with the hostname, the port, the username and password.

bolt://user:password@localhost:7474

In your test-suite, querying a relational database with SQL or a Neo4j instance with a cypher query is not different. You’re still using the result-set and query elements. The cypher query is specified within the query. It supports parameters and template-tokens.

Cypher queries can return complex results such as paths that are not easily mapped to a table (result-set). Currently, NBi support for Cypher is limited to queries returning a list of values.

In this test, NBi is asserting that the query to return Tom Hanks’ movies, released after 2000, is effectively what we’re expecting.

<test name="Tom Hanks'movies after 2000" uid="0001">
  <system-under-test>
    <resultSet>
      <query>
        MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies)
        WHERE tomHanksMovies.released>2000
        RETURN tomHanksMovies.title, tomHanksMovies.released
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <equalTo>
      <resultSet>
        <row>
          <cell>Charlie Wilson's War</cell>
          <cell>2007</cell>
        </row>
        <row>
          <cell>The Polar Express</cell>
          <cell>2004</cell>
        </row>
        <row>
          <cell>The Da Vinci Code</cell>
          <cell>2006</cell>
        </row>
        <row>
          <cell>Cloud Atlas</cell>
          <cell>2012</cell>
        </row>
      </resultSet>
    </equalTo>
  </assert>
</test>
FoF-running-test-suite

Our test-suite is green … time to write some additional tests!

To know more about NBi, check the website at www.nbi.io and if you’ve any question or suggestion, feel free to ask on Twitter or on the Github repository (section “issues”). The code of this extension is also available on Github.

Advertisements

Get data from Neo4j with Power BI

Power BI offers out-of-the-box connectivity for many data sources but unfortunately Neo4j is not (yet?) in this list. To be able connect to this graph database, you’ll need to use the Transactional Cypher HTTP endpoint. This API from Neo4j lets you submit some REST POST requests (your queries) to the server and get some responses (the return of your queries).

To send a REST POST request from Power BI, you can use the M language and the function Web.Contents. The first parameter is the url: from the documentation of transactional Cypher HTTP endpoint you can easily find out the value (depending of your server’s configuration).

let
    Source = Web.Contents(
        "http://localhost:7474/db/data/transaction/commit"
    )

This API is expecting that you submit your query through a POST. To achieve this with Power BI, you’ll need to supply a second named parameter Content to the method Web.Contents. The value of this parameter will be your cypher query surrounded by a little bit of JSON syntax corresponding to the Neo4j documentation:

{
  ""statements"" : [ {
    ""statement"" :
      ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""
  } ]
}

Pay attention that this JSON request will need to be binary encoded by Power BI before the submission to the Neo4j server.

let
    Source = Web.Contents( "http://localhost:7474/db/data/transaction/commit")
      [Content=Text.ToBinary("{
         ""statements"" : [ {
           ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
           }")
      ]
    )

If you try this code, Power BI and Neo4j will inform you that your user is not authorized and prompt you to edit the credentials.The user was not authorized

If you provide valid credentials,

Neo4j credentials

Power BI will complain about the usage of Content for a non-anonymous connection.

Web Contents only for anonymous

At the moment, it sounds impossible to submit POST requests with authentication from Power BI. The only way to submit your query is to allow your Neo4j server to authorize anonymous access. To configure your server edit the neo4j-server.propertiesfile.

edit server config.png

and change the value of the parameter dbms.security.auth_enabled from trueto false. This parameter is usually at the top of the file. Be sure that you understand the consequences of his action before going further and don’t forget to restart your Neo4j server to apply the new configuration.

In order to use an anonymous connection to Neo4j, you’ll need to remove the authentication information from Power BI Desktop data source settings. Just click on File > Options and settings > Data source settings, then select the url corresponding to your Neo4j server and click on the delete button.

Remove connection settings

If you try again, the same code should return a JSON document. To explain to Power BI that this document should be imported as JSON, click on Open file as and select the  JSON option.

Open file as JSON.png

Then click on the list corresponding to the  results. This list has exactly one record, just click on it and select the child list corresponding to data.  You will obtain a list of records that you convert to a table by selecting Convert to table.

convert-to-table

When done, you’ll need to expend the column to find … another set of lists. Each list contains the two expected values (the name of an actor and his rank). To convert this list to two columns, your best friend is the function Record.FromList

Table.AddColumn(
  #"Expanded Column1"
  , "Custom"
  , each Record.FromList(
    [row]
    , type [Name = text, Rank = number]
  )
)

Expend the record and remove the column containing the initial list  … and tadam you can now easily use the result of your query!

end-result

Now, you can play with the cool visualizations of Power BI but you can also merge the result of this query with another table or anything else that is possible with Power BI.

neo4j-power-bi.png

Full code:

let
  Source = Web.Contents( "http://localhost:7474/db/data/transaction/commit",
    [Content=Text.ToBinary("{
      ""statements"" : [ {
        ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
    }")]
  ),
  #"Imported JSON" = Json.Document(Source),
    results = #"Imported JSON"[results],
    results1 = results{0},
    data = results1[data],
  #"Converted to Table" = Table.FromList(
    data
    , Splitter.SplitByNothing()
    , null
    , null
    , ExtraValues.Error
  ),
  #"Expanded Column1" = Table.ExpandRecordColumn(
    #"Converted to Table"
    , "Column1"
    , {"row"}, {"row"}
  ),
  #"x"=Table.AddColumn(
    #"Expanded Column1"
    , "Custom"
    , each Record.FromList([row], type [Name = text, Rank = number])
  ),
  #"Expanded Custom" = Table.ExpandRecordColumn(
    x
    , "Custom"
    , {"Name", "Rank"}, {"Name", "Rank"}
  ),
  #"Removed Columns" = Table.RemoveColumns(
    #"Expanded Custom"
    ,{"row"}
  )
in
  #"Removed Columns"