Month: February 2016

Get data from Neo4j with Power BI

Power BI offers out-of-the-box connectivity for many data sources, but unfortunately Neo4j is not (yet?) in this list. To connect to this graph database, you’ll need to use the transactional Cypher HTTP endpoint. This Neo4j API lets you submit REST POST requests (your queries) to the server and get responses (the results of your queries).
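For reference, here is a sketch of the JSON document returned by this endpoint (the shape follows the Neo4j documentation; the row values are purely illustrative). Knowing this shape will help later, when we teach Power BI to navigate to the data elements.

{
  "results" : [ {
    "columns" : [ "Recommended", "Strength" ],
    "data" : [
      { "row" : [ "Some Actor", 5 ] },
      { "row" : [ "Another Actor", 3 ] }
    ]
  } ],
  "errors" : [ ]
}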

To send a REST POST request from Power BI, you can use the M language and the function Web.Contents. The first parameter is the URL: from the documentation of the transactional Cypher HTTP endpoint you can easily find out the value (depending on your server’s configuration).

let
    Source = Web.Contents(
        "http://localhost:7474/db/data/transaction/commit"
    )
in
    Source

This API expects you to submit your query through a POST request. To achieve this with Power BI, you’ll need to supply a second parameter to Web.Contents: a record with a Content field. The value of this field will be your Cypher query, wrapped in a little JSON syntax as described in the Neo4j documentation:

{
  "statements" : [ {
    "statement" :
      "MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC"
  } ]
}

Pay attention: this JSON request will need to be binary-encoded by Power BI before being submitted to the Neo4j server, hence the Text.ToBinary call and the doubled quotation marks in the M code below.

let
    Source = Web.Contents(
      "http://localhost:7474/db/data/transaction/commit",
      [Content=Text.ToBinary("{
         ""statements"" : [ {
           ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
         }")
      ]
    )
in
    Source

If you try this code, Power BI and Neo4j will inform you that your user is not authorized and prompt you to edit the credentials.

If you provide valid credentials, Power BI will complain about the usage of Content for a non-anonymous connection: Web.Contents with the Content option is only supported for anonymous credentials.

At the moment, it seems impossible to submit POST requests with authentication from Power BI. The only way to submit your query is to allow your Neo4j server to accept anonymous access. To configure your server, edit the neo4j-server.properties file and change the value of the parameter dbms.security.auth_enabled from true to false. This parameter is usually at the top of the file. Be sure that you understand the consequences of this action before going further, and don’t forget to restart your Neo4j server to apply the new configuration.
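Once edited, the relevant lines of neo4j-server.properties should look like this (a sketch, assuming a default configuration):

# Disable authorization -- understand the security implications first!
dbms.security.auth_enabled=false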

In order to use an anonymous connection to Neo4j, you’ll also need to remove the authentication information from the Power BI Desktop data source settings. Just click on File > Options and settings > Data source settings, then select the URL corresponding to your Neo4j server and click on the delete button.


If you try again, the same code should return a JSON document. To tell Power BI that this document must be read as JSON, click on Open file as and select the JSON option.


Then click on the list corresponding to results. This list has exactly one record; click on it and select the child list corresponding to data. You’ll obtain a list of records that you can turn into a table by selecting Convert to table.
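Behind the scenes, these clicks generate M steps similar to the following fragment (taken from the full code at the end of this post):

#"Imported JSON" = Json.Document(Source),
results = #"Imported JSON"[results],
results1 = results{0},
data = results1[data],
#"Converted to Table" = Table.FromList(
  data
  , Splitter.SplitByNothing()
  , null
  , null
  , ExtraValues.Error
)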

When done, you’ll need to expand the column to find … another set of lists. Each list contains the two expected values (an actor’s name and rank). To convert such a list into two columns, your best friend is the function Record.FromList:

Table.AddColumn(
  #"Expanded Column1"
  , "Custom"
  , each Record.FromList(
    [row]
    , type [Name = text, Rank = number]
  )
)

Expand the record and remove the column containing the initial list … and ta-da, you can now easily use the result of your query!
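In M, these last two actions correspond to steps like these (again taken from the full code below):

#"Expanded Custom" = Table.ExpandRecordColumn(
  x
  , "Custom"
  , {"Name", "Rank"}, {"Name", "Rank"}
),
#"Removed Columns" = Table.RemoveColumns(
  #"Expanded Custom"
  , {"row"}
)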


Now you can play with the cool visualizations of Power BI, but you can also merge the result of this query with another table, or do anything else that Power BI makes possible.


Full code:

let
  Source = Web.Contents( "http://localhost:7474/db/data/transaction/commit",
    [Content=Text.ToBinary("{
      ""statements"" : [ {
        ""statement"" : ""MATCH (tom:Person {name:'Tom Hanks'})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors), (coActors)-[:ACTED_IN]->(m2)<-[:ACTED_IN]-(cocoActors) WHERE NOT (tom)-[:ACTED_IN]->(m2) RETURN cocoActors.name AS Recommended, count(*) AS Strength ORDER BY Strength DESC""} ]
    }")]
  ),
  #"Imported JSON" = Json.Document(Source),
    results = #"Imported JSON"[results],
    results1 = results{0},
    data = results1[data],
  #"Converted to Table" = Table.FromList(
    data
    , Splitter.SplitByNothing()
    , null
    , null
    , ExtraValues.Error
  ),
  #"Expanded Column1" = Table.ExpandRecordColumn(
    #"Converted to Table"
    , "Column1"
    , {"row"}, {"row"}
  ),
  #"x"=Table.AddColumn(
    #"Expanded Column1"
    , "Custom"
    , each Record.FromList([row], type [Name = text, Rank = number])
  ),
  #"Expanded Custom" = Table.ExpandRecordColumn(
    x
    , "Custom"
    , {"Name", "Rank"}, {"Name", "Rank"}
  ),
  #"Removed Columns" = Table.RemoveColumns(
    #"Expanded Custom"
    ,{"row"}
  )
in
  #"Removed Columns"

How to log the queries received by a cube?

In a previous post, I explained how to analyze which users connect to your cube; this blog post will focus on logging their queries. The strategy described below has a small footprint on your server because it relies on traces. Use it to debug weird situations or to monitor for a short period, but if you want to apply it over long periods, be careful and monitor the impact on your server’s performance.


The way to achieve this is similar to the method for auditing users’ connections: you create a trace with Extended Events, redirect the flow of events to a flat file, and read this file with the help of the function sys.fn_xe_file_target_read_file.

Let’s start by creating the trace. The interesting event is Query End (code = QueryEnd). This event comes with several fields: notably the query itself (TextData), the start and end times (in UTC format) and the NTUserName. For the last one, I’ve noticed that the information is not always filled in. Without confirmation, I’d say that this field is only available when the event AuditLogin is also intercepted by a trace.

<Create
   xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"
   xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"
   xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"
   xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"
   xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300">
   <ObjectDefinition>
     <Trace>
       <ID>SQLAdmin_Query</ID>
       <Name>SQLAdmin_Query</Name>
       <AutoRestart>true</AutoRestart>

       <ddl300_300:XEvent>
         <event_session name="xeas"            
            dispatchLatency="1"            
            maxEventSize="4"            
            maxMemory="4"            
            memoryPartitionMode="none"            
            eventRetentionMode="allowSingleEventLoss"            
            trackCausality="true"
         >
         <event package="AS" name="QueryEnd" />
         <target package="Package0" name="event_file">
           <parameter name="filename" value="L:\MSSQL_AS_Log\SQLAdmin_Query.xel" />
           <parameter name="max_file_size" value="1024" />
           <parameter name="max_rollover_files" value="3" />
         </target>
       </event_session>
     </ddl300_300:XEvent>
     </Trace>
   </ObjectDefinition>
 </Create>

If you’re looking for queries running longer than 60 seconds, you can use the following query:

with cte as
(
select
	event_data.value(
		'(event/data[@name="StartTime"])[1]'
		, 'varchar(255)'
	) as StartTime
	, event_data.value(
		'(event/data[@name="EndTime"])[1]'
		, 'varchar(255)'
	) as EndTime
	, event_data.value(
		'(event/data[@name="TextData"])[1]'
		, 'varchar(4000)'
	) as TextData
	, event_data.value(
		'(event/data[@name="NTUserName"])[1]'
		, 'varchar(255)'
	) as NTUserName
	, event_data
from
(
select
   cast(event_data as xml) as event_data
from
   sys.fn_xe_file_target_read_file(
      'L:\MSOLAP\SQLAdmin_Query_*.xel'
       , null, null, null
   )
where
   object_name='QueryEnd'
) t
)
select
	StartTime
	, EndTime
	, datediff(SECOND, StartTime, EndTime) as Duration
	, TextData
	, NTUserName
	, event_data

from
	cte
where
	datediff(SECOND, StartTime, EndTime)>60
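When you’re done with your analysis, don’t forget to drop the trace; a Delete command with the TraceID defined above should do it (same shape as the one shown at the end of the next post):

<Delete
   xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
   <Object>
      <TraceID>SQLAdmin_Query</TraceID>
   </Object>
</Delete>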

How to find out who is connected to my cube

A few days ago, to prepare a potentially difficult meeting with some internal clients, I needed more insights about the users of one of the cubes … in other words: who is connected, and do they connect daily, weekly or less frequently?

You need a solution to find out who is connected to your cube!

First issue: some people consume data from this cube through reports or applications. These reports and applications impersonate a generic account to query the cube. Fortunately, these kinds of connections were out of scope for my study: I just wanted to intercept connections from Excel or SSMS, and those are not impersonated.

You have many options to intercept the different connections to a cube, but one of the most elegant (and less well-known) ways is to use the Extended Events of SSAS. You’ll need to ask Analysis Services to intercept one or more events and redirect them to a target.

The definition of the trace is not that complex, and you can easily code it with an XMLA command if you have a template. The first step is to give an identifier to the trace and to specify that it must be restarted when the SSAS service restarts (attribute AutoRestart set to true).

<Create>
   <ObjectDefinition>
      <Trace>
         <ID>SQLAdmin_Audit</ID>
         <Name>SQLAdmin_Audit</Name>
         <AutoRestart>true</AutoRestart>
         
      </Trace>
   </ObjectDefinition>
</Create>

In order to limit the effect of this trace on server performance and to be able to read the content a few days later, I decided to persist the trace to a file. I configured the trace’s destination on a different disk than my cube (parameter filename). To avoid any disk space issue, I configured a rollover strategy limited to 3 files (parameter max_rollover_files) of 1 GB each (parameter max_file_size, expressed in MB) in the XML element target.

<ddl300_300:XEvent>
   <event_session>
      <target package="Package0" name="event_file">
         <parameter name="filename" value="L:\MSOLAP\SQLAdmin_Audit.xel" />
         <parameter name="max_file_size" value="1024" />
         <parameter name="max_rollover_files" value="3" />
      </target>
   </event_session>
</ddl300_300:XEvent>

Then you’ll need to define which events are interesting. In this case, I decided to intercept the event Audit Login (code = AuditLogin). This event is raised for every new connection, i.e. each time a client requests a connection to the server. This server is dedicated to the cube I needed to monitor, so no additional filter is needed.

 <ddl300_300:XEvent>
   <event_session ...>
      <event package="AS" name="AuditLogin" />
   </event_session>
 </ddl300_300:XEvent>

The full creation script is shown below:

<Create
   xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"
   xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"
   xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"
   xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"
   xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300">
   <ObjectDefinition>
      <Trace>
         <ID>SQLAdmin_Audit</ID>
         <Name>SQLAdmin_Audit</Name>
         <AutoRestart>true</AutoRestart>
         <ddl300_300:XEvent>
            <event_session name="xeas"
              dispatchLatency="1"
              maxEventSize="4"
              maxMemory="4"
              memoryPartitionMode="none"
              eventRetentionMode="allowSingleEventLoss"
              trackCausality="true">
               <event package="AS" name="AuditLogin" />
               <target package="Package0" name="event_file">
                  <parameter name="filename" value="L:\MSOLAP\SQLAdmin_Audit.xel" />
                  <parameter name="max_file_size" value="1024" />
                  <parameter name="max_rollover_files" value="3" />
               </target>
            </event_session>
         </ddl300_300:XEvent>
      </Trace>
   </ObjectDefinition>
</Create>

To be sure that your trace is running, you can use the following script:

<Discover xmlns="urn:schemas-microsoft-com:xml-analysis">
   <RequestType>DISCOVER_TRACES</RequestType>
   <Restrictions/>
   <Properties/>
</Discover>

You can also check your file location (L:\MSOLAP) for a file starting with SQLAdmin_Audit and ending with .xel. Don’t panic about the file’s size or its last-update timestamp: some events can be delayed, and the timestamp is not always up to date.

And how can you query this file? With the SQL Server database engine! The key function to read this kind of file is named sys.fn_xe_file_target_read_file, and it takes care of the rollover files for you, which is really handy. The first parameter is the filename; the others are not needed. If you’ve intercepted more events than just AuditLogin, you can easily filter on the field object_name:

select
   cast(event_data as xml) as event_data
from
   sys.fn_xe_file_target_read_file(
      'L:\MSOLAP\SQLAdmin_Audit_*.xel'
      , null, null, null
   )
where
   object_name='AuditLogin'

Oops, this function returns XML elements. SQL Server has features to query XML and extract specific fragments; in this case, the method value is a good approach. It expects an XQuery path (just leave the [1] at the end as-is) followed by the SQL return type (between single quotes). With its help, I can extract the NTUserName and the StartTime of each connection; right after, I filter on distinct users to get my list.

with cte as
(
select
   event_data.value(
      '(event/data[@name="NTUserName"])[1]'
      , 'varchar(255)'
   ) as    NTUserName
   , event_data.value(
      '(event/data[@name="StartTime"])[1]'
      , 'varchar(255)'
   ) as StartTime
   , event_data
from
(
   select 
      cast(event_data as xml) as event_data
   from
      sys.fn_xe_file_target_read_file(
         'L:\MSOLAP\SQLAdmin_Audit_*.xel'
         , null, null, null
      )
   where
      object_name='AuditLogin'
) t
)
select distinct
   NTUserName
from
   cte

With this first step, I’m able to identify the users and whether they connect frequently or not. In a next blog post, I’ll explain how to intercept their queries. But before that, it’s perhaps useful to remove the trace from your server if it’s not needed at the moment:

<Delete
   xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"
   xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"
   xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"
   xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"
   xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300">
   <Object>
      <TraceID>SQLAdmin_Audit</TraceID>
   </Object>
</Delete>