Microsoft BI

Using NBi to assert calculations based on measures

A few days ago on GitHub, I received a request to explain how to assert the results of some calculations with measures and also how to automate these tests with NBi. This blog post is the complete answer to this question.

To illustrate the case, I’ll use the Adventure Works 2012 multidimensional cube, but you can do the same with any other database. In this cube, you have a measure-group with the measures Internet Gross Profit, Internet Sales Amount and Internet Gross Profit Margin. As you can guess, the last measure is the result of the division of the first one by the second (expressed as a percentage). My test will assert that this is still the case when I’m slicing by customers’ country.

To achieve this, I’m defining a system-under-test with a result-set defined in a MDX query:

<system-under-test>
  <resultSet>
    <query>
      <![CDATA[
      select
      {
        [Measures].[Internet Gross Profit],
        [Measures].[Internet Sales Amount],
        [Measures].[Internet Gross Profit Margin]   
      } on 0,
      {
        [Customer].[Country].members
      } on 1
      from
        [Adventure Works]
      ]]>
    </query>
  </resultSet>
</system-under-test>

Asserting the calculations’ result can be done with an all-rows constraint and a predicate where you’ll check that the content of the last measure is equal to the quotient of the first two.

Due to the complex column names returned by MDX queries, it’s recommended to use aliases. With NBi, we can define an alias based on a column’s position (starting at 0). Position 0 corresponds to the column [Customer].[Country].members, so we can skip it; columns 1 to 3 will receive aliases:

<alias column-index="1">Profit</alias>
<alias column-index="2">Sales</alias>
<alias column-index="3">Margin</alias>

The test will need to compute the quotient of Profit and Sales and check that this value is equal to the value returned by the cube in the column Margin. This can be expressed with the following expression:

<expression name="calculate">
    Round(Profit/Sales, 4) = Round(Margin, 4)
</expression>

The Round function is provided by NCalc, the underlying library used to evaluate these expressions. In this case, it’s extremely useful to avoid rounding issues during computations. Because the margin is a percentage, I didn’t round to 2 digits after the decimal separator but to 4.
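For instance, with Profit = 509.05, Sales = 1243.50 and Margin = 0.409368… (hypothetical values), the left-hand side evaluates to Round(509.05/1243.50, 4) = 0.4094 and the right-hand side to Round(0.409368, 4) = 0.4094, so the row validates the expression.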

Based on the expression defined above, we can use the predicate true. Each row that doesn’t validate the equality defined above will return false and won’t validate the predicate. The test is an all-rows, so if any row doesn’t validate the predicate, the test fails.

Some developers will perhaps try to use the predicate equal in place of true but, unfortunately, at this moment NBi can only handle a fixed value for the equal predicate and it’s not possible to specify a column. This limitation invalidates the approach with equal.

The full code for this test is:

<test name="Profit, Sales and Margin">
  <system-under-test>
    <resultSet>
      <query>
        <![CDATA[
      select
      {
          [Measures].[Internet Gross Profit],
          [Measures].[Internet Sales Amount],
          [Measures].[Internet Gross Profit Margin] 
      } on 0,
      {
          [Customer].[Country].members
      } on 1
      from
          [Adventure Works]
      ]]>
      </query>
    </resultSet>
  </system-under-test>
  <assert>
    <all-rows>
      <alias column-index="1">Profit</alias>
      <alias column-index="2">Sales</alias>
      <alias column-index="3">Margin</alias>
      <expression name="calculate">
          Round(Profit/Sales, 4) = Round(Margin, 4)
      </expression>
      <predicate operand="calculate" type="boolean">
        <true />
      </predicate>
    </all-rows>
  </assert>
</test>

The second part of this blog post is about automating the generation of this kind of test. Let’s start by defining our test-cases. To define them, I’ll put the aliases that I’d like to use (column alias), the expected formula (column expression), the exact names of the measures (column measure) and how many digits are needed for rounded comparisons (column round). In this example, I created two test-cases. The first test-case checks that Profit is equal to Sales minus Cost and the second one asserts that Margin is equal to Profit divided by Sales.

alias;expression;measure;round
Sales,Cost,Profit;Sales-Cost;[Measures].[Internet Sales Amount],[Measures].[Internet Total Product Cost],[Measures].[Internet Gross Profit];2
Profit,Sales,Margin;Profit/Sales;[Measures].[Internet Gross Profit],[Measures].[Internet Sales Amount],[Measures].[Internet Gross Profit Margin];4

To load this CSV file in genbiL, I’ll need to use the load file function:

case load file 'calculations.csv';

Note that in this CSV file each column is separated by a semicolon (;). The columns alias and measure contain more than one value and each of these values is separated by a comma (,).

To parse this CSV file with genbiL, I’ll need to use the function split. This function transforms a single value into an array of values using a separator.

case split column 'alias' with value ',';
case split column 'measure' with value ',';

After these two lines of code, the test-cases’ scope consists of two cases, each of them having an array of three elements in the columns alias and measure.
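Conceptually, the first test-case now looks like this (an illustrative representation, not genbiL syntax):

alias      = [ Sales, Cost, Profit ]
expression = Sales-Cost
measure    = [ [Measures].[Internet Sales Amount], [Measures].[Internet Total Product Cost], [Measures].[Internet Gross Profit] ]
round      = 2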

The next steps are straightforward: load the template (which we’ll compose just after this), generate the tests and save them.

template load file 'calculations.nbitt';
suite generate;
suite save 'calculations.nbits';
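Put together, the whole genbiL script is just these six lines:

case load file 'calculations.csv';
case split column 'alias' with value ',';
case split column 'measure' with value ',';
template load file 'calculations.nbitt';
suite generate;
suite save 'calculations.nbits';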

Once we have the CSV file and the genbiL script, we still need to build a template. There are two tricky parts in this template. The first one is to generate the list of measures in the MDX queries. Elements of this list must be separated by a comma: if you have three elements, you’ll need two commas.

select
   {
      [Measures].[Internet Gross Profit],
      [Measures].[Internet Sales Amount],
      [Measures].[Internet Gross Profit Margin]
   } on 0,

To achieve that with the StringTemplate engine, you’ll need to use the separator keyword (doc) for any variable containing an array.

select
   {
      $measure; separator=", 
"$
   } on 0,
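For the first test-case of the CSV file, this fragment should render as:

select
   {
      [Measures].[Internet Sales Amount],
      [Measures].[Internet Total Product Cost],
      [Measures].[Internet Gross Profit]
   } on 0,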

The second tricky part sounds similar but is really different: defining the list of aliases. In this case there is no separator; if you have three elements, you’ll need exactly the same xml element three times. An additional detail is that we’ll need to add a number starting at 1 and increasing for each new element. Fortunately, StringTemplate has the concept of anonymous templates, which is really helpful here. In this case, we state that for each value contained in the array of column alias, we have to create an xml element alias and add an attribute column-index with an incremental value. Note that the incremental value is automatically assigned to $i$ by StringTemplate and starts at 1. Each element contained in the array alias is assigned to the variable x in the anonymous template. The body of the anonymous template lies between the pipe and the right curly brace.

$alias:{ x | <alias column-index="$i$">$x$</alias>}$
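For the second test-case of the CSV file, this anonymous template should produce exactly the three aliases used in the first part of this post:

<alias column-index="1">Profit</alias>
<alias column-index="2">Sales</alias>
<alias column-index="3">Margin</alias>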

The template also uses the StringTemplate function trunc, returning all the elements of an array except the last one, and the function last, returning the last element of an array.
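For example, the test name defined in the template as $trunc(alias); separator=" and "$ to calculate $last(alias)$ should render as “Sales and Cost to calculate Profit” for the first test-case and as “Profit and Sales to calculate Margin” for the second one.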

The full listing for the template is available here under:

  <test name="$trunc(alias); separator=" and "$ to calculate $last(alias)$">
    <system-under-test>
      <resultSet>
        <query>
          <![CDATA[
          select
          {
         $measure; separator=", 
"$
          } on 0,
          {
          [Customer].[Country].members
          } on 1
          from
          [Adventure Works]
          ]]>
        </query>
      </resultSet>
    </system-under-test>
    <assert>
      <all-rows>
        $alias:{ x | <alias column-index="$i$">$x$</alias>}$
        <expression name="calculate">
           Round($expression$, $round$) = Round($last(alias)$, $round$)
        </expression>
        <predicate operand="calculate" type="boolean">
          <true />
        </predicate>
      </all-rows>
    </assert>
  </test>

And voilà, now you just need to run your genbiL script and you’ll get your two tests generated. Feel free to add new cases!


SQL Saturday in Vienna and MS Cloud Summit in Paris to start in 2017

I’m sitting in my couch, watching a movie with my daughters, reminiscing about holidays, and getting ready for the first speaking engagements of 2017.

I’m completely thrilled to be invited to Vienna on January 20th to speak at SQL Saturday #579. I’ll be tackling a rather difficult subject … testing BI solutions. During this session, we’ll look at the features of the open-source framework named NBi. This framework provides support for automated tests in the fields of databases, cubes, reports and ETLs, without the need for .Net skills. The demos will show us the best approaches to quickly and effectively assert the quality of BI developments. We’ll go a step further, generating the tests with an interesting system of templates and test-cases sources.

I’ll also be speaking (in French) in Paris at the MS Cloud Summit Paris 2017. The topic will be Azure Stream Analytics. This is a completely new session and it’s taking a lot of patience, perseverance, and hard work to get it ready (and to be honest it’s not 100% finished). Building the slide deck is the quick and easy part. But designing the demo scenarios, scripting and testing the demos is the time-consuming part (then will come the practicing part).

For both conferences, if Business Intelligence is not your cup of tea, there are plenty of other topics to choose from, with sessions geared toward Database Development, Database Administration/Security and even, in Paris, Office 365, Power Flow, Power Apps, Azure Functions and more. Both conferences offer a wide variety of content for everyone.

See you there?

Using MsBuild tasks to build SSIS packages

In a previous blog post, I explained how to recompile the MsBuild task for SSIS 2016. Now that we have this task, we’ll see how to use it.

First, let’s define a new file (here named Build.proj) in our SSIS solution.

Then, we’ll need to define our headers and namespaces:

<Project
    xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
    DefaultTargets="SSISBuild">

The attribute DefaultTargets defines which Target of the file will be executed when MsBuild processes it. In our case we’ll have just one Target, but it’s better to define this attribute correctly from the start. All the elements described below are placed inside this Project element, which is closed by </Project> at the end of the file.

We also have to specify to MsBuild that an external assembly will be used (the community tasks to build SSIS packages).

<UsingTask
    TaskName="DeploymentFileCompilerTask"
    AssemblyFile="..\SSISMSBuild\SSISMSBuild\bin\Debug\SSISMSBuild.dll"
/>

It’s now time to create a kind of variable to host the name/path of our SSIS project. Within MsBuild you achieve this task by using the following syntax:

<ItemGroup>
    <SSISProjPath Include="..\anotherFolder\MySSISProject.dtproj" />
</ItemGroup>

SSISProjPath is the name of the variable and the value is defined in the Include attribute.

Now, let’s focus on the Target itself. The target is named SSISBuild and will display a message before using the task to compile the SSIS project. The second step is performed by invoking the task DeploymentFileCompilerTask:

<Target Name="SSISBuild">
   <Message Text="*********** Building SSIS project ***********"/>
   <DeploymentFileCompilerTask
       InputProject="@(SSISProjPath)"
       Configuration="Development"
       ProtectionLevel="DontSaveSensitive"
   />
</Target>

The first parameter of DeploymentFileCompilerTask is the name of your project; you can safely use your variable through the @() syntax. The second parameter is the configuration; personally, I never change it and use Development, but it’s up to you. The protection level will override the corresponding property of your project.
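Assembled, the whole Build.proj is just the fragments above nested inside the Project element (a sketch; the paths are the sample ones used earlier):

<Project
    xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
    DefaultTargets="SSISBuild">

    <UsingTask
        TaskName="DeploymentFileCompilerTask"
        AssemblyFile="..\SSISMSBuild\SSISMSBuild\bin\Debug\SSISMSBuild.dll"
    />

    <ItemGroup>
        <SSISProjPath Include="..\anotherFolder\MySSISProject.dtproj" />
    </ItemGroup>

    <Target Name="SSISBuild">
        <Message Text="*********** Building SSIS project ***********"/>
        <DeploymentFileCompilerTask
            InputProject="@(SSISProjPath)"
            Configuration="Development"
            ProtectionLevel="DontSaveSensitive"
        />
    </Target>
</Project>

From a command prompt, something like msbuild Build.proj /t:SSISBuild (with the right version of MSBuild on your path) should then be enough to build the project.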

If you want to compile your project, you can call MsBuild with Build.proj as the parameter (as sketched above) and it will do the job! But it’s more interesting if you can include this in your build steps. I’ll just show you how to include this in your TeamCity configuration:

  1. Create a build step
  2. Define the runner type as MsBuild
  3. Specify the MSBuild property for version as Microsoft Build Tools 2015 and ToolsVersion as 14.0
  4. Don’t forget to specify the name of your target as SSISBuild

(screenshot: ssis-build-teamcity)

Once this setup is in place, you can trigger your TeamCity project to get the following message and your build is ready.

(screenshot: ssis-build-teamcity-output)

Building the MsBuild Tasks for SSIS 2016

The MsBuild Tasks are a set of two tasks available in a community project supported by Microsoft (project SSISMSBuild) and hosted on CodePlex. They aim to support the build of SSIS packages from MsBuild files. This is a great way to automate the build of your packages without the need for Visual Studio (and so without licensing issues). Unfortunately, this project has not been updated since 2010 but the source code is still available. It means that you need to update the dependencies and compile these tasks if you want to use them with SQL Server 2012, 2014 or 2016 (or vNext).

Download the solution and open it with Visual Studio:

(screenshot: ssis-build-projects)

To “upgrade” to SSIS 2016, open the solution with Notepad++ or any other text editor. Change the hint paths (not just once but all of them) to point to your version of Visual Studio:

  • Visual Studio 11.0  = Visual Studio 2012
  • Visual Studio 12.0  = Visual Studio 2013
  • Visual Studio 14.0  = Visual Studio 2015
  • Visual Studio 15.0  = Visual Studio 2017
(screenshot: ssis-build-hint-path)

Then, it’s time to update the references to SQL Server in the project. Check the path of the dll named Microsoft.SqlServer.ManagedDTS. Keep in mind the following matches:

  • v11 = SQL Server 2012
  • v12 = SQL Server 2014
  • v13 = SQL Server 2016
  • v14 = SQL Server vNext

(screenshot: ssis-build-references)

You also need to sign your build. Don’t panic, it’s really easy: open the properties of your project and create a new key:

(screenshot: ssis-build-project-properties)

You must update the target framework to be in line with the framework of SSDT … currently .NET Framework 4.5.

(screenshot: ssis-build-project-framework)

Build and finally copy/paste your new dll to the directory C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\PrivateAssemblies\ (if needed, change the path to target the correct version of Visual Studio).

That’s it, you now have the correct version of the MsBuild tasks for SSIS. In a next blog post, I’ll explain how to use them.

How to log the queries received by a cube?

In a previous post, I explained how to analyze which users connect to your cube; this blog post will focus on logging their queries. The strategy here under has a small footprint on your server because it relies on traces. Use it to debug weird situations or to monitor for a brief period, but if you want to use this strategy for long periods, be careful and monitor the impact on your server’s performance.


The way to achieve this is similar to the method for auditing users’ connections: you create a trace with Extended Events, you redirect the flow of events to a flat file and you read this file with the help of the function sys.fn_xe_file_target_read_file.

Let’s start by creating the trace. The interesting event is Query End (code = QueryEnd). This event comes with several fields: notably, the query itself (TextData), the start and end time (UTC format) and the NTUserName. For the last one, I’ve noticed that the information is not necessarily filled in. Without confirmation, I’d say that this field is only available when the event AuditLogin is also intercepted by a trace.

<Create
   xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"
   xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"
   xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"
   xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"
   xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300">
   <ObjectDefinition>
     <Trace>
       <ID>SQLAdmin_Query</ID>
       <Name>SQLAdmin_Query</Name>
       <AutoRestart>true</AutoRestart>

       <ddl300_300:XEvent>
         <event_session name="xeas"            
            dispatchLatency="1"            
            maxEventSize="4"            
            maxMemory="4"            
            memoryPartitionMode="none"            
            eventRetentionMode="allowSingleEventLoss"            
            trackCausality="true"
         >
         <event package="AS" name="QueryEnd" />
         <target package="Package0" name="event_file">
           <parameter name="filename" value="L:\MSSQL_AS_Log\SQLAdmin_Query.xel" />
           <parameter name="max_file_size" value="1024" />
           <parameter name="max_rollover_files" value="3" />
         </target>
       </event_session>
     </ddl300_300:XEvent>
     </Trace>
   </ObjectDefinition>
 </Create>

If you’re looking for queries running for more than 60 seconds, you can use the following query:

with cte as
(
select
	event_data.value(
		'(event/data[@name="StartTime"])[1]'
		, 'varchar(255)'
	) as StartTime
	, event_data.value(
		'(event/data[@name="EndTime"])[1]'
		, 'varchar(255)'
	) as EndTime
	, event_data.value(
		'(event/data[@name="TextData"])[1]'
		, 'varchar(4000)'
	) as TextData
	, event_data.value(
		'(event/data[@name="NTUserName"])[1]'
		, 'varchar(255)'
	) as NTUserName
	, event_data
from
(
select
   cast(event_data as xml) as event_data
from
   sys.fn_xe_file_target_read_file(
      'L:\MSSQL_AS_Log\SQLAdmin_Query_*.xel'
       , null, null, null
   )
where
   object_name='QueryEnd'
) t
)
select
	StartTime
	, EndTime
	, datediff(SECOND, StartTime, EndTime) as Duration
	, TextData
	, NTUserName
	, event_data

from
	cte
where
	datediff(SECOND, StartTime, EndTime)>60
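If you want to focus on the slowest queries of a specific user, the final select of the query above can be adapted; a sketch, where the account name is obviously hypothetical:

select
	StartTime
	, EndTime
	, datediff(SECOND, StartTime, EndTime) as Duration
	, TextData
	, NTUserName
from
	cte
where
	datediff(SECOND, StartTime, EndTime)>60
	and NTUserName='DOMAIN\SomeUser'
order by
	Duration desc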

How to find out who is connected to my cube

A few days ago, to prepare a potentially difficult meeting with some internal clients, I needed more insights about the users of one of the cubes … In other words: who is connected, and do they connect daily or weekly (or less frequently)?


You need another solution to know who is connected to your cube!

First issue: some people consume data from this cube through reports or applications. These reports and applications use impersonation of a generic account to query the cube. Fortunately, these kinds of connections were out of scope for my study: I just wanted to intercept the connections from Excel or SSMS, and they are not impersonated.

You have many solutions to intercept the different connections to a cube, but one of the most elegant (and not well-known) ways is with the Extended Events of SSAS. You’ll need to request Analysis Services to intercept one or more events and redirect them to a target.

The definition of the trace is not so complex and you can easily code it with an XMLA command if you have a template. The first step is to give an identifier to the trace and to specify that this trace must be restarted in case of a restart of the SSAS service (attribute AutoRestart set to true).

<Create>
   <ObjectDefinition>
      <Trace>
         <ID>SQLAdmin_Audit</ID>
         <Name>SQLAdmin_Audit</Name>
         <AutoRestart>true</AutoRestart>
         
      </Trace>
   </ObjectDefinition>
</Create>

In order to limit the effect of this trace on server performance and to be able to read the content a few days later, I decided to persist the trace to a file. I configured the trace’s destination on a different disk than my cube (attribute filename). To avoid any disk space issue, I configured a roll-over file strategy and limited it to 3 occurrences (parameter max_rollover_files) of 1Gb each (parameter max_file_size, expressed in Mb) in the xml element target.

<ddl300_300:XEvent>
   <event_session>
      <target package="Package0" name="event_file">
         <parameter name="filename" value="L:\MSOLAP\SQLAdmin_Audit.xel" />
         <parameter name="max_file_size" value="1024" />
         <parameter name="max_rollover_files" value="3" />
      </target>
   </event_session>
</ddl300_300:XEvent>

Then you’ll need to define which events are interesting. In this case, I decided to intercept the event Audit Login (code = AuditLogin). This event is raised for each new connection, such as when a client requests a connection to the server. This server is dedicated to the cube that I needed to monitor, so no additional filter is needed.

 <ddl300_300:XEvent>
   <event_session ...>
      <event package="AS" name="AuditLogin" />
   </event_session>
 </ddl300_300:XEvent>

The full creation script is available here under:

<Create
   xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"
   xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"
   xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"
   xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"
   xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300">
   <ObjectDefinition>
      <Trace>
         <ID>SQLAdmin_Audit</ID>
         <Name>SQLAdmin_Audit</Name>
         <AutoRestart>true</AutoRestart>
         <ddl300_300:XEvent>
            <event_session name="xeas"
              dispatchLatency="1"
              maxEventSize="4"
              maxMemory="4"
              memoryPartitionMode="none"
              eventRetentionMode="allowSingleEventLoss"
              trackCausality="true">
               <event package="AS" name="AuditLogin" />
               <target package="Package0" name="event_file">
                  <parameter name="filename" value="L:\MSOLAP\SQLAdmin_Audit.xel" />
                  <parameter name="max_file_size" value="1024" />
                  <parameter name="max_rollover_files" value="3" />
               </target>
            </event_session>
         </ddl300_300:XEvent>
      </Trace>
   </ObjectDefinition>
</Create>

To be sure that your trace is running, you can use the following script:

<Discover xmlns="urn:schemas-microsoft-com:xml-analysis">
   <RequestType>DISCOVER_TRACES</RequestType>
   <Restrictions/>
   <Properties/>
</Discover>

You can also check your file location (L:\MSOLAP) to see if a file starting with SQLAdmin_Audit and ending with .xel is available. Don’t panic about the size of your file or the last update timestamp: some events could be delayed and the timestamp is not always updated.

And how can I query this file? With the SQL Server database engine! The key function to read this kind of file is named sys.fn_xe_file_target_read_file. This function takes care of the rollover files for you, and that’s really cool. The first parameter is the filename pattern; the others are not needed. If you have more events than just the AuditLogin, you can easily filter on the field object_name:

select 
   cast(event_data as xml) as event_data
from
   sys.fn_xe_file_target_read_file(
      'L:\MSOLAP\SQLAdmin_Audit_*.xel'
      , null, null, null
   )
where
   object_name='AuditLogin'

Oops, this function returns some xml elements. SQL Server has features to query xml and extract specific fragments. For this case, the usage of value will be a good approach. This method requires an XML path followed by an additional parameter (just leave the [1]). The second parameter expected by this function is the SQL return type (between single quotes). With the help of this function, I’m able to extract the NTUserName and the StartTime of the connection; just after, I filter on distinct users to get my list.

with cte as
(
select
   event_data.value(
      '(event/data[@name="NTUserName"])[1]'
      , 'varchar(255)'
   ) as    NTUserName
   , event_data.value(
      '(event/data[@name="StartTime"])[1]'
      , 'varchar(255)'
   ) as StartTime
   , event_data
from
(
   select 
      cast(event_data as xml) as event_data
   from
      sys.fn_xe_file_target_read_file(
         'L:\MSOLAP\SQLAdmin_Audit_*.xel'
         , null, null, null
      )
   where
      object_name='AuditLogin'
) t
)
select distinct
   NTUserName
from
   cte
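To get a first idea of how frequently each user connects (daily, weekly, …), the final select can also be replaced by an aggregation per user. A sketch, assuming the StartTime string converts implicitly to a date:

select
   NTUserName
   , count(distinct convert(date, StartTime)) as DaysWithConnection
from
   cte
group by
   NTUserName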

With this first step I’m able to identify the users and whether they are frequent users or not. In a next blog post, I will explain how to intercept their queries. But before that, it’s perhaps useful to remove the trace from your server if it’s not needed at the moment:

<Delete
   xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"
   xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"
   xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"
   xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"
   xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300">
   <Object>
      <TraceID>SQLAdmin_Audit</TraceID>
   </Object>
</Delete>

SSAS measures’ format and Excel

For one of the cubes I’m working on, we needed to format a measure by adding the suffix k€. The front-end is Excel 2010/2013.

My colleague was in charge of this part of the development and had defined a FORMAT_STRING for the measure, as we usually do: FORMAT_STRING ="#,##0 k€". I was in charge of the testing with NBi, and I had defined a test for the format of a cell; this test was successful. Sun shining.

A few days later, I received a screenshot of an Excel sheet and my eyes were caught by a poor formatting. In place of an expected 1.075 k€ I had an awful 1075,2357777. My first move was to connect with SSMS and run the equivalent MDX query and, surprise … the format was correct: 1.075 k€. Back to an Excel sheet connected to this cube, and indeed the format was wrong: 1075,2357777. Confirmation from the cube definition: the format was defined; confirmation from NBi: the format was correctly applied. WTF? Why wasn’t Excel able to display the correctly formatted value when other tools were?

From the development of NBi, I knew that SSAS returns both a VALUE and a FORMATTED_VALUE. NBi is able to retrieve this FORMATTED_VALUE to perform a test of a cell’s format. The documentation of SSAS describes the relation between FORMATTED_VALUE and FORMAT_STRING:

FORMAT_STRING: The formatting template to be applied to the value of the cell to generate FORMATTED_VALUE property

I decided to use the SSAS profiler to retrieve the query emitted by Excel. The query wasn’t really complex but the CELL PROPERTIES were a bit puzzling: no trace of FORMATTED_VALUE, just FORMAT_STRING. Based on this observation, a search on Google confirmed that Excel retrieves the FORMAT_STRING and applies the format by itself (in place of relying on the FORMATTED_VALUE returned by SSAS).

Apparently, Excel and SSAS don’t have the same interpretation of the FORMAT_STRING. A miracle occurred and I took the decision to place a backslash before the k: #,##0 \k€. The format was still correct in SSAS and suddenly a k was also visible in Excel … a few seconds later, I added another backslash in front of the euro symbol and the value was correctly formatted in both Excel and SSAS.

Based on the SSAS documentation, the backslash is not needed and indeed SSAS correctly manages the FORMAT_STRING without the backslash. But apparently Excel is less tolerant and the backslash is mandatory for this kind of formatting.
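As an illustration, this is how such a format could be declared on a calculated member in the MDX script (a sketch: the member name and its expression are hypothetical, only the FORMAT_STRING matters here):

CREATE MEMBER CURRENTCUBE.[Measures].[Internet Sales Amount k€] AS
    [Measures].[Internet Sales Amount] / 1000,
    FORMAT_STRING = "#,##0 \k\€",
    VISIBLE = 1;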

 

Detect Tabular or Multidimensional mode with AdomdClient

In this excellent blog post, Marco Russo explains how to programmatically detect if an instance of SSAS is in Tabular or Multidimensional mode.

This post is a great resource but didn’t fully resolve my own issue. The project I’m working on already had a reference to Microsoft.AnalysisServices.AdomdClient.dll and if I could avoid a second dll of this kind, it would be a nice achievement. Seriously, if you can avoid packaging and deploying dlls related to SQL Server … do it!

The second option listed in Marco’s post is based on Analysis Management Objects and means that you need to reference Microsoft.AnalysisServices.dll. As previously explained: no way!

So the only remaining option viable for my case is the XMLA approach (with the Microsoft.AnalysisServices.AdomdClient.dll). The AdomdCommand class has a nice method ExecuteXmlReader. This method returns an XML reader, which sounded like a nice starting point. But it’s definitely not a valid option: the AdomdCommand doesn’t support discovery requests. If you try to provide the xml discovery request in the property CommandText then, at execution, you’ll receive this kind of message:

Executing the query ...

The Discover element at line 7, column 72 (namespace urn:schemas-microsoft-com:xml-analysis) cannot appear under Envelope/Body/Execute/Command.

Execution complete

How do you perform a discovery request on a cube with just the AdomdClient library? Simply by using a nice method of the AdomdConnection (yes, connection, not command) class named GetSchemaDataSet. This method expects the name of the schema that you want to query: in our case DISCOVER_XML_METADATA. You also need to provide some restrictions to avoid a gigantic response from SSAS. In this case the restriction is “ObjectExpansion” with value “ReferenceOnly”.

using (var conn = new AdomdConnection(connectionString))
{
   conn.Open();
   var restrictions = new AdomdRestrictionCollection();
   restrictions.Add(new AdomdRestriction("ObjectExpansion", "ReferenceOnly"));
   var ds = conn.GetSchemaDataSet("DISCOVER_XML_METADATA", restrictions);
}

The next task is to parse the response sent by SSAS. This response is stored in a dataset with one table and one row. The single field contains the whole xml string. The best option is to load this string into an XmlDocument before parsing it.

var xml = ds.Tables[0].Rows[0].ItemArray[0].ToString();
var doc = new XmlDocument();
doc.LoadXml(xml);

You need to query this xml document to extract the ServerMode node’s text. Don’t forget to correctly configure the namespaces before querying the document!

protected string ParseXmlaResponse(XmlDocument doc)
{
   var root = doc.DocumentElement;
   var nm = new XmlNamespaceManager(doc.NameTable);
   nm.AddNamespace("ddl300", "http://schemas.microsoft.com/analysisservices/2011/engine/300");
   var node = root.SelectSingleNode("//ddl300:ServerMode", nm);
   if (node == null)
      throw new ArgumentException("Unable to locate the node for ServerMode.");
   return node.InnerText;
}
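Putting the pieces together, a minimal usage sketch (the decision logic is mine; ServerMode is expected to contain “Multidimensional” or “Tabular”):

var mode = ParseXmlaResponse(doc);   // doc is the XmlDocument built from the DISCOVER_XML_METADATA response
Console.WriteLine(
   mode == "Tabular"
   ? "This is a Tabular instance."
   : "This is a Multidimensional instance.");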

Note that you’ll need at least version 11.0 of the AdomdClient library; previous versions will not return the node ServerMode.

Side note: this discovery request also returns the version of the SSAS instance. Even with the AdomdClient library 10.0 (or earlier), you can parse the field “version” and, if its content is less than 11.0, you can conclude that this is surely a multidimensional instance. 😉

Back from SQL Saturday #323 at Paris

Last week-end, I was at SQL Saturday #323 in Paris to present a session about test automation for BI solutions with the framework NBi.

The venue (Montparnasse Tower) was really a great place to host a SQL Saturday event and the organizers did a really good job. I got to attend sessions on topics I usually don’t work on (DQS, HDInsight) and learned new things that way. It was a long day, but a great day! What I enjoyed the most was meeting everyone in person and getting to know new people. Special thanks to Florian Eiden (b|t), the man who spotted me and my project traipsing on the web, but also Charles-Henri Sauget (b|t), Romain Castères (b|t), Patrice Harel (b) and other members of the GUSS’s Board. I also wanted to thank Oliver Engels (t) and Tillmann Eitelberg (t) for the interesting discussions and encouragement about the project NBi.

The source code of the different demos performed during the session is available here. The slide deck (in French) has been uploaded to SlideShare. I’ll translate it into English soon.

I also got an interesting question from Romuald Coutaud during the session about the way the query is effectively executed on SQL Server depending on its origin (NBi or SSRS). To be honest, I hadn’t fully understood the question at first glance, which is probably why I gave a vague answer. I checked my final assumption during my journey back to Belgium and can confirm that there is no difference between the query emitted by NBi and the query emitted by SSRS. NBi and SSRS both use the .Net SqlClient provider, and this provider executes the sp_executesql stored procedure with the statement and parameters provided as arguments. Based on this, both query plans should be identical.
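For reference, the statement captured on SQL Server has roughly this shape, whatever the client (the query and the parameter are purely illustrative):

exec sp_executesql
   N'select [SalesAmount] from [FactInternetSales] where [OrderDateKey] >= @key'
   , N'@key int'
   , @key = 20070101;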

How to compare the result-sets returned by an SQL query and an MDX query with NBi

If you’re designing an SSAS cube and putting some test automation into action, you’ll probably want to compare the results of two queries: the first one on a source system or on your SQL data warehouse and the second one on your cube.

It’s a common and simple task for NBi. This framework is designed to ensure that this kind of test doesn’t quickly become too complex; the implementation of this test shouldn’t take more than a few seconds. As for other tests designed for this framework, you’ll need to define two objects: a system-under-test (the result-set of the query on the cube) and an assertion (it’s equal to the result-set of the query on the data warehouse). The basic structure of the test will be:

<test name="Compare Cube (MDX) with datawarehouse (SQL)">
     <system-under-test>
         <execution>
             <query connectionString="@AsAdventureWorks2012">
                 <!-- Here my MDX Query -->
             </query>
         </execution>
    </system-under-test>
    <assert>
         <equalTo>
             <query connectionString="@SqlAdventureWorks2012">
                 <!-- Here my SQL Query -->
             </query>
         </equalTo>
     </assert>
</test>

The two queries can be directly embedded into the test file or written in an external file. Both options have their own advantages. If you place your query into the test, it will be helpful when editing the test to have only one file to open. On the other hand, a query written in an external file can be opened directly in SQL Server Management Studio, and this file can also be referenced more than once by your test-suite. The choice is not obvious and depends on what exactly you’ll do with this query, but a general guideline is to use an embedded query if the query is not too large and is only used once in your test-suite.

To illustrate our sample, we’ll check that the “count of customers” is indeed correctly implemented in the SSAS cube of Adventure Works. For this we’ll use the “Date” dimension as a dicer. The corresponding MDX query will be:

select
    [Measures].[Internet Order Count] on 0,
    non empty([Date].[Calendar Year].children) on 1
from
    [Adventure Works]

Execution of this MDX query returns a result-set of four rows of two cells. The first cell of a row is the year and the second the count of customers.

In the corresponding SQL data warehouse, we’ll need to write the following query.

select
	'CY ' + cast(year([OrderDate]) as varchar(50)) as CivilYear
	, count([CustomerID]) as CountOfCustomer
from
	[Sales].[SalesOrderHeader]
where
	[OnlineOrderFlag]=1
group by
	'CY ' + cast(year([OrderDate]) as varchar(50))

The corresponding result-set is equivalent, but note that there is no guarantee of the ordering of the rows: in this case, 2006 is at the top and 2007 at the end. This is not a problem for NBi: the framework doesn’t take the ordering of the rows into account to compare two result-sets.

If we put together the two queries and the test definition, we have the following test:

<test name="">
    <system-under-test>
         <execution>
             <query connectionString="@AsAdventureWorks2012">
                 select
					[Measures].[Internet Order Count] on 0,
					non empty([Date].[Calendar Year].children) on 1
				 from
					[Adventure Works]
             </query>
         </execution>
    </system-under-test>
     <assert>
         <equalTo>
             <query connectionString="@SqlAdventureWorks2012">
                 select
					'CY ' + cast(year([OrderDate]) as varchar(50)) as CivilYear
					, count([CustomerID]) as CountOfCustomer
				 from 
				 	[Sales].[SalesOrderHeader]
				 where
				 	[OnlineOrderFlag]=1
				 group by 
					'CY ' + cast(year([OrderDate]) as varchar(50))
             </query>
         </equalTo>
     </assert>
</test>

The execution of this test in the test runner returns a positive result (green). As already said, the fact that the rows are not ordered the same way in the two result-sets is not an issue for NBi. The framework doesn’t rely on row ordering when executing the comparison but on key matching. By default, NBi considers the first cells of the row as part of the key. In our sample, NBi will look for the row “CY 2005” (first row of the MDX query) in the second result-set. When this row is found in the second result-set, NBi compares its value (content of the last cell of the row: 1013) to the expected value (1013). NBi continues like this for each row. In this case NBi concludes that the result-sets are equal and validates the test.

If we slightly change our SQL to a “distinct count” in place of a “count”,

select
	'CY ' + cast(year([OrderDate]) as varchar(50)) as CivilYear
	, count(distinct [CustomerID]) as CountOfCustomer
from
	[Sales].[SalesOrderHeader]
where
	[OnlineOrderFlag]=1
group by
	'CY ' + cast(year([OrderDate]) as varchar(50))

This change will impact the result-set and the test should not be green any more. To check this, run the test after adapting your query in your nbits file. Don’t forget to reload the test (CTRL+R) in NUnit to ensure the latest version of your test-suite is taken into account. Then execute the test and you’ll see a red bar, synonymous with failure.

The exception displayed is “Execution of the query doesn’t match the expected result”. If you copy/paste the full message displayed in the textbox, you’ll get additional valuable information about the difference between the two result-sets. At the top of the message, NBi displays the first 10 (max.) rows of the two result-sets. Just under, NBi displays the rows that are missing, unexpected or with a delta in their values. In the sample, two rows are considered as non-matching because their values in the two result-sets diverge.

Now, if we go back to our initial SQL query (so without the distinct) but arbitrarily introduce an additional where clause, we can test the behavior of NBi when the two result-sets don’t match in terms of row count.

select
	'CY ' + cast(year([OrderDate]) as varchar(50)) as CivilYear
	, count([CustomerID]) as CountOfCustomer
from
	[Sales].[SalesOrderHeader]
where
	[OnlineOrderFlag]=1
        and year([OrderDate])<2008
group by
	'CY ' + cast(year([OrderDate]) as varchar(50))

If we just add this line to our query in the nbits file, we’ll have trouble when trying to load the test in NUnit. Indeed, the symbol “<” is ambiguous in an xml file, so we should mark it as non-xml content. This is quickly done by introducing a CDATA section in the xml. The assertion should look like:

<assert>
    <equalTo>
        <query connectionString="@SqlAdventureWorks2012">
            <![CDATA[
            select
                'CY ' + cast(year([OrderDate]) as varchar(50)) as CivilYear
                , count([CustomerID]) as CountOfCustomer
            from 
                [Sales].[SalesOrderHeader]
            where
                [OnlineOrderFlag]=1
                and year([OrderDate])<2008
            group by 
                'CY ' + cast(year([OrderDate]) as varchar(50))
            ]]>
        </query>
    </equalTo>
</assert>

As expected, if the adapted test is run, NBi will display a red light and will help you find the difference between the result-sets by identifying the unexpected row in the system-under-test.

In the next posts we’ll see how to tune NBi to detect missing rows and unexpected rows in more complex cases but also how to introduce a tolerance when comparing values of rows.