SSAS

How to log the queries received by a cube?

In a previous post, I explained how to analyze which users connect to your cube; this blog post focuses on logging their queries. The strategy described below relies on traces and has a small footprint on your server. Use it to debug a weird situation or to monitor for a short period, but if you want to keep it running for long periods, be careful and monitor the impact on your server’s performance.


The way to achieve this is similar to the method for auditing users’ connections: you create a trace with Extended Events, you redirect the flow of events to a flat file and you read this file with the help of the function sys.fn_xe_file_target_read_file.

Let’s start by creating the trace. The interesting event is Query End (code = QueryEnd). This event comes with several fields: notably the query itself (TextData), the start and end times (in UTC) and the NTUserName. For the last one, I’ve noticed that the information is not always filled in. Without confirmation, I’d say that this field is only available when the event AuditLogin is also intercepted by a trace.

<Create
   xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"
   xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"
   xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"
   xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"
   xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300">
   <ObjectDefinition>
     <Trace>
       <ID>SQLAdmin_Query</ID>
       <Name>SQLAdmin_Query</Name>
       <AutoRestart>true</AutoRestart>

       <ddl300_300:XEvent>
         <event_session name="xeas"            
            dispatchLatency="1"            
            maxEventSize="4"            
            maxMemory="4"            
            memoryPartitionMode="none"            
            eventRetentionMode="allowSingleEventLoss"            
            trackCausality="true"
         >
         <event package="AS" name="QueryEnd" />
         <target package="Package0" name="event_file">
           <parameter name="filename" value="L:\MSSQL_AS_Log\SQLAdmin_Query.xel" />
           <parameter name="max_file_size" value="1024" />
           <parameter name="max_rollover_files" value="3" />
         </target>
       </event_session>
     </ddl300_300:XEvent>
     </Trace>
   </ObjectDefinition>
 </Create>

If you’re looking for queries running for more than 60 seconds, you can use the following query:

with cte as
(
select
	event_data.value(
		'(event/data[@name="StartTime"])[1]'
		, 'varchar(255)'
	) as StartTime
	, event_data.value(
		'(event/data[@name="EndTime"])[1]'
		, 'varchar(255)'
	) as EndTime
	, event_data.value(
		'(event/data[@name="TextData"])[1]'
		, 'varchar(4000)'
	) as TextData
	, event_data.value(
		'(event/data[@name="NTUserName"])[1]'
		, 'varchar(255)'
	) as NTUserName
	, event_data
from
(
select
   cast(event_data as xml) as event_data
from
   sys.fn_xe_file_target_read_file(
      'L:\MSSQL_AS_Log\SQLAdmin_Query_*.xel'
       , null, null, null
   )
where
   object_name='QueryEnd'
) t
)
select
	StartTime
	, EndTime
	, datediff(SECOND, StartTime, EndTime) as Duration
	, TextData
	, NTUserName
	, event_data

from
	cte
where
	datediff(SECOND, StartTime, EndTime)>60
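
If you’d rather have a quick overview of who is sending those long-running queries, you can aggregate the same extraction per user. The query below is only a sketch built on the same event fields and file location as above (adjust the path and the 60-second threshold to your own situation); remember that NTUserName may stay empty if AuditLogin isn’t traced as well.

-- A sketch: count the queries slower than 60 seconds per user
with cte as
(
select
	event_data.value(
		'(event/data[@name="StartTime"])[1]'
		, 'varchar(255)'
	) as StartTime
	, event_data.value(
		'(event/data[@name="EndTime"])[1]'
		, 'varchar(255)'
	) as EndTime
	, event_data.value(
		'(event/data[@name="NTUserName"])[1]'
		, 'varchar(255)'
	) as NTUserName
from
(
select
	cast(event_data as xml) as event_data
from
	sys.fn_xe_file_target_read_file(
		'L:\MSSQL_AS_Log\SQLAdmin_Query_*.xel'
		, null, null, null
	)
where
	object_name='QueryEnd'
) t
)
select
	NTUserName
	, count(*) as SlowQueries
	, max(datediff(SECOND, StartTime, EndTime)) as MaxDurationInSeconds
from
	cte
where
	datediff(SECOND, StartTime, EndTime)>60
group by
	NTUserName
order by
	SlowQueries desc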

How to find out who is connected to my cube

A few days ago, to prepare a potentially difficult meeting with some internal clients, I needed more insights about the users of one of the cubes … In other words: who is connected, and do they connect daily or weekly (or less frequently)?


You need a reliable way to know who is connected to your cube!

First issue: some people are consuming data from this cube through reports or applications, and these reports and applications impersonate a generic account to query the cube. Fortunately, this kind of connection was out of scope for my study: I just wanted to intercept the connections from Excel or SSMS, and those are not impersonated.

You’ve many solutions to intercept the different connections to a cube, but one of the most elegant (and not well-known) ways is to use the Extended Events of SSAS. You’ll need to ask Analysis Services to intercept one or more events and redirect them to a target.

The definition of the trace is not that complex and you can easily code it with an XMLA command if you’ve a template. The first point is to give an ID and a name to the trace and to specify that this trace must be restarted when the SSAS service restarts (attribute AutoRestart set to true).

<Create>
   <ObjectDefinition>
      <Trace>
         <ID>SQLAdmin_Audit</ID>
         <Name>SQLAdmin_Audit</Name>
         <AutoRestart>true</AutoRestart>
         
      </Trace>
   </ObjectDefinition>
</Create>

In order to limit the impact of this trace on server performance and to be able to read the content a few days later, I decided to persist the trace to a file. I configured the trace’s destination on a different disk than my cube (parameter filename). To avoid any disk space issue, I configured a rollover strategy limited to 3 files (parameter max_rollover_files) of 1 GB each (parameter max_file_size, expressed in MB) in the xml element target.

<ddl300_300:XEvent>
   <event_session>
      <target package="Package0" name="event_file">
         <parameter name="filename" value="L:\MSOLAP\SQLAdmin_Audit.xel" />
         <parameter name="max_file_size" value="1024" />
         <parameter name="max_rollover_files" value="3" />
      </target>
   </event_session>
</ddl300_300:XEvent>

Then you’ll need to define which events are interesting. In this case, I decided to intercept the event Audit Login (code = AuditLogin). This event is raised for each new connection, such as when a client requests a connection to the server. This server is dedicated to the cube that I needed to monitor, so no additional filter is needed.

 <ddl300_300:XEvent>
   <event_session ...>
      <event package="AS" name="AuditLogin" />
   </event_session>
 </ddl300_300:XEvent>

The full creation script is available below:

<Create
   xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"
   xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"
   xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"
   xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"
   xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300">
   <ObjectDefinition>
      <Trace>
         <ID>SQLAdmin_Audit</ID>
         <Name>SQLAdmin_Audit</Name>
         <AutoRestart>true</AutoRestart>
         <ddl300_300:XEvent>
            <event_session name="xeas"
              dispatchLatency="1"
              maxEventSize="4"
              maxMemory="4"
              memoryPartitionMode="none"
              eventRetentionMode="allowSingleEventLoss"
              trackCausality="true">
               <event package="AS" name="AuditLogin" />
               <target package="Package0" name="event_file">
                  <parameter name="filename" value="L:\MSOLAP\SQLAdmin_Audit.xel" />
                  <parameter name="max_file_size" value="1024" />
                  <parameter name="max_rollover_files" value="3" />
               </target>
            </event_session>
         </ddl300_300:XEvent>
      </Trace>
   </ObjectDefinition>
</Create>

To be sure that your trace is running, you can use the following script:

<Discover xmlns="urn:schemas-microsoft-com:xml-analysis">
   <RequestType>DISCOVER_TRACES</RequestType>
   <Restrictions/>
   <Properties/>
</Discover>

You can also check in your file location (L:\MSOLAP) whether a file starting with SQLAdmin_Audit and ending with .xel is available. Don’t panic about the size of your file or the last update timestamp: some events can be delayed and the timestamp is not always updated.

And how can I query this file? With the SQL Server database engine! The key function to read this kind of file is named sys.fn_xe_file_target_read_file. This function takes care of the rollover files for you, which is really handy. The first parameter is the filename (wildcards are allowed); the others are not needed. If you’ve intercepted more events than just AuditLogin, you can easily filter on the field object_name.

select 
   cast(event_data as xml) as event_data
from
   sys.fn_xe_file_target_read_file(
      'L:\MSOLAP\SQLAdmin_Audit_*.xel'
      , null, null, null
   )
where
   object_name='AuditLogin'

Oops, this function returns xml elements. SQL Server has features to query xml and extract specific fragments; for this case, the value() method is a good approach. Its first parameter is an XPath expression (just leave the [1] at the end) and its second parameter is the SQL return type (between single quotes). With the help of this method, I’m able to extract the NTUserName and the StartTime of the connection; then I just filter on distinct users to get my list.

with cte as
(
select
   event_data.value(
      '(event/data[@name="NTUserName"])[1]'
      , 'varchar(255)'
   ) as    NTUserName
   , event_data.value(
      '(event/data[@name="StartTime"])[1]'
      , 'varchar(255)'
   ) as StartTime
   , event_data
from
(
   select 
      cast(event_data as xml) as event_data
   from
      sys.fn_xe_file_target_read_file(
         'L:\MSOLAP\SQLAdmin_Audit_*.xel'
         , null, null, null
      )
   where
      object_name='AuditLogin'
) t
)
select distinct
   NTUserName
from
   cte
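
To answer the second part of the initial question (do they connect daily or weekly?), a small variation on the same extraction can count, per user, the days with at least one connection. This is only a sketch: it assumes that the StartTime string extracted from the event converts cleanly to a date, as it does for the duration calculation shown earlier.

-- A sketch: how frequently does each user connect?
-- Assumes StartTime can be cast to a date (same remark as for the datediff used earlier).
with cte as
(
select
   event_data.value(
      '(event/data[@name="NTUserName"])[1]'
      , 'varchar(255)'
   ) as NTUserName
   , event_data.value(
      '(event/data[@name="StartTime"])[1]'
      , 'varchar(255)'
   ) as StartTime
from
(
   select
      cast(event_data as xml) as event_data
   from
      sys.fn_xe_file_target_read_file(
         'L:\MSOLAP\SQLAdmin_Audit_*.xel'
         , null, null, null
      )
   where
      object_name='AuditLogin'
) t
)
select
   NTUserName
   , count(distinct cast(StartTime as date)) as DaysWithConnections
   , count(*) as TotalConnections
from
   cte
group by
   NTUserName
order by
   DaysWithConnections desc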

With this first step, I’m able to identify the users and whether they are frequent users or not. In a next blog post, I will explain how to intercept the queries. But before that, it’s perhaps useful to remove the trace from your server if it’s no longer needed:

<Delete
   xmlns="http://schemas.microsoft.com/analysisservices/2003/engine"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:ddl2="http://schemas.microsoft.com/analysisservices/2003/engine/2"
   xmlns:ddl2_2="http://schemas.microsoft.com/analysisservices/2003/engine/2/2"
   xmlns:ddl100_100="http://schemas.microsoft.com/analysisservices/2008/engine/100/100"
   xmlns:ddl200_200="http://schemas.microsoft.com/analysisservices/2010/engine/200/200"
   xmlns:ddl300_300="http://schemas.microsoft.com/analysisservices/2011/engine/300/300">
   <Object>
      <TraceID>SQLAdmin_Audit</TraceID>
   </Object>
</Delete>

SSAS measures’ format and Excel

For one of the cubes I’m working on, we needed to format a measure by adding the suffix k€. The front-end is Excel 2010/2013.

My colleague was in charge of this part of the development and had defined a FORMAT_STRING for the measure, as we usually do: FORMAT_STRING = "#,##0 k€". I was in charge of the testing, with NBi, and I had defined a test for the format of a cell; this test was successful. Sun shining.

A few days later, I received a screenshot of an Excel sheet and my eyes were caught by a poor formatting. In place of an expected 1.075 k€ I had an awful 1075,2357777. My first move was to connect with SSMS and run the equivalent MDX query and, surprise … the format was correct: 1.075 k€. Back to an Excel sheet connected to this cube, and indeed the format was wrong: 1075,2357777. Confirmation from the cube definition: the format was defined; confirmation from NBi: the format was correctly applied. WTF? Why wasn’t Excel able to display the value correctly formatted when other tools were?

Since the development of NBi, I’ve known that SSAS returns both a VALUE and a FORMATTED_VALUE. NBi is able to retrieve this FORMATTED_VALUE to perform a test of a cell’s format. The documentation of SSAS describes the relation between FORMATTED_VALUE and FORMAT_STRING:

FORMAT_STRING: The formatting template to be applied to the value of the cell to generate FORMATTED_VALUE property

I decided to use the SSAS profiler to retrieve the query emitted by Excel. The query wasn’t really complex but the CELL PROPERTIES were a bit puzzling: no trace of FORMATTED_VALUE, just FORMAT_STRING. Based on this observation, a search on Google confirmed that Excel retrieves the FORMAT_STRING and applies the format by itself (instead of relying on the FORMATTED_VALUE returned by SSAS).

Apparently Excel and SSAS don’t have the same interpretation of the FORMAT_STRING. A miracle occurred and I decided to place a backslash before the k: #,##0 \k€. The format was still correct in SSAS and suddenly a k was also visible in Excel … a few seconds later, I added another backslash in front of the euro symbol (#,##0 \k\€) and the value was correctly formatted in both Excel and SSAS.

Based on the SSAS documentation, the backslash is not needed and indeed SSAS correctly handles the FORMAT_STRING without the backslash. But apparently Excel is less tolerant and the backslash is mandatory for this kind of formatting.

 

How to compare the result-sets returned by an SQL query and an MDX query with NBi

If you’re designing an SSAS cube and you’re putting some test automation into action, you’ll probably want to compare the results of two queries: the first one on a source system or on your SQL data warehouse, and the second one on your cube.

It’s a common and simple task for NBi. This framework is designed to ensure that this kind of test doesn’t quickly become too complex. The implementation of this test shouldn’t take more than a few seconds. As for other tests designed for this framework, you’ll need to define two objects: a system-under-test (the result-set of the query on the cube) and an assertion (it’s equal to the result-set of the query on the data warehouse). The basic structure of the test will be:

<test name="Compare Cube (MDX) with datawarehouse (SQL)">
<system-under-test>
         <execution>
             <query connectionString="@AsAdventureWorks2012">
                 <!-- Here my MDX Query -->
             </query>
         </execution>
    </system-under-test>
    <assert>
         <equalTo>
             <query connectionString="@SqlAdventureWorks2012">
                 <!-- Here my SQL Query -->
             </query>
         </equalTo>
     </assert>
</test>

The two queries can be directly embedded into the test file or written in an external file; both options have their own advantages. If you place your query into the test, it’s helpful for test editing to have only one file to open. On the other hand, a query written in an external file can be opened directly in SQL Server Management Studio, and this file can also be referenced more than once by your test-suite. The choice is not obvious and depends on what exactly you’ll do with this query, but a general guideline could be to use an embedded query if the query is not too large and is only used once in your test-suite.

To illustrate our sample, we’ll check that the “count of customers” is indeed correctly implemented in the SSAS cube of Adventure Works. For this we’ll use the “Date” dimension as a dicer. The corresponding MDX query will be:

select
    [Measures].[Internet Order Count] on 0,
    non empty([Date].[Calendar Year].children) on 1
from
    [Adventure Works]

Execution of this MDX query returns a result-set of four rows of two cells: the first cell of a row is the year and the second the count of customers.

In the corresponding SQL data warehouse, we’ll need to write the following query.

select
	'CY ' + cast(year([OrderDate]) as varchar(50)) as CivilYear
	, count([CustomerID]) as CountOfCustomer
from
	[Sales].[SalesOrderHeader]
where
	[OnlineOrderFlag]=1
group by
	'CY ' + cast(year([OrderDate]) as varchar(50))

The corresponding result-set contains the same four years. Note that for this result-set, there is no guarantee about the ordering of the rows; in this case, 2006 is at the top and 2007 at the bottom. This is not a problem for NBi: the framework doesn’t take the ordering of the rows into account to compare two result-sets.

If we put together the two queries and the test definition, we’ve the following test:

<test name="Compare Cube (MDX) with datawarehouse (SQL)">
    <system-under-test>
         <execution>
             <query connectionString="@AsAdventureWorks2012">
                 select
					[Measures].[Internet Order Count] on 0,
					non empty([Date].[Calendar Year].children) on 1
				 from
					[Adventure Works]
             </query>
         </execution>
    </system-under-test>
     <assert>
         <equalTo>
             <query connectionString="@SqlAdventureWorks2012">
                 select
					'CY ' + cast(year([OrderDate]) as varchar(50)) as CivilYear
					, count([CustomerID]) as CountOfCustomer
				 from 
				 	[Sales].[SalesOrderHeader]
				 where
				 	[OnlineOrderFlag]=1
				 group by 
					'CY ' + cast(year([OrderDate]) as varchar(50))
             </query>
         </equalTo>
     </assert>
</test>

The execution of this test in the test runner will return a positive result (green). As already said before, the fact that the rows are not ordered the same way in the two result-sets is not an issue for NBi. The framework doesn’t rely on row ordering when executing the comparison but on key matching. By default, NBi considers the first cells of the row as part of the key. In our sample, NBi will look for the row “CY 2005” (first row of the MDX query) in the second result-set. When this row is found in the second result-set, NBi compares its value (the content of the last cell of the row, 1013) to the expected value (1013). NBi continues like this for each row. In this case NBi concludes that the result-sets are equal and validates the test.

If we slightly change our SQL to a “distinct count” in place of a “count”,

select
	'CY ' + cast(year([OrderDate]) as varchar(50)) as CivilYear
	, count(distinct [CustomerID]) as CountOfCustomer
from
	[Sales].[SalesOrderHeader]
where
	[OnlineOrderFlag]=1
group by
	'CY ' + cast(year([OrderDate]) as varchar(50))

it will impact the result-set and the test should not be green any more. To ensure this, you can run the test after adapting the query in your nbits file. Don’t forget to reload the test (CTRL+R) in NUnit to ensure the latest version of your test-suite is taken into account.
Then execute the test and you’ll see a red bar, synonymous with failure.

The exception raised is “Execution of the query doesn’t match the expected result”. If you copy-paste the full message displayed in the textbox, you’ll get additional valuable information about the difference between the two result-sets. At the top of the message, NBi displays the first 10 (max.) rows of the two result-sets. Just under, NBi displays the rows that are missing, unexpected or with a delta in their values. In the sample, two rows are considered as non-matching because their values in the two result-sets diverge.

Now, if we go back to our initial SQL query (so without the distinct) but arbitrarily introduce an additional where clause, we can test the behavior of NBi when the two result-sets don’t match in terms of row count.

select
	'CY ' + cast(year([OrderDate]) as varchar(50)) as CivilYear
	, count([CustomerID]) as CountOfCustomer
from
	[Sales].[SalesOrderHeader]
where
	[OnlineOrderFlag]=1
        and year([OrderDate])<2008
group by
	'CY ' + cast(year([OrderDate]) as varchar(50))

If we just add this line to our query in the nbits file, we’ll have trouble when trying to load the test in NUnit. Indeed, the symbol “<” is ambiguous in an xml file and so we should mark it as a non-xml symbol. This is quickly done by introducing a CDATA section in the xml. The assertion should look like this:

<assert>
    <equalTo>
        <query connectionString="@SqlAdventureWorks2012">
            <![CDATA[
            select
                'CY ' + cast(year([OrderDate]) as varchar(50)) as CivilYear
                , count([CustomerID]) as CountOfCustomer
            from 
                [Sales].[SalesOrderHeader]
            where
                [OnlineOrderFlag]=1
                and year([OrderDate])<2008
            group by 
                'CY ' + cast(year([OrderDate]) as varchar(50))
            ]]>
        </query>
    </equalTo>
</assert>

As expected, if the adapted test is run, NBi will display a red light and will help you find the difference between the result-sets by identifying the unexpected row in the system-under-test.

In the next posts we’ll see how to tune NBi to detect missing rows and unexpected rows in more complex cases but also how to introduce a tolerance when comparing values of rows.

Test automation for a SSAS cube

I’ve never really understood why data-oriented people are so slow and unwilling to put into action the best practices of traditional IT. I mean, in 2014, it’s not rare to find database code that is not under source control, or to find an ETL developer fixing a bug directly in the production environment. It’s just the same for “automated tests”: a standard in most .Net/Java/Python developments but not so common as soon as you go to the data side (SQL, OLAP, ETL, …).

Test automation is the use of dedicated software to control the execution of tests and the comparison of actual outcomes with expected outcomes. All the steps involved in this process are handled by software: it prepares the environment, executes the tests and also decides whether these tests pass or fail.

The big advantage of test automation is repeatability. You don’t care how many times you’ll need to execute the tests; it will be done by software (usually lowering the cost of testing sessions). But you get a few other, less obvious advantages, such as the lack of distraction and human mistakes. To illustrate this, think about the last time you compared two columns of 25 figures. If one figure is not the same as its vis-a-vis, are you sure you’ll spot it? Even if you’ve already made this comparison 10 times with success? Software doesn’t suffer from distraction or lassitude.


A lot of people claim that test automation is expensive in time. It’s true that the implementation of tests will surely have a higher cost with a strategy of automation than with any other strategy. But you’re usually implementing your tests once and executing them several times. The cost involved in the execution of an automated test is zero (or close to it). The price of an automated test-suite becomes affordable if you plan to execute your tests more than a few times. Testing often is a prerequisite for quality software. Test as often as possible and it’ll save you time. Time is also generally a concern for people avoiding to automate their test-suite: they are in a rush and want to save time. For them, I only have one answer: you don’t have time because you’ve not automated your test-suite.

Testing a cube with or without automation is not different. You’ll need to proceed with the same tests: cube structure, dimensions’ members and finally measures. Other points such as security, performance and load should also be tested but are out of scope for this post.

When you test the structure of a cube, you should check that all expected dimensions, hierarchies, measures and so on are there. It’s also important that you have no unexpected artifacts. You should also ensure that the relations between dimensions and facts are effectively available to end-users.

About dimensions’ members, you’ll be interested in asserting their availability and validating their ordering (alphabetical, numerical, chronological or specific). To check the availability of members, you’ll usually compare the content of your cube with your source system or with your data warehouse. But it’s not necessarily the smartest idea, because you’d have to re-implement in your test all the rules written in your ETL or in the security layer that filter the members visible to the end-user. It can be tricky and error-prone. It’s sometimes smarter to ask your stakeholders which members they are sure they want to show and how many members they are expecting. Based on this information, it’s possible to ensure that the members given by your stakeholders are effectively visible in your cube and that the actual count of members is indeed close to the figure given by your stakeholders.

About measures, you need to check that aggregations are correct (not a sum in place of an average), that they are correctly linked to the expected members of each dimension and that all calculations are correctly based on other available facts. You really have a huge diversity of tests to think about. For your reference, again, the source application or the data warehouse are interesting, but don’t forget to also check the likelihood with stakeholders. They probably have good ideas about ranges for all the figures they are requesting; you could also find some existing reports on these facts, and these reports are a good source of expectations too, if you introduce a tolerance.

Now that you’ve an idea about what to test, you could ask … “how do I effectively implement this automation?” For test automation, you’ll need a framework. The not-so-smart option would be to try to implement one by yourself: these frameworks are complex and require good testing themselves. So it’s probably better and safer to use one of the frameworks available on the net. The surprise could be that there is not a lot of choice. Indeed, as explained before, this kind of practice (test automation) is unfortunately not so common in the data world.

For the next blog posts, I’ll use NBi to show you some samples. This framework is open-source and free (as in beer and as in speech), actively maintained, documented and supports (without tricks or hacks) all the tests described above. Moreover, you don’t need to write code in Java or Python or C#; a few xml tags are enough to implement your tests. You don’t need a compiler either. I’d recommend that you download and install it (check its own documentation for this).

In the next posts, I’ll cover some use-cases such as comparison of queries’ result set to static values or source systems, to give you some guidance about test automation for an SSAS cube with NBi.