Month: November 2014

Scope and many-to-many to solve complex business analysis: first case

This week, I had an interesting problem to solve with SSAS and I wanted to share it (and its solution).

Business description

We’ve three countries (Belgium, Netherlands, France) with two borders (Belgium/France and Belgium/Netherlands). We’ve also two additional borders (Spain/France and Germany/Netherlands) but the countries Spain and Germany are not part of the solution to model.
CountryBorder

An oriented border is a representation of a border in one direction: it means that the border Belgium/France has two corresponding oriented borders Belgium to France and France to Belgium. For the border Germany/Netherlands, we only have one oriented border Germany to Netherlands. On each oriented border I’ve a capacity to import something.
CountryOrientedBorder

External borders of a subset of countries are borders with at least one of their countries not listed in the set. Sample: for the subset with Belgium and Netherlands, the borders Belgium/France and Netherlands/Germany are external, on the other hand, the border Belgium/Netherlands is considered as an internal border.
InternalExternalBorder

Expected analysis

  • List oriented borders belonging to a set of countries and their corresponding capacity of import
  • List external borders of a set of countries and their corresponding capacity of import
  • The capacity of import of a set of country is equal to the sum of capacity of import for all its external borders.

The tricky part of the requirement is about external and internal borders for a set of countries. This notion of external/internal depends of the set of countries and so can’t be pre-calculated.

Here under, I’ll define one solution: it’s surely not the only solution to answer to analysis needs but at least this version is correct and is efficient.

Solution

The model has a first dimension named Country with an attribute Country.

CREATE TABLE [dbo].[Country]
(
    [CountryId] INT NOT NULL PRIMARY KEY IDENTITY, 
    [Country] VARCHAR(50) NOT NULL
)
GO
insert into BiddingZone VALUES('Belgium');
insert into BiddingZone VALUES('France');
insert into BiddingZone VALUES('Netherlands');

Then a second dimension Border is added. This dimension has two attributes: Border and Oriented Border. Oriented borders will be displayed with an arrow between the two codes of the countries (fr->be). Borders will be displayed with a slash between them (be/fr).

CREATE TABLE [dbo].[Border]
(
    [BorderId] INT NOT NULL PRIMARY KEY IDENTITY, 
    [Oriented Border] VARCHAR(10) NOT NULL
    [Border] VARCHAR(10) NOT NULL, 
)
GO
insert into Border VALUES('be->fr', 'be/fr');
insert into Border VALUES('fr->be', 'be/fr');
insert into Border VALUES('be->nl', 'be/nl');
insert into Border VALUES('nl->be', 'be/nl');
insert into Border VALUES('de->nl', 'nl/de');
insert into Border VALUES('fr->es', 'fr/es');

The facts are linked to the border dimension at the grain of the oriented border.

CREATE TABLE [dbo].[Import]
(
    [BorderId] INT NOT NULL, 
    [Import] INT NOT NULL
)
GO
insert into Import VALUES(1, 100);
insert into Import VALUES(2,110);
insert into Import VALUES(3, 70);
insert into Import VALUES(4, 75);
insert into Import VALUES(5, 137);
insert into Import VALUES(6, 68);

Finally, we need to create a factless fact between the Countries and the Borders. The idea is to model the factless fact as the link between the oriented border and the corresponding destination country.

CREATE TABLE [dbo].[BorderCountryAssociation]
(
    [BorderId] INT NOT NULL, 
    [CountryId] INT NOT NULL
)
GO
insert into Import VALUES(1, 2); --be->fr , fr
insert into Import VALUES(2, 1); --fr->be , be
insert into Import VALUES(3, 3);
insert into Import VALUES(4, 1);
insert into Import VALUES(5, 3);
insert into Import VALUES(6, 2);

This is the structure and content of your underlying database. Now we can start the cube section, by creating two dimensions (Country and Border). … trivial.

Then, we add to the DSV a first fact table BorderCountryAssociation with an additional column (set to 1). This additional column will be named Border count. We’ll see the need later.

The fact table Import is added to the DSV and we can create the matrix of the dimension usage with a Many-to-many.

Many2Many

At this moment, most of the analysis needs are already fulfill. The remaining requirement is the problem of internal and external borders. Based on our model an internal border is a border appearing twice in the table BorderCountryAssociation! The field previously added named Border count will count how many times the border appears in the subset defined by the countries. If it’s more than once then the value of import should be set to null to avoid to take it into account.

To modify the value of the fact Import based on the value of the fact Border Count, when we’re at the level of the Border, we’ll use a scope statement. Previously, we’ll add a named calculation (“Import”) in the DMV (table Import). This named calculation will be initially set to NULL.

CALCULATE; 
SCOPE([Measures].[Import]); 
    SCOPE([Border].[Border].[Border].MEMBERS); 
            THIS = IIF([Border Count]>1, NULL, [Measures].[Import db]); 
    END SCOPE; 
END SCOPE;

This action has solved the problem for the internal borders but unfortunately it has broken the solution for the oriented borders. In order to solve both issues, we need to create an additional scope statement for the oriented border level.

CALCULATE; 
SCOPE([Measures].[Import]); 
    SCOPE([Border].[Oriented Border].[Oriented Border].MEMBERS); 
        THIS = [Measures].[Import db]; 
    END SCOPE; 
    SCOPE([Border].[Border].[Border].MEMBERS); 
            THIS = IIF([Border Count]>1, NULL, [Measures].[Import db]); 
    END SCOPE; 
END SCOPE;

The corresponding result matches with the initial requirements.Border and Oriented Border

Evaluate an expression to validate the result of a query

During the validation process of your BI solution, it could be useful to check that a given expression is validated for each row returned by a query. A sample? Yes surely, you’ve a query returning all the customers having bought for more than $10,000 in your stores. You should ensure that the query returns only rows with a column TotalDue greater than 10000. If we apply this query to AdventureWorks2012, it should look as

select
	[FirstName] + ' ' + [LastName]
	,TotalDue
from
	[Person].[Person] pers
inner join
	(
		select
			[pers].[BusinessEntityID]
			,sum(TotalDue) as TotalDue
		from
			[Sales].[SalesOrderHeader] soh
		inner join
			[Sales].[Customer] cust
			on [cust].[CustomerID] = [soh].[CustomerID]
		inner join
			[Person].[Person] pers
			on [pers].[BusinessEntityID] = [cust].[PersonID]
		group by
			[pers].[BusinessEntityID]
		having
			sum(TotalDue)>10000
	) TotalDueByPerson
	on [TotalDueByPerson].[BusinessEntityID]=[pers].[BusinessEntityID]

The most common assertion for datasets within NBi is equalTo but it can’t help you for this task. Indeed, the assertion equalTo requires that you know exactly the count of rows that will be returned by the query. If you compare two datasets with different count of rows, the test will surely turn red complaining about missing or unexpected rows.

So if equalTo is not the graal, we should take a look to the another assertion dedicated to datasets’ validation: evaluate-rows. The assertion evaluate-rows is checking that a list of expression are validated for each row of the dataset. If at least one of these expressions for at least one row is not validated, then the test will turn red.

The assertion evaluate-rows is mainly built to support calculations on based on values and compare the result of the expression’s evaluation to a specific column. To support the case explained above, we will need to be creative. The idea is to validate the content of the cell TotalDue based on it’s own value. Indeed, we will express in our test that if the value of column TotalDue is greater than 10,000 then we’ll return this value as the the expression’s result, else we’ll return -1. When the framework will compare the value of the column and the value of the expression, each row with a TotalDue greater than 10,000 will validate and if the amount is less than (or equal to) 10,000, this value will be compared to -1 … and this row will fail the test.

<assert>
    <evaluate-rows>
         <variable column-index="1">TotalDue</variable>
         <expression column-index="1">if(TotalDue>10000,TotalDue,-1)</expression>
    </evaluate-rows>
<assert>

Tested with the query above, the test will success.

If we’re changing the query above to filter on TotalDue greater than 50 in place of 10,000

having
 sum(TotalDue)>50

The test will return a failure and turn red with following message:

7263 of the 7309 rows don't validate at least one expression.

Row 0: 
    The expression ' = if(TotalDue>10000,TotalDue,-1)' is not validated.
    The expected result was '-1' but the actual value is '162.7444'
    Variable 'TotalDue' had value of '162.7444'

Row 1: 
    The expression ' = if(TotalDue>10000,TotalDue,-1)' is not validated.
    The expected result was '-1' but the actual value is '3019.6307'
    Variable 'TotalDue' had value of '3019.6307'

Full test:

<?xml version="1.0" encoding="utf-8" ?>
<testSuite name="Acceptance Testing: members ordering" xmlns="http://NBi/TestSuite">
  <test name="Blog Post: Evaluate's expression">
    <system-under-test>
      <execution>
        <query connectionString='Data Source=mhknbn2kdz.database.windows.net;Initial Catalog=AdventureWorks2012;User Id=sqlfamily;password=sqlf@m1ly'>
          <![CDATA[
          select
          [FirstName] + ' ' + [LastName]
          ,TotalDue
          from
          [Person].[Person] pers
          inner join
          (
          select
          [pers].[BusinessEntityID]
          ,sum(TotalDue) as TotalDue
          from
          [Sales].[SalesOrderHeader] soh
          inner join
          [Sales].[Customer] cust
          on [cust].[CustomerID] = [soh].[CustomerID]
          inner join
          [Person].[Person] pers
          on [pers].[BusinessEntityID] = [cust].[PersonID]
          group by
          [pers].[BusinessEntityID]
          having
          sum(TotalDue)>50
          ) TotalDueByPerson
          on [TotalDueByPerson].[BusinessEntityID]=[pers].[BusinessEntityID]
          ]]>
        </query>
      </execution>
    </system-under-test>
    <assert>
      <evaluate-rows>
        <variable column-index="1">TotalDue</variable>
        <expression column-index="1" type="numeric" tolerance="0"> = if(TotalDue>10000,TotalDue,-1)</expression>
      </evaluate-rows>
    </assert>
  </test>
</testSuite>