I’ve never really understood why data-oriented people are so slow and unwilling to adopt the best practices of traditional IT. I mean, in 2014, it’s still not rare to find a database’s code that isn’t under source control, or an ETL developer fixing a bug directly in the production environment. It’s just the same for “automated tests”: a standard in most .NET/Java/Python development, but far less common as soon as you move to the data side (SQL, OLAP, ETL, …).
Test automation is the use of dedicated software to control the execution of tests and the comparison of actual outcomes with expected outcomes. All the steps involved in executing the tests are handled by software: it prepares the environment, executes the tests, and also decides whether each test passes or fails.
The big advantage of test automation is repeatability. It doesn’t matter how many times you’ll need to execute the tests; a piece of software will do it (usually lowering the cost of testing sessions). But you also have a few less obvious advantages, such as the absence of distraction and human mistakes. To illustrate this, think about the last time you compared two columns of 25 figures. If one figure differs from its counterpart, are you sure you’d spot it? Even if you’ve already made this comparison 10 times successfully? Software doesn’t suffer from distraction or lassitude.
A lot of people claim that test automation is expensive in time. It’s true that implementing the tests will surely cost more with an automation strategy than with any other strategy. But you usually implement your tests once and execute them many times. The cost of executing an automated test is zero (or close to it). The price of an automated test-suite becomes affordable as soon as you plan to execute your tests more than a few times. Testing often is a prerequisite for quality software, so test as often as possible and it’ll save you time. Time is also the usual concern of people who avoid automating their test-suite: they are in a rush and want to save time. For them I have only one answer: you don’t have time precisely because you haven’t automated your test-suite.
Testing a cube with or without automation is not fundamentally different. You’ll need to perform the same tests: cube structure, dimensions’ members and finally measures. Other aspects such as security, performance and load should also be tested, but they are out of scope for this post.
When you test the structure of a cube, you should check that all expected dimensions, hierarchies, measures and so on are there. It’s also important that you have no unexpected artifacts. You should also ensure that the relations between dimensions and facts are effectively available to end-users.
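To give a flavor of what such a structural test looks like, here is a sketch in NBi’s XML syntax. The element and attribute names follow NBi’s documentation, but treat this as an assumption to verify against the version you install; the perspective “Sales”, the caption “Customer” and the connection string are placeholders.

```xml
<!-- Sketch of a structural test: assert that the list of dimensions
     exposed by the perspective "Sales" contains a dimension captioned
     "Customer". All names and the connection string are placeholders. -->
<test name="The dimension 'Customer' exists in the perspective 'Sales'">
  <system-under-test>
    <structure>
      <dimensions perspective="Sales"
                  connectionString="Provider=MSOLAP;Data Source=MyServer;Initial Catalog=MyCube;"/>
    </structure>
  </system-under-test>
  <assert>
    <contain caption="Customer"/>
  </assert>
</test>
```

The same pattern applies to hierarchies, levels and measures: you point the system-under-test at a list of artifacts and assert that an expected caption is contained in it.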
Concerning dimensions’ members, you’ll want to assert their availability and validate their ordering (alphabetical, numerical, chronological or specific). To check the availability of members, you’d usually compare the content of your cube with your source system or with your data warehouse. But that’s not necessarily the smartest idea, because your test would have to re-implement all the rules written in your ETL or in the security layer that filter the members visible to the end-user. That could be tricky and error-prone. It’s sometimes smarter to ask your stakeholders which members they definitely want to see and roughly how many members they expect. Based on this information, you can ensure that the members given by your stakeholders are effectively visible in your cube, and that the actual count of members is indeed close to the figure they gave you.
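A members test can be sketched the same way. Again, this follows NBi’s documented syntax but should be double-checked against your version; the dimension, hierarchy, member caption and connection string are all placeholders.

```xml
<!-- Sketch of a members test: the hierarchy "Year" of the dimension
     "Date" must contain a member captioned "2014". All names and the
     connection string are placeholders. -->
<test name="The year 2014 is visible to end-users">
  <system-under-test>
    <members>
      <hierarchy caption="Year" dimension="Date" perspective="Sales"
                 connectionString="Provider=MSOLAP;Data Source=MyServer;Initial Catalog=MyCube;"/>
    </members>
  </system-under-test>
  <assert>
    <contain caption="2014"/>
  </assert>
</test>
```

NBi’s documentation also describes assertions on the count of members and on their ordering, which is how you’d encode the “roughly how many” and “sorted chronologically” expectations coming from your stakeholders.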
Concerning measures, you need to check that aggregations are correct (not a sum in place of an average), that they are correctly linked to the expected members of each dimension, and that all calculations are correctly based on other available facts. There is really a huge diversity of tests to think about. As a point of reference, the source application or the data warehouse are again interesting, but don’t forget to check the likelihood of the figures with your stakeholders: they probably have a good idea of the expected range for every figure they’re requesting. You may also find existing reports on these facts; those reports are another good source of expectations, provided you introduce a tolerance.
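For measures, the typical test compares the result set of an MDX query on the cube with the result set of a SQL query on the data warehouse. Here is a sketch of such a test in NBi’s XML syntax; the MDX and SQL queries, table names and connection strings are invented placeholders for a hypothetical sales cube, so adapt everything to your own model.

```xml
<!-- Sketch of a measure test: the sales amount by year returned by the
     cube (MDX) must equal the same figures computed from the data
     warehouse (SQL). Queries, tables and connection strings are
     placeholders for a hypothetical model. -->
<test name="Sales amount by year matches the data warehouse">
  <system-under-test>
    <execution>
      <query connectionString="Provider=MSOLAP;Data Source=MyServer;Initial Catalog=MyCube;">
        SELECT [Measures].[Sales Amount] ON COLUMNS,
               [Date].[Year].[Year].MEMBERS ON ROWS
        FROM [Sales]
      </query>
    </execution>
  </system-under-test>
  <assert>
    <equalTo>
      <query connectionString="Data Source=MyServer;Initial Catalog=MyDWH;Integrated Security=SSPI;">
        SELECT d.[Year], SUM(f.SalesAmount)
        FROM dbo.FactSales f
        INNER JOIN dbo.DimDate d ON d.DateKey = f.DateKey
        GROUP BY d.[Year]
      </query>
    </equalTo>
  </assert>
</test>
```

When you compare to an existing report rather than to the warehouse, the same pattern applies with the expected values embedded in the test, and NBi lets you define a tolerance on numeric columns so that small rounding differences don’t fail the test.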
Now that you’ve an idea of what to test, you may ask: “how do I effectively implement this automation?” For test automation, you’ll need a framework. The not-so-smart option would be to implement one by yourself: these frameworks are complex and require thorough testing themselves. So it’s probably better and safer to use one of the frameworks available on the net. The surprise may be that there is not a lot of choice: as explained before, this practice (test automation) is unfortunately not so common in the data world.
In the next blog posts, I’ll use NBi to show you some samples. This framework is open-source and free (as in beer and as in speech), actively maintained, documented, and supports (without tricks or hacks) all the tests described above. Moreover, you don’t need to write code in Java, Python or C#: a few XML tags are enough to implement your tests. You don’t need a compiler either. I’d recommend that you download and install it (check its own documentation for this).
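To give you an idea of how lightweight this is, an NBi test-suite is just an XML file with a root element holding your tests. The skeleton below assumes the namespace documented by NBi; the suite name and test name are placeholders.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- Minimal skeleton of an NBi test-suite file: each individual test
     is placed inside the root element. Names are placeholders. -->
<testSuite name="My first cube test-suite" xmlns="http://NBi/TestSuite">
  <test name="A first test">
    <!-- system-under-test and assert elements go here -->
  </test>
</testSuite>
```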
In the next posts, I’ll cover some use-cases, such as comparing a query’s result set to static values or to a source system, to give you some guidance about test automation for an SSAS cube with NBi.