Month: December 2016

Requirements for BI solutions

A recurring question that I receive after a conference or during a course sounds really simple but is rather complex to answer: how do you write specifications for Business Intelligence solutions? That will be the starting point of this series of four blog posts about specifications, requirements, work items and test cases.

The methodology that I advocate is very business focused. We need to understand business processes, objectives, analytic requirements and the associated potential business impact, and finally the project success criteria.

A global Business Requirements deliverable typically covers the following topics: a description of the critical business and analytic processes including their semantics, the existing systems and data supporting the required capabilities, the initial project success criteria, and the key issues/risks with tactical and strategic recommendations.

Business Process to be analyzed

Keep in mind that the requirements are not there to define the solution but to make the problem explicit. Following this idea, don’t try to design a report by yourself; describe your business process(es) and your analytical needs. Don’t ask for a report: describe the level of maturity of your end-users and what they will do with the data/information. Keep in mind that a BI solution can go beyond a data warehouse, a cube or a report: it can also include trainings, coaching/support and even new staff members.

I don’t care about the format of the requirements. They can be written on any medium that is readable and shareable within the company. But keep in mind that implementers should be able to read the document from A to Z. Avoid writing documentation that relies on hyperlinks as the main mechanism to go from one section to another. Hyperlinks are great to facilitate navigation in some parts of the document, but readers need to be sure, at any moment, about what has been read and what has not. Based on this, we do not recommend writing your requirements as a website; a Word document is a better fit.

Strictly avoid UML or E/R schemas (or any software-related notation). Firstly, each reader’s and writer’s level of expertise with the notation will vary, resulting in different and incorrect interpretations of the same schema: some will understand certain details, some will not. Keep also in mind that these notations are usually designed for low-level specifications, while BI requirements must stay at a higher level. Last but not least, requirements will be validated by stakeholders: do you really expect your CxO to understand E/R notation?

“A drawing is more explicit than a paragraph” is a sentence that I value. But I usually add that “a drawing without explanations has no value”. If you include schemas or drawings, be sure that they don’t provide information that is not explicitly written in the paragraphs surrounding them. As a guideline, assume that I won’t read the schemas and drawings; they will only be used to double-check my own understanding. Do the same!

The first section of your document should be a broad introduction to your business. We’re dealing with BI; we can’t be experts in every process of the company, so you need to provide us enough information about the processes without forcing us to read pages of details. The first section deals with this idea: a description of the business process to be analyzed with the BI solution.

This section of the document should explain the business process that you want to analyze through Business Intelligence. Let me put the stress on this: at this step, I’m not interested in knowing “how you’ll analyze” but in “what will be analyzed”. The existing business process generates data that we’ll transform into information to be analyzed.

If you want to analyze the relation between the customers’ budget and the sales, then I expect you to describe “how customers are buying things on your website”, “how customers are buying in your shops” and “how we get information about their budget”. In other words, what are the different activities of the business processes “Internet Sales” and “Shop Sales”, including especially but not exclusively the activities “CRM” and “Order”.

It’s really a description of the business processes that I’m expecting, not of an application process. At this stage, information about the software supporting the business processes is more a distraction than a help.

This part of the document is critical: the more correct and exhaustive it is, the faster we’ll be able to start the development.

Define the terms

Vocabulary, or semantics, is essential in data modelling and therefore in Business Intelligence. You need to provide a definition of the terms used in the business processes described previously.

This vocabulary will also be reused in the resulting BI solution. So if you’re using the word “Customer” in your requirements, we’ll use “Customer” in the implementation; do not expect “Client”.

Avoid the usage of synonyms. They probably make for better writing style, but they’re ambiguous. If a “Customer” and a “Client” have the same meaning, choose one of them and use it until the end of the document. I’m not expecting a Goncourt or a Nobel Prize in literature but clear business requirements.

Be explicit in your definitions. I don’t expect, and am not interested in, a list of items with two words standing for a definition. Give us a context and rules that we can apply to validate your vocabulary.

Example: Define “customer” for a hotel chain

Possibilities include:

  • A customer is someone who makes a reservation
  • A customer is someone who has had at least one stay at a hotel
  • A customer may be a business or individual who rents a hotel facility
  • A customer is someone who signs up for the hotel loyalty program

Candidate definition:

A customer is an individual, group, or business who has received one or more hotel services, including hotel stays, facility rentals, and/or other additional hotel services, such as spa services. Hotel services may or may not have been paid for by the customer. A customer does not have to complete a hotel stay to be considered a customer.

It’s usually easier to understand and assimilate the definitions if they are given in-text with the business process description. I do not recommend a dictionary or glossary disconnected from the business process description: it’s usually difficult to maintain and difficult to read.

I often receive documents where the terminology in the drawings and in the text is not aligned. It’s really confusing. Even if you’re copy-pasting schemas from other documents, ensure that the terminology is the same as in the other parts of your document. If, for editorial reasons, you can’t update the text on the drawings, then clearly state that the terminology is not identical and give clear correspondences.

Analytics targets

In the second section of the document, we’re dealing with the analytics that should be conducted on top of the previously described Business Processes.

The topic is not only what we want to analyze but also the skills of the users. The more skilled the users are, the more degrees of freedom we’ll offer. But if your users have absolutely no IT knowledge and no time, we’ll probably deliver a report in their mailbox.

Users: skills, mobility and audience

Typically, users will fall into one of three categories in relation to the timeliness of data required:

  • Report broadcasts and scheduling: Many BI users won’t require real-time data access. For example, the marketing department might rely on weekly sales reports to gauge the effectiveness of brand strategies and specific marketing campaigns over a significant period of time. Daily, or real-time, reports – such as transaction-level reports – would provide too much detail. The strength of BI in this instance is its ability to track and identify trends over time, while simultaneously allowing users to connect shifts in the data to real-life events (marketing campaigns) to support strategy development.
  • Alerts: In some instances, certain user groups won’t need to actively monitor business activities, but just the ability to be alerted to abnormal or unacceptable circumstances – when key metrics fall outside predefined parameters. For example, the accounts payable team doesn’t need constant data access to perform its daily tasks. However, if an invoice is about to pass its payment deadline, an alert can grab their attention to ensure that it’s processed in a timely manner.
  • On-demand data access: Some users will require the ability to directly access and search real-time, or near real-time, operational data and reports, as well as perform ad-hoc queries. For example, BI users coordinating the intersection of ‘just in time’ manufacturing and supply chain operations would demand the ability to access, analyze and act on up-to-the-minute reports and data to ensure the timely delivery of stock from the warehouse, to transport operators, and to retail outlets.

Be sure to give a correct overview of the skills, mobility and point-in-time needs of the users that will use the solution. It’s expected that you have more than one type of user: different categories of users will access different artefacts of the solution.

Analytics needs

Articulate your reflection in terms of the questions that will be asked. Don’t stop at the first question. If users tell you that they want to “monitor quality”, go further: ask them to explain how they define “quality”. Is the difference between two measures expressed as a percentage or as an absolute value? Ask them what they will effectively monitor. Will they check if the quality improves? Do they want to receive alerts when the quality decreases? Is a single measure below expectation enough to ring the alert bell?

It happens that users have no answer to these questions. They just want the raw data. In such a case, they also ask to be able to define a lot of parameters by themselves. It’s a common and normal situation. It probably means that you need to add a phase of data discovery to your business intelligence path. Self-service BI is the part of the BI toolset that provides data discovery features to end-users.

You don’t need to be especially exhaustive in the definition of the analytic needs. BI models answer a family of questions, not a unique question. So if you’ve forgotten an analysis in your requirements, there is a good chance that you’ll still be able to answer that question with the existing models. But it’s only a probability: other families of analytic needs will require different models. So it’s better to describe dissimilar needs.

Business Impact

The last part of the document is optional; we can build a solution without it. Anyway, I’ll still be interested and will surely read it in detail. With this section, I’ll probably provide a better solution and share your business objectives.

This part of the document matters most to the stakeholders, giving them support when requesting a budget. It explains the expected return on investment and especially how you think it will be achieved. What changes in the business processes will you support with this new application? How do these information needs map to the business goals?

Explain your problem to me, I’ll find a solution.

A recurring question that I receive after a conference or during a course sounds really simple but is rather complex to answer: how do you write specifications for Business Intelligence solutions? That will be the starting point of this series of four blog posts about specifications, requirements, work items and test cases.

A key point is: what do you mean by specifications? I tend to agree with this definition from the Business Dictionary:

Exact statement of the particular needs to be satisfied, or essential characteristics that a customer requires (in a good, material, method, process, service, system, or work) and which a vendor must deliver. Specifications are written usually in a manner that enables both parties (and/or an independent certifier) to measure the degree of conformance. They are, however, not the same as control limits (which allow fluctuations within a range), and conformance to them does not necessarily mean quality (which is a predictable degree of dependability and uniformity). […]

In a few words, specifications let you measure conformance, not quality.

IMHO, they are pointless. Do you really want to open a debate with your customer or stakeholder with the following sentence: “Yes, we’re delivering bullshit but it’s 100% conformant to the specifications!”? Personally, no. And that’s independent of who writes the specifications or whether they’ve been validated. Who cares? Seriously, would you accept specifications from your plumber? Are you sure you’d understand even 10% of his technical wording? You don’t care about all the documents, you want a good solution. A better way to collaborate: you explain your problem to him and he’ll find and implement a good solution.

There are two kinds of issues with specifications:

  • They don’t describe the problem but a potential solution. These requirements don’t leave any latitude to the business intelligence specialists to design the best models and visualizations. Generally the resulting development is sub-optimal, poorly performing and overly complex, leading to a solution that is not accepted by end-users.
  • They do not provide enough specificity to guide development of the BI databases and applications that deliver the BI or to guide the business process changes that deliver the bang for the buck to the business.

For the second kind of specifications, it’s usually easy to explain to your stakeholders that you can’t come to a good solution with generic sentences. But it’s trickier with the first kind of specifications explained above. They will tell you that the solution is already explained and you just have to implement it. From my experience, don’t accept this kind of mission. Explain to them that we’re BI professionals: it’s up to us to understand the needs of our customers and to deliver high-quality solutions. It’s not their responsibility to explain how they would like to solve their own issue.

So if I don’t like specifications, what am I recommending to write? Requirements, work items and test cases. But that’s for a few other blog posts, starting with this one.

Tracking security changes on your SQL Server development environment

A few months ago, I was fed up with receiving complaints from my teammates about security settings changed on the development (or test) server. Every week, someone was at my desk complaining with a variation of “My development was working last week and now it’s not working anymore. Someone (not me) must have changed something about security but I don’t know who and why!”.

To give you a correct picture of our SQL Server environment: we have one instance for the dev server and another for the test server, but many people are developing on a given project. Consider also that we’re developing different projects at any given moment. All team members are (full) administrators of the dev and test servers. All team members know that the rule about security is “reduce privileges to the minimum required” (definitely, db_owner is not a solution) and this rule is checked during code reviews.

Some will consider the classical approach, supported by most DBAs: we should restrict the administrator rights to fewer people, a kind of single-point-of-security. Well, it was probably my opinion seven years ago but my position has dramatically changed since. This point of view is really based on my own environment … your mileage may vary, but keep in mind that I’m speaking at the dev and test level, not production. The main point of this position is that I’d not recommend a “single-point-of” anything for agile development. The true names are not “single-point-of” but “bottleneck”, “not-shared-decisions”, “I’m-the-only-one-who-knows-don’t-fire-me” and “not-my-business-ask-this-guy” … four things that I’m trying to avoid as much as possible.

I implemented a solution to track any change to “security” on the SQL Server. Not a word to my team members … I was the only one to know that something had been scripted and deployed. After a few weeks, my conclusion was clear: nobody in the team was trying to sabotage the work of others. After investigation, the issues could be grouped as follows:

  • You changed the code and it’s now requiring new privileges: Script them and stop blaming others.
  • During development, you manually added some privileges to the database, but the deployment script doesn’t include all of them. Someone has redeployed the solution from scratch, discarding all the unscripted privileges. It’s your mistake.
  • Less frequent … someone working on another solution is also developing on this shared component (SSISDB) and has a different definition than yours for this shared role. Discuss together and align your needs.

So clearly it was not a question of responsibility or trust but a problem of practices:

  • All privileges must be scripted (and the reason for requesting them documented) and committed in the solution. They must be part of the deployment artefact (at least in dev/test), as sketched after this list.
  • All the work performed on SSISDB (or any shared component) must be executed in a shared project.
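
As an illustration of the first practice, here is a minimal sketch of a scripted and documented privilege; the database, schema, role and account names are hypothetical:

use [MyProjectDb];  -- hypothetical project database
go

-- Reason: the ETL service account only reads the staging tables
-- during the nightly load; no write access is required.
create role [etl_reader];
grant select on schema::[staging] to [etl_reader];
alter role [etl_reader] add member [DOMAIN\EtlService];
go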

That’s not enough: practices not supported by tools are pointless and will never be applied for more than a month. We must provide a tool that lets people know who has changed “security”. The following trace does the job:

use [master];
go

create server audit
    [Audit Login Changes]
to file
(   FILEPATH = N'N:\Trace\'
    , MAXSIZE = 1024 MB
    , MAX_FILES = 10
    , RESERVE_DISK_SPACE = OFF
)
with
(   QUEUE_DELAY = 1000
    , ON_FAILURE = CONTINUE
    , AUDIT_GUID = '33e055a9-1924-47c3-9798-e8875a5bcd42'
);

alter server audit
    [Audit Login Changes]
with (state=on);
go

create server audit specification
    [Audit Login]
for server audit [Audit Login Changes]
    add (DATABASE_ROLE_MEMBER_CHANGE_GROUP)
    , add (SERVER_ROLE_MEMBER_CHANGE_GROUP)
    , add (DATABASE_PERMISSION_CHANGE_GROUP)
    , add (SERVER_OBJECT_PERMISSION_CHANGE_GROUP)
    , add (SERVER_PERMISSION_CHANGE_GROUP)
    , add (DATABASE_PRINCIPAL_CHANGE_GROUP)
    , add (SERVER_PRINCIPAL_CHANGE_GROUP)
    , add (SCHEMA_OBJECT_PERMISSION_CHANGE_GROUP)
with (state=on);
go
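
To double-check that the audit and its specification are effectively enabled, you can query the catalog views (a quick sketch based on the names defined above):

select name, is_state_enabled from sys.server_audits;
select name, is_state_enabled from sys.server_audit_specifications;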

The following query will help you investigate the traces. Since I put this in place, I’ve only received a few complaints, and they were about Windows issues (disk access for reading/writing files with SSIS).

declare @LogFilePath varchar(255);

-- Retrieve the target folder of the audit defined above
select
	@LogFilePath=log_file_path
from
	sys.server_file_audits
where
	name='Audit Login Changes';

-- Read all the audit files in that folder
set @LogFilePath=@LogFilePath +'*.sqlaudit';

select
	session_server_principal_name
	,database_name
	,object_name
	,m.class_type_desc
	,statement
	-- Translate the UTC event time to the server's local time
	,convert(datetime,
		switchoffset(
			convert(datetimeoffset, event_time)
			, datename(TzOffset, sysdatetimeoffset())
		)
	) as event_time_local
from
	sys.fn_get_audit_file (@LogFilePath,default,default) f
inner join
	sys.dm_audit_class_type_map m
	on m.class_type=f.class_type
order by
	event_time;

I know that if someone wanted to hijack this trace it would be easy to stop it (and restart it), but keep in mind that I’m not trying to fix a security issue but to fix a process issue during the development.
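
If you want to track even that, you can extend the specification with the AUDIT_CHANGE_GROUP action group, which records any change to the audits themselves. A sketch (the specification must be disabled before altering it):

alter server audit specification [Audit Login] with (state=off);
alter server audit specification [Audit Login] add (AUDIT_CHANGE_GROUP);
alter server audit specification [Audit Login] with (state=on);
go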

Using MsBuild tasks to build SSIS packages

In a previous blog post, I explained how to recompile the MsBuild task for SSIS 2016. Now that we have this task, we’ll see how to use it.

First, let’s define a new file (here named Build.proj) in our SSIS solution.

Then, we’ll need to define our headers and namespaces:

<Project
    xmlns="http://schemas.microsoft.com/developer/msbuild/2003"
    DefaultTargets="SSISBuild">

The attribute DefaultTargets defines which target of the XML file will be executed by default. In our case we’ll have just one target, but it’s better to define it correctly from the start. Note that the Project element wraps the whole file: everything that follows goes inside it, and the file ends with a closing </Project> tag.

We also have to specify to MsBuild that an external assembly will be used (the community tasks to build SSIS packages).

<UsingTask
    TaskName="DeploymentFileCompilerTask"
    AssemblyFile="..\SSISMSBuild\SSISMSBuild\bin\Debug\SSISMSBuild.dll"
/>

It’s now time to create a kind of variable to host the name/path of our SSIS project. In MsBuild, you achieve this with the following syntax:

<ItemGroup>
    <SSISProjPath Include="..\anotherFolder\MySSISProject.dtproj" />
</ItemGroup>

SSISProjPath is the name of the variable and the value is defined in the Include attribute.

Now, let’s focus on the Target itself. The target is named SSISBuild and will display a message before using the task to compile the SSIS project. The second step is performed by invoking the task DeploymentFileCompilerTask:

<Target Name="SSISBuild">
   <Message Text="*********** Building SSIS project ***********"/>
   <DeploymentFileCompilerTask
       InputProject="@(SSISProjPath)"
       Configuration="Development"
       ProtectionLevel="DontSaveSensitive"
   />
</Target>

The first parameter of DeploymentFileCompilerTask is the name of your project; you can safely use your variable through the @() syntax. The second parameter is the configuration; personally I never change it and use Development, but it’s up to you. The protection level will override the corresponding property of your project.

If you want to compile your project, you can call MsBuild and specify Build.proj as the parameter, it will do the job! For example, a sketch assuming MsBuild from the Microsoft Build Tools 2015 (adjust the path to your own installation):
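
"%ProgramFiles(x86)%\MSBuild\14.0\Bin\MSBuild.exe" Build.proj /t:SSISBuild

But it’s more interesting if you can include this in your build steps. I’ll just show you how you can include this in your TeamCity configuration: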

  1. Create a build step
  2. Define the runner type as MsBuild
  3. Specify the MSBuild property for version as Microsoft Build Tools 2015 and ToolsVersion as 14.0
  4. Don’t forget to specify the name of your target as SSISBuild
[Screenshot: ssis-build-teamcity]

Once this setup is in place, you can trigger your TeamCity project to get the following message and your build ready.

[Screenshot: ssis-build-teamcity-output]

Building the MsBuild Tasks for SSIS 2016

MsBuild Tasks for SSIS are a set of two tasks available in a community project supported by Microsoft (project SSISMSBuild) and hosted on Codeplex. They aim to support the build of SSIS packages from MsBuild files. This is a great way to automate the build of your packages without needing Visual Studio (and so without licensing issues). Unfortunately, this project has not been updated since 2010, but the source code is still available. It means that you need to update the dependencies and recompile these tasks if you want to use them with SQL Server 2012, 2014 or 2016 (or vNext).

Download the solution and open it with Visual Studio:

[Screenshot: ssis-build-projects]

To “upgrade” to SSIS 2016, open the solution with Notepad++ or any other text editor. Change the hint paths (not just one but all of them) to point to your version of Visual Studio:

  • Visual Studio 11.0  = Visual Studio 2012
  • Visual Studio 12.0  = Visual Studio 2013
  • Visual Studio 14.0  = Visual Studio 2015
  • Visual Studio 15.0  = Visual Studio 2017
[Screenshot: ssis-build-hint-path]

Then, it’s time to update the references to SQL Server in the project. Check the path of the dll named Microsoft.SqlServer.ManagedDTS. Keep the following mapping in mind:

  • v11 = SQL Server 2012
  • v12 = SQL Server 2014
  • v13 = SQL Server 2016
  • v14 = SQL Server vNext

[Screenshot: ssis-build-references]

You also need to sign your build. Don’t panic, it’s really easy. Open the properties of your project and create a new key:

[Screenshot: ssis-build-project-properties]
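
If you prefer the command line, the strong-name tool shipped with the Windows SDK can generate the key pair as well; a sketch (the file name is up to you):

sn -k SSISMSBuild.snk

Then select this .snk file in the Signing tab of the project properties.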

You must update the target framework to be in line with the framework of SSDT … currently .NET Framework 4.5.

[Screenshot: ssis-build-project-framework]

Build, and finally copy/paste your new dll to the directory C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\PrivateAssemblies\ (if needed, change the path to target the correct version of Visual Studio).

That’s it, you now have the correct version of the MsBuild tasks for SSIS. In a next blog post, I’ll explain how to use them.

Create a syntax highlighter for Visual Studio and Visual Studio Code with a TextMate bundle

Yesterday, I decided to create a syntax highlighter for the genbiL language. The end result is available on this GitHub repository. The goal was to provide a better experience for users editing these files in their favorite code editor. It was a completely new experience for me and it turned out to be much easier than expected.

[Screenshot: genbiL syntax highlighting]

If you’re using genbiL, I recommend that you install it through the following commands:

cd %userprofile%\.vs\Extensions
git clone git://github.com/Seddryck/genbil-tmbundle genbil

Visual Studio and Visual Studio Code use a common standard for syntaxes and snippets: TextMate bundles. If you want to create a new TextMate bundle for a given language, start by creating a new folder under %userprofile%\.vs\Extensions with the name of your language, and then a subfolder named syntaxes.
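
For a hypothetical language named mylang, that boils down to a single command (a sketch; substitute your own language name):

mkdir %userprofile%\.vs\Extensions\mylang\syntaxes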

In this folder, create a new empty file or (much easier) copy/paste an existing file from another language. The format of this file is a property list, an XML-like format with just a few elements:

  • <key/> specifies the type of content for the next element
  • <string/> specifies the value associated with the <key/> defined above
  • <array/> is a container to specify multiple values for the <key/>
  • <dict/> lets you specify a <key/> and <string/> pair

It sounds weird at the beginning but it’s relatively easy to use.

The first action to take is to change or write a few key pieces of information:

  • uuid is a unique identifier for this syntax definition. Each new syntax definition must have its own uuid. Don’t re-use them or your editor will not be able to load your syntax.
  • The name of your language. Use a short, descriptive name. Typically, you’ll use the name of the programming language you’re creating the syntax definition for.
  • The top-level scope for this syntax definition is the scopeName. It takes the form source.languageName.
  • fileTypes is a list of file extensions. When opening files of these types, Visual Studio will automatically activate this syntax definition for them.
<key>uuid</key>
<string>2BA75B32-707C-11D9-A928-000D93589AF6</string>

<key>name</key>
<string>GenbiL</string>	

<key>scopeName</key>
<string>source.genbil</string>

<key>fileTypes</key>
<array>
   <string>genbil</string>
</array>

After this initial definition, you’ll have to create patterns. Each pattern starts with a match; a match is defined through a regular expression.

Then comes the name. In fact, the name is a naming scope and isn’t always obvious. Check the TextMate online manual for guidance on scope names. It is important to re-use the basic categories outlined there if you want to achieve the highest compatibility with existing themes.

Themes have scope names hardcoded in their color definitions. They could not possibly include every scope name you can think of, so they target the standard ones plus some rarer ones on occasion. This means that two themes applied to the same syntax definition may render the text differently!

Bear in mind too that you should use the scope name that best suits your needs or preferences. It’d be perfectly fine to assign a scope like constant.numeric to anything other than a number if you have a good reason to do so.

<key>areas</key>
<dict>
   <key>patterns</key>
   <array>
      <dict>
         <key>match</key>
         <string>\b(case|template|setting|suite)\b</string>
         <key>name</key>
         <string>storage.type.source.genbil</string>
      </dict>
   </array>
</dict>