Merge on a filtered set of rows without creating a new table (Power BI).

In a previous blog post, I wrote about the analysis of survey’s results and explictely how to build a hierarchy when this notion was not really transparent into the source data.

I’ll now continue my journey and explain how to build the table/dimension “respondent”. To summarize the challenge that I’ll face:

  • Each respondent means a new column (and not a new row), each additional information for the respondent add a new row (and not a new column) [Orange area]
  • Some information are available in the list of answers given by the respondent and not in the information on the top [yellow area]
  • For maintenance reasons, I’ll try avoid to create temporary tables that will only be used as a subset of another table
survey-excel-file

The first step is really easy, we’ll just keep the first rows by removing all the rows where the column “A” has a value. when done, we can remove this column that will only contain null values.

let
   Source = Source,
   #"Filtered Rows" =
      Table.SelectRows(
          Source,
          each ([Column1] = null)
      ),
   #"Removed Columns" =
      Table.RemoveColumns(
           #"Filtered Rows",
           {"Column1"}
      ),

The next step is to transform each row into a column and vice versa. Usually people with a SQL background try to apply a pivot or unpivot to achieve this task. There is an easiest way with the M language … just use the function Transpose

#"Transposed Table" = Table.Transpose(#"Removed Columns"),

You’ll obtain the following result:

respondent-transpose

It’s now to promote the first row as header.

#"Promoted Headers" = Table.PromoteHeaders(#"Transposed Table"),

Now that we already have a few information about our respondent, we’ll need to add the block of yellow information to this table.

To achieve this, I’ll first create a new table named Result-Staging and containing the gray area of my excel sheet. Let’s start by removing the column “B”, promoting the first row as header and filter rows to remove them within the orange area (column “A” is null).

let
    Source = Source,
    #"Removed Columns" = Table.RemoveColumns(Source,{"Column2"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Columns"),
    #"Filtered Rows" = Table.SelectRows(
        #"Promoted Headers",
        each ([Column1] <> null)
    ),
results-gray

Then, I’ll try to change this format to have one column with the QuestionId, another with the Respondent and finally a third with the response. To transform the current data table into this format I’ll need to unpivot all columns except Attribute and Value.

    #"Unpivoted Other Columns" =
        Table.UnpivotOtherColumns(
            #"Filtered Rows",
            {"Column1"},
            "Attribute",
            "Value"
        ),
    #"Renamed Columns" =
         Table.RenameColumns(
              #"Unpivoted Other Columns",
              {
                    {"Column1", "QuestionId"},
                    {"Attribute", "Interviewee"},
                    {"Value", "Answer"}
               }
          )
in
    #"Renamed Columns"
results-unpivot

In this table, only the results with a QuestionId equal to 111 really interest me for a merge with the existing table Respondent. If you’re familiar with the UI of Power BI Desktop then you’ll probably think to create a new table referencing this one then filter on QuestionId equals 111 and finally merge. It’ll work but applying this strategy could result in many “temporary” tables. A lot of these small tables used only for a few steps before merging with other tables tend to be a nightmare for maintenance. You can use the “Advanced formula editor” to not display this kind of temporary tables and embed them in your main table.

On the code of the Respondenttable, add a step to filter the content of the table Result-Staging.

#"Age table" =
   Table.SelectRows(
      #"Results-Staging",
      each [QuestionId] = 111
   ),

If you take a look to the code the function #"Results-Staging" doesn’t exist in the code associated to table Respondent. In fact it’s the end-result of the function/table Results-Staging !

Now that we’ve our filtered set of rows corresponding to the ages of the respondents, I can join it with the content of the table Respondent.

#"Merged Queries" =
   Table.NestedJoin(
       #"Promoted Headers",
       {"Name"},
       #"Age table",
       {"Interviewee"},
       "NewColumn",
       JoinKind.LeftOuter
   ),

With a bit more of expanding and renaming, I can finally get my table Respondent.

respondent-final

In the next blog post, I’ll show how to manage the table of the results.

 

Advertisements

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s