Month: March 2018

Table.Distinct and case pitfalls with Power Query/M language

In a previous post, I already blogged about case pitfalls in Power Query/M language. By default, comparisons are case-sensitive and that could be an issue when sorting list or tables but also when using the List.Distinct and Table.Distinct functions.

To explain the issue, I had the following table:


Note that at the top of the table, I’ve twice the value ADO.NET, once with uppercase and once lowercase for the “Net” part. If I try to apply a Table.Distinct, the two values will be considered as distinct and one of them won’t be removed.


Will it be an issue? If this your key and it’s part of a model where this key is part of one side of a one-to-many then it will be an issue.


The question on the table is how to manage this case? The MSDN documentation informs us that a second parameter can be defined for Table.Distinct

Removes duplicate rows from a table, ensuring that all remaining rows are distinct.

   table as table, 
   optional equationCriteria as any
   ) as table  

But no additional info about what’s the equationCriteria is available on this page. You’ll have more luck if you try to read the documentation for the function List.Distinct. There, you’ve a few samples and you can understand that you’ll be able to use any function but also the Comparer list of functions. The comparers are also documented in MSDN.

In that case, the most obvious choice is to the comparer named Comparer.OrdinalIgnoreCase to be sure that the comparison of the different items won’t be impacted by the difference of cases.


Handling Html entities when parsing web pages with Power Query/M Language

In a previous blog post, I explained how to handle the html entities when they were number-based such as À or &#9827 ;. This post covers the other possibility to define an html entity, by name: À or ♣.

Before going further, let’s notice that normally, entities defined by their name are case-sensitive. It means that À resulting in “À” is different to à resulting in “à”. Our code will also support mistake in casing, if and only if this mistake will not result in an ambiguity. Based on that À won’t be allowed but ♣ will be.

The code is relatively straightforward, create a list of of all the known html entities and the corresponding values.

   list =
       [Name="Agrave", Number=192]
       , [Name="Aacute", Number=193]
       , [Name="Acirc", Number=194]
       , [Name="Atilde", Number=195]

Then parse the entity’s name and look for the corresponding record in the list. The search must be executed with the specified casing, if nothing is found use a case-insensitive approach. If there is no match or more than one then raise an error.

        token = if Text.StartsWith(entity, "&")
                   and Text.EndsWith(entity, ";")
                then substring
                else entity,
        substring = Text.Range(entity, 1, Text.Length(entity)-2),
        correctCasing = List.Select(
                 , each [Name]=token),
        anyCasing = List.Select(
                 , each Text.Upper([Name])=Text.Upper(token)),
        single =  try List.Single(correctCasing)[Number]
                  otherwise List.Single(anyCasing)[Number],
        handleError = Error.Record(
                       "Invalid parameter"
                       , "Unable to translate the html entity &" & token & ";"
                       , if List.Count(anyCasing)=0 then
                            "No match with the list of known entities."
                            "Multiple matches with the list of known entities. Check your the casing of your entity's name."),
        char = try Character.FromNumber(single)
               otherwise handleError

This function is available as a gist at

Passing a function to another function to improve maintainability in Power Query/M Language

Developing is relatively easy but maintaining is harder. A code relatively easy to maintain must have a good balance between expressiveness and compactness. In Power Query/M Language, passing a function to another function, as a parameter, is helping to keep your code compact and to reuse large sets of code where the difference could be tiny. Have you already coded twice the same function one using a Text.Upper and another Text.Lower? Then this blog post could interest you. Surely, abusing this feature could result in an obfuscated code. Use it cautiously and do not abuse this feature.

To illustrate this, I’ll use the same use-case than in this blog post where I was trying to parse the results of my daughters at their gymnastics championship. One of the (many) difficulties in this parsing was to determine the firstName, lastname and team. These three informations were on the same line, separated by a space. In the most common pattern it looks such as:

CHARLIER Clémence A.R. GymBraine
CHARLIER Alice A.R. GymBraine

The pattern could have been

Lastname FirstName Team

But you’ll quickly find some exceptions and difficulties. First notable pitfall, you can’t determine how many words will compose the team’s name. It could be two (A.R. GymBraine) or three (ABC Gym Dinant) or four (Royale Garde Saint Jean) with optionally hyphens, quotes, dots and parentheses. Nothing reliable, you should take the hypothesis that everything after the first name is the team’s name.

Some people have a composed last name such as VAN DER PERRE. We can’t really rely on the hypothesis that the last name is just the first word. The pattern is more that the last name is fully capitalized.

For the first names, we’ve the same kind of issue, the first names Anne-laure and Lilly-shiva are composed but easily identifiable due to the no space between them. Unfortunately some first names are not inline with the previous assumption. I found a Marie antoinette and another Adina - ioana. Hopefully, you’ll notice than in both case only the first letter is an uppercase, all remaining letters are lowercase and the team’s name is always starting by an uppercase!

From that point the pattern is the following:

LAST NAME First name Team's name

The delimiter condition for the last name is that the next two characters are an uppercase followed by a lowercase (Fi in the case above). For the first name, the delimiter condition is a space followed by a uppercase (space + T in the example above).

Cooooool … we “just” have to code that.

First step, let’s create an helper function to check if a character is uppercase or lowercase. A naïve implementation could be

isUppercase = (c as text) => c= Text.Upper(c) 

But this code will return true for a space or an hyphen. It’s not what we’ll need, we need to strictly limit ourselves to letters. In a previous blog post, I already explained how we can use Character.FromNumber and Character.ToNumber to use the ASCII table and check if a character was a figure or not. We can apply the same approach with uppercase and lowercase.

isUppercase = (c as text) => 
  c = Character.ToNumber(c) >= Character.ToNumber("A") 
      and Character.ToNumber(c) <= Character.ToNumber("Z") 

I could do the same for lowercase replacing “A” by “a” and “Z” by “z” and create a new function. Sure, but it’s probably a good place to use for the first time the pattern of passing a function to another function. I’ll rewrite the previous function as:

isCase = (c as text, case as function) => 
  c = Character.ToNumber(c) >= Character.ToNumber(case("A")) 
      and Character.ToNumber(c) <= Character.ToNumber(case("Z")) 

Then, I could create two additional helpers:

isUpperCase = (c as text) => isCase(c, (x)=>Text.Upper(x)),
isLowerCase = (c as text) => isCase(c, (x)=>Text.Lower(x)),

Did you notice, how the function Text.Upper was passed as a parameter?

Now that we can test the casing of a character, we’ll need to tackle another challenge. We need to parse two consecutives characters in a string and find the first occurrence matching the corresponding delimiter condition.

Let’s start by building a list of the two consecutive characters. To do that we’ll use List.Generateand the not well-known fourth parameter the “selection”. This parameter let’s you define a function that will generate the member of the list in place of using the iterator.

   PositionOf = (message as text) as text => 
       list = List.Generate(
            each _ < Text.Length(message)-1,
            each _+1, 
            each [
                i = _, 
                x = Text.Range(message, _, 1), 
                y = Text.Range(message, _+1, 1)]

This function is running from the first character of the string to the antepenultimate character and compose a record with respectively the position of the pair of characters in the string, the first character of the pair and the second one.

Then, we’ll need to select the first record matching a condition to get the position. The condition will change depending if we’re looking for the first name or the last name … so let it be a function!

       first = List.First(
                 List.Select(list, each condition(_))
        word = Text.Start(message, first)

The conditions for the first name and last name will be expressed as:

conditionLastName = (r as record) => 
    isUpperCase(r[x]) and isLowerCase(r[y]),
conditionFirstName = (r as record) => 
    r[x]=" " and isUpperCase(r[y]),

To test the previous code, we can implement the following test routine:

    list = {
              "ALPHA Beta L'équipe", 
              "ALPHA GOGO Beta Team's name", 
              "ALPHA GOGO Beta - gamma My 'team'"},
    temp = List.Transform(
              each [ 
                 full = _,
                 lastName = 
                    Text.Trim(PositionOf(_, conditionLastName)),
                 firstName = 
                    PositionOf(Text.Range(_, Text.Length(lastName)+1),
                 team = Text.Range(_, 
                           Text.Length(firstName) +
                           Text.Length(lastName) + 1)
    result = Table.ExpandRecordColumn(

And we’ll get the following result:


Writing a first data connector for Power BI

Since mid-2017, it’ possible to write your own custom data connectors. This feature offers the opportunity to encapsulate more difficult or technical code into a data connector and distribute it to business analysts. The key benefit is to let them concentrate on what they are supposed to do: business analysis … and not spend hours to put in place some plumbing code to get the data from your data source.

Currently, one of the supported use-case data connectors to brand a supported data source (ODBC) or enabling DirectQuery for an ODBC given data source. We won’t focus on that use-case in the next articles but in place focus on the idea to create a user-friendly experience over a REST API.

Before going further, you must download the Power Query SDK. This package is available as an extension of Visual Studio. It brings several benefits such as a powerful editor for the Power Query/M language with autocompletion and intellisense and limited features to debug your code (but at this moment nothing to monitor performances or no breakpoint). This SDK could be frustrating in some conditions but it’s still a version 1 and when something sounds clumsy in Visual Studio, just try it in Power BI to confirm the correct behavior of your code. There is room for improvement but already useful.

In this first article, I’ll create a basic data connector that will just return a list of numbers from 1 to 50. To achieve that, I’ll create a new project in Visual Studio, in the templates from the tab “Power Query”, I’ll select the “Data Connector Project” and click on “OK”.


Visual Studio will create a boilerplate project with a file having an extension .pq another query.pq but also a list of image files and a resources file. The file with the extension “.pq” will contain your code for the connector (at this moment it’s not possible to split your code in several files). The file “.query.pq” will contains the query that you want to run in debug mode (it could reference the functions defined in your “.pq” files).


If you open the “.pq” file, you’ll find an “hello world” function named PQExtension1.Contents.


You can replace this function by the code to create a serie from 1 to 50:

source = {1..50}

A few important remarks about this function. If you’re used to develop custom functions with the Power Query/M language, you’ll notice that I explicitly set the empty parenthesis to define that this function was not expecting parameters. Something that you could usually skip when writing when writing functions in Power BI. In fact, these empty parentheses imply a lot of consequences but I won’t deep dive into the Power Query/M language at this moment and just remember that functions that you expose through a Data Connector require these empty parenthesis (or with some parameters if you need them).

Second key remark, is the semi-column (“;”) at the end of my function. This is the case for all functions that you’ll develop in the “.pq” files.

If you click on the start button or hit F5, Visual Studio will start your Data connector in debug mode. This first start will just give you the opportunity to specify some credentials. Credentials are managed by DataSource.Kind. Just set the Credential Type to anonymous for this sample and click on “Set Credential”.


This screen should only display on your first debugging session, if you don’t change your DataSource.Kind. When Credentials have been specified, click again on the start button, to see the result of your data connector within Visual Studio:


Now that the connector is working within Visual Studio you can deploy it to Power BI. To achieve that task, you just need to copy the result of the build executed by Visual Studio and paste it in the folder %userprofile%\documents\Microsoft Power BI Desktop\Custom Connectors\.
This bat file is just executing this action and starting “Power BI”:

CD /d %~dp0
SET destination="%userprofile%\documents\Microsoft Power BI Desktop\Custom Connectors\"
XCOPY "PQExtension1\bin\debug\*.mez" %destination% /Y /F
"C:\Program Files\Microsoft Power BI Desktop\bin\PBIDesktop.exe"

Once Power BI has started, click on “Get Data” and select “Other”, you’ll find your data connector in this list. Select it and click on OK.


It will open the “Power Query editor” and display the result. From this point you can continue to work on your data as usual and add steps if needed.


You can also open the “Advanced editor” and check the code that has been generated when you selected the data connector.


The code is just calling the function that you created in your connector, hiding all the complex code to the business analyst. Great!

In the next articles of this serie, I’ll go further and get data from a REST API and propose a great user interface to the business analyst.

Converting datetime from UTC to Local time with Power Query M language

This article explains how to convert from UTC to Local time with the Power Query M Language. It sounds easy but it’s not so trivial.

Power Query M Language is supporting a datetimezone type and expose a function DateTimeZone.ToLocal. These artefacts are doing the job and you can quickly create the following function, converting a UTC datetime to a local datetime:

  UtcToLocal = (utc as datetime) =>
      utcToLocal = 

I’m from Belgium (UTC+1 during winter time and UTC+2 during summer time) and the following conversions are just fine:

UtcToLocal(#datetime(2018, 3, 21, 17, 0, 0)) = 
  #datetime(2018, 3, 21, 18, 0, 0)   #Winter time => +1
UtcToLocal(#datetime(2018, 3, 30, 17, 0, 0)) = 
  #datetime(2018, 3, 21, 19, 0, 0)   #Summer time => +2

When I try to convert the datetime around the switch from summer time to winter time, I’ve twice the local time 02:00:00, once at 00:00:00UTC and once at 01:00:00UTC. That’s indeed the case.

UtcToLocal(#datetime(2017, 10, 29, 0, 0, 0)) = 
  #datetime(2017, 10, 29, 2, 0, 0)   #Summer time => +2
UtcToLocal(#datetime(2017, 10, 29, 1, 0, 0)) = 
  #datetime(2017, 10, 29, 2, 0, 0)   #Winter time => +1

Everything is fine … except if I share my code with someone from another time zone. The function DateTimeZone.ToLocal is relying on regional settings and in that case my conversion should always be from UTC to “Brussels time”.

I didn’t find any other way to ensure that I’m always converting from UTC to “Brussels time” than implementing the conversion by myself. That’s the goal of the following function:

UtcToLocalTime = (date as date, time as time) =>

In Europe, the two pivotal daylight savings dates are the last Sundays of March and October. Translated in M language, it means the first day of the week, assuming the week is starting a Sunday, containing the last day of this month (which is a 31).

lastSundayOfOctober = 
  Date.StartOfWeek(#date(Date.Year(date), 10, 31), Day.Sunday),
lastSundayOfMarch = 
  Date.StartOfWeek(#date(Date.Year(date), 3, 31), Day.Sunday),

Based on the two pivotal dates, I can check if I’m in winter time or summer time

isSummerTime = 
   (date > lastSundayOfMarch and date < lastSundayOfOctober) 
   or (date = lastSundayOfOctober and time  #time(1,0,0)),

And based on that adjust the time zone

timeZone = 1 + Number.From(isSummerTime),
localTime = 
  + #duration(0,Time.Hour(time),Time.Minute(time),Time.Second(time))  
  + #duration(0, timeZone, 0, 0),

I can also check if the specified date and time corresponds to the doubled-hour of the last Sunday of October and return a symbol to that date (in a text format)

isWinterDouble = 
   (date = lastSundayOfOctober) and time = #time(1,0,0),
localString = 
   DateTime.ToText(localTime, "yyyy-MM-dd HH:mm:ss") 
   & Text.Repeat("*", Number.From(isWinterDouble)),

The final result is a record with the local date, time and textual representation:

    record = 
            LocalDate = Date.From(localTime), 
            LocalTime = Time.From(localTime), 
            LocalDateTime = localTime, 
            LocalString = localString

This function is returning the expected result:


Improving the performance when merging two tables with the Power Query M language

In a previous post, I explained how to parse a semi-structured flat file with the help of a range-join. When you just have a few thousand lines in your semi-structured flat files, this is working really well but when your files is going bigger and bigger, this strategy starts to collapse and is really slow. Hopefully, Chris Webb explained in recent blog post how adding a primary key to a table could help to improve the performances of a merge. I wanted to test that on my set this specific case.

I started to work on my previously created table categories that was in this state.


I could have created a primary key on this table with any column but I’ve serious doubt that it would result in any improvement of performances. Main reason is the usage of a custom range-join and not a merge provided by the M language. To work-around that I decided to change the structure of my table and use a classical merge function. In place of having one row for each category with the bounds of this range as fields, I want to have one row for each value between the bounds. My table will be larger but I’ll be able to use a traditional merge. The final result is the following:


To achieve that result, I created a function to generate a list of numbers contained between the lower and the upper bound given by the two fields Start Range and End Range. I added this list as a new column of my existing table.

#"RowList" = (start, end) => 
    List.Generate(() => start, each _<=end, each _+1),
#"List" = 
       #"Parsed Categories", 
       each RowList(_[Start Range], _[End Range])

then, I expanded the list as new rows

#"Expanded List" = Table.ExpandListColumn(List, "List"),

And to be sure that a primary key was added to my table, I added a step to remove the duplicates based on my new "list" column.

#"Removed Duplicates" = Table.Distinct(#"Expanded List", {"List"}),

Then on my main table participants I replaced thejoin-range by a native merge (Merge queries in the UI or NestedJoin in the code) using this new column.

#"Merged Queries" = 
         #"Removed Duplicates",

And you know what? I'm happy … performance has improved from a laborious "more than 3 minutes" to a wonderful "less than 15 seconds".


Parsing semi-structured text files with the Power Query M language

My daughters participated to the provincial championships of gymnastics during this winter, both qualified for the regional championships. Between the two events, I analyzed the results of the different provinces with Power BI to guess what they could expect at the regional level.

The first unexpected issue was the file results: PDF files. I tried several tips such as open them in Word or with other processors and try to convert them to text files but finally the best and easiest solution was to directly copy/paste them in text files before loading them in Power BI.

Once loaded in Power BI, I had a huge table with the following pattern

##Category 1
Participant 1.1
Participant 1.2
Participant 1.3
##Category 2
Participant 2.1
Participant 2.2
Participant 2.3
Participant 2.4
Participant 2.5

I needed to transform that semi-structured file to the following structured table with two columns:

Participant 1.1     Category 1
Participant 1.2     Category 1
Participant 1.3     Category 1
Participant 2.1     Category 2
Participant 2.2     Category 2
Participant 2.3     Category 2
Participant 2.4     Category 2
Participant 2.5     Category 2

To achieve that transformation, I added an index to my table of flat results. Then creating a reference query, I isolated the categories by filtering out all the rows not starting by ##. The result was similar to that table:


My goal was to determine a range of lines corresponding to each category. Based on the line index of a participant, I must be able to determine her category. The index of categories will tell me the first and last line corresponding to a given category. I already had the starting line index of the categories but not the corresponding last line index . This value is simply the value of the start index of the next category minus one. The following code is adding this information and managing the last category with a try ... otherwise ... using the last line of my result file.

#"Filtered Rows" = 
      each Text.StartsWith([Column1], "##")),
#"Renamed Columns" = 
   Table.RenameColumns(#"Filtered Rows",{{"Index", "Start range"}}),
#"Added Index" = 
    Table.AddIndexColumn(#"Renamed Columns", "Local Index", 0, 1),
#"Added Custom" = 
    Table.AddColumn(#"Added Index", "End Range", 
           try #"Renamed Columns"{[Local Index]+1}[Start range]-1 
           otherwise Table.Max(Source, "Index")[Index]),
#"Removed Columns" = 
    Table.RemoveColumns(#"Added Custom",{"Local Index"})

Now, on another function, I’ll filter out all the categories and hold the participants and the line index. Based on this index, I’ll do a range loookup on the other table, containing the categories, and add the returned information as a new column to my table of participants (range join).

The first step is to create a function to handle a range join.

    JoinRange = (value as any, array as table) as any =>
        matchingRows = 
                 each value>=[Start Range] and value<=[End Range]
        returnValue = matchingRows{0}[Text]

Then I needed to make usage of this function in a new table "participants":

    Source = flat,
    #"Filtered Rows" = 
          each not Text.StartsWith([Column1], "##")
    LookupCategory = 
           #"Filtered Rows", 
           each JoinRange(_[Index], categories)
    #"Removed Columns" = 
    #"Removed Columns"

and I finally get the expected result:


Conditionally expanding table’s columns with Power Query M language

In one of my projects, I have to deal with XML files. The structure of the XML is usually flat and I don’t need to expand columns to get access to the values I’m looking for. Loading the following file

<?xml version="1.0" encoding="UTF-8"?>

with this M code:

    Source =
    #"Expanded Table" =
           Source, "Table", {"col1", "col2"},
           {"Table.col1", "Table.col2"}
    #"Removed Columns" =
        Table.RemoveColumns(#"Expanded Table",{"Name"})
    #"Removed Columns"

will result in the following table


But from time to time, I had some errors when loading some of the xml files. After investigations, I found that in a rare few cases, I could get a null value in some columns. The example here under shows a null value for col1 at the second row.

<?xml version="1.0" encoding="UTF-8"?>
    <col1 nil="true"/>

resulting in this kind of table:


If my next step is a conversion to a numeric value for col1, it’d result in an error:


I wanted to avoid this kind of issue and expand the column to get the Element.Text when some of the values where null and just use the column when all the values were filled.

To manage this I created a new function expecting a table and a list of columns to conditionally expand as parameters.

Custom.ExpandTable = (table as table, columnNames as list) =>

The core ideas of this function is to get the schema, check the type of the columns and if this type is a Table.Typeor an Any.Type then expand this column.

To get the schema of a M table, I can rely on the function Table.Schema(...). It returns a table with a row for each column. Extracting the row that I’m looking for based on the column’s name is not so difficult:

columnType =
  (columnName as text) as text =>
      Table.Schema(table), each [Name] = columnName

Once I’ve the type of my column, I still need to check if this type correspond to a table or a any field and in that case expand the column:

expandColumn =
  (table as table, columnName as text) as table =>
    if columnType(columnName) = "Table.Type"
      or columnType(columnName) = "Any.Type"
        table, columnName, {"Element:Text"}, {columnName}
      ) else table,

Finally, I need to apply previous functions for each column’s name that I’ve in my list. List.Accumulate is your friend for this operation:

expandedTable =
    (state, current) => expandColumn(state, current)

This function is used in my main M code to conditionally expand the column based on the fact that this column is loaded a simple type or not and is avoiding errors during the next steps.

The full code is available here under:

    Source = Xml.Tables(File.Contents("C:\Users\cedri\Projects\file.xml")),
    #"Expanded Table" = Table.ExpandTableColumn(Source, "Table", {"col1", "col2"}, {"Table.col1", "Table.col2"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Table",{"Name"}),
    #"Expanded Table.col1" = ExpandTable(#"Removed Columns", {"Table.col1"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table.col1",{{"Table.col1", type number}})
    #"Changed Type"

let ExpandTable = (table as table, columnNames as list) =>
        columnType = (columnName as text) as text => Table.SingleRow(Table.SelectRows(Table.Schema(table), each [Name] = columnName))[TypeName],
        expandColumn = (table as table, columnName as text) as table => if columnType(columnName) = "Table.Type" or columnType(columnName) = "Any.Type" then Table.ExpandTableColumn(table, columnName, {"Element:Text"}, {columnName}) else table,
        expandedTable = List.Accumulate(columnNames, table, (state, current) => expandColumn(state, current))