Projections with Power BI (M language)

In a previous blog post, I explained how to list all the Data Platform MVP by parsing a single web page. In a second blog post, I parsed the individual pages of each MVP to get information about her awards. It was time to get information about social media for each MVP and connect to the GitHub API to retrieve more info about their activities on GitHub. This third blog post was ending on a remark about the limited interest of the values available (count of public repositories, followers) and the need to retrieve more pertinent info such as the commit activity on the repositories owned by the MVPs.

From a technical perspective, I’ll show how to make a projection in Power BI (M Language). A projection is a process of selecting data in a different shape rather than a specific entity being queried. There are many ways of projection such as GroupBy or SingleOrDefault but I’ll illustrate this with the less known Transform.

Let’s try this! We first need the list of all the repositories of an MVP. The starting point is the list of social media, filtered on the website “GitHub”. Then to get a list of repositories for the login of the MVP, I’ll use the GitHub API /users/:login/repos.

let
    Source = #"Social Media",
    #"Removed Columns" =
        Table.RemoveColumns(
            Source
            ,{"Url"}
        ),
    #"Filtered to Github website" =
        Table.SelectRows(
            #"Removed Columns"
            , each ([Website] = "GitHub")
        ),
    #"Get repositories from Github" =
        Table.AddColumn(
            #"Filtered to Github website"
            , "Custom"
            , each Web.Contents(
                "https://api.github.com/users/"
                & [Login] & "/repos"
            )
        ),

This give me a nice table with all the informations about repositories in a binary format:

mvp-analysis-powerbi-007

At the opposite of previous post, the expanding of the response is a bit easier. I won’t use the pivot functions, just some expands.

    #"Expanded Transform Binary from Github" =
        Table.ExpandListColumn(
            #"Transform JSON to Table"
            , "Transform Binary from Github"
        ),
    #"Expanded Transform Binary from Github1" =
         Table.ExpandRecordColumn(
            #"Expanded Transform Binary from Github"
            , "Transform Binary from Github"
            ,{
                "name", "owner", "html_url"
                , "description", "fork"
                , "stargazers_url", "contributors_url"
                , "homepage", "stargazers_count"
                , "watchers_count", "language"
                , "forks_count", "open_issues_count"
                , "forks", "open_issues", "watchers"
            }
                ,{"name", "owner", "html_url"
                , "description", "fork", "stargazers_url"
                , "contributors_url", "homepage"
                , "stargazers_count", "watchers_count", "language"
                , "forks_count", "open_issues_count"
                , "forks", "open_issues", "watchers"
            }
        ),
    #"Expanded owner" =
        Table.ExpandRecordColumn(
            #"Expanded Transform Binary from Github1"
            , "owner"
            , {"login"}
            , {"owner.login"}
        ),

A bit of cleaning by removing and renaming some columns

    #"Removed Columns1" =
        Table.RemoveColumns(
            #"Expanded owner"
            ,{"Custom"}
        ),
    #"Renamed Columns" =
        Table.RenameColumns(
            #"Removed Columns1"
            ,{{"stargazers_count", "Count of stars"}
            , {"watchers_count", "Count of watchers"}
            , {"forks_count", "Count of forks"}
            , {"name", "Repository's name"}
            , {"open_issues_count", "Count of issues"}}
        ),

Filtering out the forks

#"Filtered Rows" = Table.SelectRows( #"Renamed Columns" , each ([fork] = false) ),

And I’ve the following result:

mvp-analysis-powerbi-008.png

That’s cool but not it’s not finished. I still haven’t got some information about commits. To achieve this, I’ll need to make an additional query to the GitHub API and request the commit activity of each repository: /users/:login/:repos/stats/commit_activity.

Let’s build a function to achieve this.

let
    GetCommitsDuringPeriod = (
        login as text
        , repository as text
        , countOfWeeks as number
    ) =>
let
    #"List of commits by week" =
        Json.Document(
            Web.Contents(
                "https://api.github.com/repos/"
                & login
                & "/"
                & repository
                & "/stats/commit_activity"
            )
        ),

This api send a response with a rather complex structure: A list of records, week by week, the commits on each weekday, the commits of the whole week (equal to the sum of the commits on each weekday – named total) and the week identification (Unix timestamp).

mvp-analysis-powerbi-009

I’m not interested by most of this information and I just need to keep the information total for the last 52 weeks. This is where I’ll apply a projection and I’ll use the function List.Transform. This function, let’s me explain that I want to transform my initial list into a new one. For each element, I’ll just hold the field total.

    #"List of totals" =
        List.Transform(
            List.LastN(
                #"List of commits by week"
                , countOfWeeks
            )
            , each _[total]
        ),

In this case, the transformation will return a list of number (corresponding to each total). Again, I’m not really interested by the detail and I can sum all these records with the help of the function List.Sum.

    #"CountOfCommits" = List.Sum(#"List of totals")
in
    #"CountOfCommits"
in
    GetCommitsDuringPeriod

Now, I can easily integrate my function to the previous table, I just need to add a column calling this function.

    #"Added commit activity" = Table.AddColumn(
            #"Filtered Rows"
            , "Count of commits this year"
            , each GetCommitsDuringPeriod(
                      [Login]
                      , [#"Repository's name"]
                      , 52
                 )
            )
in
    #"Added commit activity"

In the next blog post, I’ll add the issues’ activity for each repository. From a logical point of view, it’ll be the same process but the technical challenge will be the pagination of the results of the GitHub API.

Advertisements

One comment

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s