Month: December 2018

Extracting key phrases with the Text Analytics API and Power BI

In a previous series of blog posts, I explained how to grab the list of Data Platform MVPs from a website and add additional data about them, such as their GitHub contributions. My last article also explained how to use the Face API to guess the gender of these MVPs. I’ll now parse their biographies to extract key phrases and display the most common themes in these texts. To do this, I’ll use the Text Analytics API provided by Azure Cognitive Services.

The Text Analytics API is a suite of text analytics web services built with best-in-class Microsoft machine learning algorithms. The API can be used to analyze unstructured text for tasks such as sentiment analysis, key phrase extraction and language detection. No training data is needed to use this API; just bring your text data. It relies on advanced natural language processing techniques to deliver its predictions.

The first step of this exercise is to grab the biography. We previously discussed how to parse a web page and retrieve pertinent information. In this case, we need to look for the HTML tag <pre>, extract the text and clean it up.

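let
    // Returns the cleaned biography found in the <pre> tag of an MVP profile page
    GetBiography = (url as text) as text =>
        let
            source = Web.Contents(url),
            lines = Lines.FromBinary(source),
            // Keep the first line containing the <pre> tag
            biography = List.Select(lines, each Text.Contains(_, "<pre>")){0},
            // Strip the surrounding tags, then decode the HTML special characters
            cleanedBiography = ConvertHtmlSpecialChars(Text.BeforeDelimiter(Text.AfterDelimiter(Text.Clean(Text.Trim(biography)), ">"), "<"))
        in
            // Fall back to an empty text when no biography can be extracted
            try cleanedBiography otherwise ""
in
    GetBiography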
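
To see what the nested text functions do, here is a minimal sketch run against a hand-written sample line instead of a real profile page. The sample text and the DecodeHtmlEntities helper are assumptions made for illustration. The helper is only a stand-in and is not the ConvertHtmlSpecialChars function used above, whose definition is not shown in this post.

```powerquery
let
    // Hypothetical line, shaped like the one containing the <pre> tag
    sample = "   <pre>SQL Server consultant &amp; speaker.</pre>  ",
    // Remove surrounding whitespace and control characters
    trimmed = Text.Clean(Text.Trim(sample)),
    // Keep what follows the first ">" ...
    afterOpeningTag = Text.AfterDelimiter(trimmed, ">"),
    // ... and what precedes the next "<"
    inner = Text.BeforeDelimiter(afterOpeningTag, "<"),
    // Illustrative stand-in for an entity decoder; "&amp;" is replaced last
    // so that an already-decoded ampersand is not decoded a second time
    DecodeHtmlEntities = (input as text) as text =>
        List.Accumulate(
            {{"&lt;", "<"}, {"&gt;", ">"}, {"&quot;", """"}, {"&amp;", "&"}},
            input,
            (state, pair) => Text.Replace(state, pair{0}, pair{1})
        ),
    result = DecodeHtmlEntities(inner)
in
    result // "SQL Server consultant & speaker."
```

Note that this approach only works when the opening tag, the text and the closing tag sit on the same line of the page.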

We need to set up a Text Analytics API resource in our Azure account. If you don’t know how to do this, refer to the previous article of this series. As for pricing, selecting the free (F0) tier is probably the best approach: we’ll make very limited use of this API, with just one call per run.

text-analytics-api-pricing.png

Indeed, unlike with the Face API, we don’t need to call the Text Analytics API once for each MVP. We only need to call this API once, with all the biographies. The API expects a list of records (each with the language, an id and the biography). The language will always be “en” (standing for English) and, for the id, we’ll use the MvpId. This structure is prepared in a function named PrepareKeyPhrases.

let
    // Builds the list of records (language, id, text) expected by the API
    PrepareKeyPhrases = () =>
        let
            source = Table.SelectColumns(#"All info", {"MvpId", "Biography"}),
            // Skip the MVPs without a biography
            filtered = Table.SelectRows(source, each [Biography] <> ""),
            addEnglish = Table.AddColumn(filtered, "language", each "en"),
            // Use the attribute names expected by the API
            renamed = Table.RenameColumns(addEnglish, {{"Biography", "text"}, {"MvpId", "id"}}),
            reordered = Table.ReorderColumns(renamed, {"language", "id", "text"}),
            list = Table.ToRecords(reordered)
        in
            list
in
    PrepareKeyPhrases
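
As an illustration of the resulting structure, the sketch below applies the same steps to a small hand-made table. The allInfo table and its rows are assumptions standing in for the #"All info" query, which is built in the earlier posts.

```powerquery
let
    // Hypothetical sample standing in for the #"All info" query
    allInfo = #table(
        {"MvpId", "Name", "Biography"},
        {
            {"1001", "Alice", "I write about Power BI."},
            {"1002", "Bob", ""}
        }
    ),
    source = Table.SelectColumns(allInfo, {"MvpId", "Biography"}),
    // Bob has no biography, so his row is dropped here
    filtered = Table.SelectRows(source, each [Biography] <> ""),
    addEnglish = Table.AddColumn(filtered, "language", each "en"),
    renamed = Table.RenameColumns(addEnglish, {{"Biography", "text"}, {"MvpId", "id"}}),
    reordered = Table.ReorderColumns(renamed, {"language", "id", "text"}),
    documents = Table.ToRecords(reordered)
in
    documents // {[language = "en", id = "1001", text = "I write about Power BI."]}
```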

Once we have the expected input for the query, we just need an additional function to call the keyPhrases method of the Text Analytics API. This Power Query M function will be named GetKeyPhrases, will receive the result of PrepareKeyPhrases as a parameter and will use the same approach as for the Face API, detailed in this article. The most complex part of this query is transforming the structure (list of records) that we previously prepared into JSON text. We can achieve this by converting the structure to a JSON document with Json.FromValue and then serializing the resulting binary to text with the help of Text.FromBinary.

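let
    // Sends all the documents to the keyPhrases method in a single POST request
    GetKeyPhrases = (list as list) =>
        let
            headers = [#"Content-Type" = "application/json", #"Ocp-Apim-Subscription-Key" = TextApiKey],
            // Serialize the list of records to JSON text
            jsonText = Text.FromBinary(Json.FromValue(list)),
            // Wrap the list in the "documents" attribute expected by the API
            content = "{ ""documents"": " & jsonText & "}",
            // Supplying Content turns the call into a POST request
            source = Web.Contents(TextApiUrl, [Headers = headers, Content = Text.ToBinary(content)])
        in
            source
in
    GetKeyPhrases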
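
As a possible variation on the string concatenation above, the list can be wrapped in a record before serialization, so that Json.FromValue produces the "documents" attribute itself. This is only a sketch under the assumption that the documents look like the hypothetical sample below; it is not the code used in the report.

```powerquery
let
    // Hypothetical documents, shaped like the output of PrepareKeyPhrases
    documents = {
        [language = "en", id = "1001", text = "I write about Power BI."],
        [language = "en", id = "1002", text = "Speaker and ""SQL Server"" author."]
    },
    // Wrapping the list in a record lets Json.FromValue emit the whole request body
    body = Json.FromValue([documents = documents]),
    // Only needed to inspect the payload: Web.Contents accepts the binary as Content
    bodyAsText = Text.FromBinary(body)
in
    bodyAsText
```

Since Json.FromValue already returns a binary, body could be passed directly as the Content option of Web.Contents, without going through Text.FromBinary and Text.ToBinary.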

We’ll need to provide a table for these keyPhrases. This new table will be linked to the main table with all the MVP details in the model.

let
    source = GetKeyPhrases(PrepareKeyPhrases()),
    json = Json.Document(source,65001)[documents],
    toTable = Table.FromList(json, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    expandColumns = Table.ExpandRecordColumn(toTable, "Column1", {"id", "keyPhrases"}, {"MvpId", "Word"}),
    expandList = Table.ExpandListColumn(expandColumns, "Word"),
    nonNullRows = Table.SelectRows(expandList, each [Word] <> null),
    capitalized = Table.TransformColumns(nonNullRows,{{"Word", Text.Proper, type text}}),
    cleaned = Table.TransformColumns(capitalized,{{"Word", Text.Clean, type text}}),
    trimmed = Table.TransformColumns(cleaned,{{"Word", Text.Trim, type text}})
in
    trimmed
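
To make the expansion steps easier to follow, here is a minimal sketch that parses a hand-written response instead of calling the API. The sample imitates the shape of a keyPhrases response (a documents array holding an id and a keyPhrases array per document); its values are assumptions made for illustration.

```powerquery
let
    // Hand-written sample imitating the shape of a keyPhrases response
    sampleResponse = Json.FromValue([
        documents = {
            [id = "1001", keyPhrases = {"power bi", "sql server"}],
            [id = "1002", keyPhrases = {}]
        },
        errors = {}
    ]),
    documents = Json.Document(sampleResponse, 65001)[documents],
    toTable = Table.FromRecords(documents),
    renamed = Table.RenameColumns(toTable, {{"id", "MvpId"}, {"keyPhrases", "Word"}}),
    // One row per key phrase; a document without key phrases should yield a null row
    expanded = Table.ExpandListColumn(renamed, "Word"),
    // Hence the filter on null values before transforming the text
    nonNullRows = Table.SelectRows(expanded, each [Word] <> null),
    capitalized = Table.TransformColumns(nonNullRows, {{"Word", Text.Proper, type text}})
in
    capitalized
```

With this sample, the result should contain two rows for MvpId 1001, with the words "Power Bi" and "Sql Server". Text.Proper capitalizes each word, which changes the casing of product names.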

Now that we have this list of keywords, we can freely use the Word Cloud custom visual.

text-analytics-word-cloud.

It’s interesting to compare some of the keywords extracted for the recently awarded MVPs.

text-analytics-api-new-mvp

and the long-time MVPs.

text-analytics-api-long-time-mvp

At first sight, “book” appears to be a keyword associated with long-time MVPs and “Power BI” with the recently awarded ones.


Using the Face API with Power BI

In a previous series of blog posts, I explained how to grab the list of Data Platform MVPs from a website and add additional data about them, such as their GitHub contributions. In this sequel, I’ll also add the gender of the MVPs … based on their profile picture. To achieve this, I’ll rely on the Face API, part of the Azure Cognitive Services.

Let’s start by creating a service on your Azure subscription. If you have no Azure subscription, don’t panic: you can create a new one and you’ll receive some free credits to try a few services, including the Face API. Also, if you want, you can use the Face API for free with the F0 level of service, which limits you to twenty calls per minute.

To create a Face API service in your subscription, click on “add resources” and search for “face”, select the Face API:
face-api

Then select the Face service in the list:
face-api

After this selection, give a name to your service, select the resource group and choose the level of service that you expect:

face-api-pricing

Now, we’ll go back to Power BI and create a new query or, to be exact, a new function in Power Query. This function will call the Face API, passing the URL of the picture as a parameter. It will return a text with one of the possible values: male, female or unknown. I can already create two parameters related to the Face API:

  • The base URL, which depends on where your service is hosted. I named this parameter FaceApiUrl. This information is located in the Azure portal, in the overview of your Face API service.
    face-api-url
  • The key of your service. I named this parameter FaceApiKey and the value is also located in the Azure portal, but in the keys section.
    face-api-key

Based on the documentation of the Face API, I’ll have to send a POST request to the service. The request must be sent to the base URL and must specify which attributes the API is expected to compute. In this specific case, I’m only interested in the gender. I can build a record for the query parameters:

query = [#"returnFaceAttributes"= "gender"]
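
As a quick check of what this record turns into, the sketch below uses Uri.BuildQueryString to show the text that ends up after the "?" in the request URL. It is only an illustration and not a step of the final function.

```powerquery
let
    // The #"..." quoting is optional here, since the name contains no special characters
    query = [returnFaceAttributes = "gender"],
    // Encode the record as the query string of a URL
    queryString = Uri.BuildQueryString(query)
in
    queryString // "returnFaceAttributes=gender"
```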

As previously explained by Chris Webb (blog post) or Erik Svensen (blog post), to submit a POST request you’ll need two tricks.

  1. You must specify a content. This action will switch the execution context of Web.Contents from a GET request to a POST request
  2. The text of the content must be transformed to a binary

To create the content, I’ll have to refer to the documentation of the Face API: I need to create a JSON document with just one attribute, the URL of the picture.

content = "{ ""url"": """ & url & """}"
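
The concatenation above works as long as the URL contains no character that must be escaped in JSON. As a hedged alternative, the document can be built from a record with Json.FromValue, which takes care of the escaping. The URL below is a made-up example chosen to contain double quotes.

```powerquery
let
    // Hypothetical picture URL containing characters that need escaping in JSON
    url = "https://example.com/photos/john ""jd"" doe.jpg",
    // Json.FromValue escapes the value and already returns a binary,
    // so Text.ToBinary is not needed before passing it as Content
    content = Json.FromValue([url = url]),
    // Convert back to text only to inspect the document
    contentAsText = Text.FromBinary(content)
in
    contentAsText
```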

To submit a valid request, I’ll also have to submit the correct key (credential) to the Face API and specify that my content type is application/json. These two pieces of information must be specified in the headers of my request. The Headers field also expects a record, so I’ll submit the following construction.

headers =
   [#"Content-Type" = "application/json",
   #"Ocp-Apim-Subscription-Key" = FaceApiKey]

Now that the different components of my request are built, I just need to invoke the request with the help of the function Web.Contents().

response = Web.Contents(
   FaceApiUrl,
   [
      Query = query,
      Headers = headers,
      Content=Text.ToBinary(content)
   ]
)

Once we get the response, we just need to parse it to extract the gender.

gender = Json.Document(response,65001){0}[faceAttributes][gender]
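
The sketch below applies the same navigation to two hand-written responses: one with a detected face and one empty array, which is what the service returns when no face is detected. The samples and the ExtractGender helper are assumptions made for illustration.

```powerquery
let
    // Hand-written samples imitating the shape of a Face API detect response
    oneFace = Json.FromValue({
        [faceId = "00000000-0000-0000-0000-000000000000", faceAttributes = [gender = "female"]]
    }),
    noFace = Json.FromValue({}),
    // Hypothetical helper: same navigation as above, with a fallback value
    ExtractGender = (response as binary) as text =>
        let
            // {0} raises an error when the array of faces is empty
            gender = Json.Document(response, 65001){0}[faceAttributes][gender]
        in
            try gender otherwise "unknown",
    results = {ExtractGender(oneFace), ExtractGender(noFace)}
in
    results // {"female", "unknown"}
```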

The complete function should also handle unexpected errors, such as a missing picture or a picture whose quality is not high enough to detect the gender.

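let
    // Returns "male", "female" or "unknown" for the picture located at the given URL
    GetGender = (url as text) =>
        let
            query = [#"returnFaceAttributes" = "gender"],
            headers = [#"Content-Type" = "application/json", #"Ocp-Apim-Subscription-Key" = FaceApiKey],
            content = "{ ""url"": """ & url & """}",
            // Supplying Content turns the call into a POST request
            source = Web.Contents(FaceApiUrl, [Query = query, Headers = headers, Content = Text.ToBinary(content)]),
            // Read the gender of the first detected face
            gender = Json.Document(source, 65001){0}[faceAttributes][gender]
        in
            // Missing picture, no detected face or any other failure
            try gender otherwise "unknown"
in
    GetGender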
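
A possible way to use this function is to add a column to the table of MVPs. The sketch below is written under several assumptions: the mvps table and its PhotoUrl column are hypothetical, a stub replaces GetGender so that the query runs without a key, and the delay is one way of trying to stay under the twenty calls per minute of the F0 level.

```powerquery
let
    // Hypothetical table of MVPs; in the report it comes from the earlier queries
    mvps = #table(
        {"MvpId", "PhotoUrl"},
        {
            {"1001", "https://example.com/1001.jpg"},
            {"1002", "https://example.com/1002.jpg"}
        }
    ),
    // Stub used so that the sketch runs without a key; swap in the real GetGender
    GetGenderStub = (url as text) as text => "unknown",
    // Wait 3 seconds before each call, which stays within 20 calls per minute
    Throttled = (url as text) as text =>
        Function.InvokeAfter(() => GetGenderStub(url), #duration(0, 0, 0, 3)),
    withGender = Table.AddColumn(mvps, "Gender", each Throttled([PhotoUrl]), type text)
in
    withGender
```

Power Query decides when and how often each row is evaluated, so the delay should be seen as a best-effort safeguard rather than a strict guarantee.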

In the end, I’m able to create this visual and see that the percentage of women among the Data Platform MVPs is still really low.

face-api-gender