Enriching your dataset with a table from Wikipedia, using an AI-powered feature of Power Query

In previous blog posts, I explained how to parse the COVID-19 time series published by Johns Hopkins University. Wonderful, but we still have to enrich our dataset with the population of each country. For a human, this information is easy to find on Wikipedia: just open the page List of countries and dependencies by population and scroll down until you reach the table.

For a computer, it’s another challenge! I usually write Power Query/M code from scratch and I rarely use the UI … but in this case I will. Why? Because the tool that we’ll use is powered by AI, and I must confess that AI is much better at this than my own brain. It’s really straightforward and takes only a few steps.

Create a new table by selecting the data source “Web” and paste the URL of the page that you want to scrape.

The next screen has already parsed your page and extracted all the candidate tables. Select the first one, but click on “using examples”.

Another screen appears and asks you to provide examples. As soon as you type the first letter of China, a pop-up appears with suggestions. Select the one that matches your expectations.

One example is rarely enough: you’ll probably have to provide a second (and potentially a third) one. The AI engine will then infer the pattern from your examples and fill in the results for the whole column. You can do exactly the same thing for the second column.

Then don’t forget to rename your columns with the expected names and click on OK. The table will be created with the expected values for the countries and population.
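To give an idea of what sits behind those few clicks, here is a minimal Python sketch of the same kind of extraction, using only the standard library. It is not the query that Power Query generates: the HTML snippet is a hypothetical, heavily simplified stand-in for the Wikipedia table, the figures are placeholders, and the assumption that the country and the population sit in the second and third cells of each row is mine.

```python
from html.parser import HTMLParser

# Hypothetical, heavily simplified stand-in for the Wikipedia table.
# The figures are placeholders for illustration, not real population data.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Rank</th><th>Country</th><th>Population</th></tr>
  <tr><td>1</td><td>Country A</td><td>1,000,000</td></tr>
  <tr><td>2</td><td>Country B</td><td>250,000</td></tr>
</table>
"""


class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell texts
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text chunks of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td> cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:   # header rows only hold <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None


def extract_country_population(html):
    """Return (country, population) pairs from the 2nd and 3rd cell of each row."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    # Assumed layout: rank, country, population. Strip thousands separators.
    return [(row[1], int(row[2].replace(",", ""))) for row in parser.rows]


population = extract_country_population(SAMPLE_HTML)
```

The part that the “using examples” screen saves you is precisely the hard-coded assumption about which cells to read: instead of inspecting the HTML yourself, you type a couple of expected values and let the engine work out where they come from.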

After this, you’ll surely have to sprinkle a bit of black magic to merge this table with the rest of the data from Johns Hopkins University, but that’s just business as usual. If you want to know more about this, just open the code for the COVID-19 analysis on my GitHub account.
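The usual difficulty in such a merge is that the two sources do not always spell country names the same way. The Python sketch below shows one possible way to handle it, with an alias table followed by a left join. It is an illustration under my own assumptions, not the code from the repository: the alias entries, the function names and every figure are hypothetical placeholders.

```python
# Hypothetical alias table: spelling in the time series -> spelling on Wikipedia.
ALIASES = {"US": "United States", "Korea, South": "South Korea"}


def normalize(name):
    """Map a country name to a comparison key, tolerant of case and spacing."""
    cleaned = name.strip()
    return ALIASES.get(cleaned, cleaned).casefold()


def attach_population(countries, population_by_country):
    """Left join: keep every country, with None when no population matches."""
    lookup = {normalize(name): pop for name, pop in population_by_country.items()}
    return {country: lookup.get(normalize(country)) for country in countries}


# Placeholder figures, for illustration only.
population = {"United States": 1000, "South Korea": 500, "France": 250}
merged = attach_population(["US", "Korea, South", "France", "Atlantis"], population)
```

Keeping unmatched countries with an empty population, rather than dropping them, makes it easy to spot the names that still need an alias.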
