Wednesday 9 August 2017

Using HTML as Your Datasource

I'm not a programmer and I don't have an IT background, so when I heard of this method of connection, I was super stoked to try it out!

In a nutshell, you would structure your data connections this way:
HTML to Google Sheets to Tableau

Selecting Your Source - The Web Page

Obviously it would be best to connect to a web page that updates over time. Meaning, the same URL will hold data that updates and thus update your data source.

I am working on a workbook related to the Canadian Federal budget and how they spend our hard earn tax dollars, so I connected my Google Sheets to the Authorities and Expenditures by Program page.



This is, however, not a good example as the government posts by groupings of years. It is shown in the URL and thus I don't believe this particular page will be updated with new information but rather, the new information will be published onto a new page that has a new URL altogether. Having the year in the URL kind of gave it away.


A better web page to connect to would be, for instance, like a Wikipedia page where tables are updated without the URL changing.


The Formula & The Table

In Google Sheets, use the following syntax:


= IMPORTHTML(url, query, index)

The URL is easy, simply copy and paste.
The query is "table" or "list depending on what type of structure contains the desired data.
The index, however, may take a little bit of work.

Open up HTML viewer and look for all the "tables" until you find the table in question. For instance, if your table is the 3rd out of 4 tables, you would put "3" in the formula for index. The federal expenditure web page only had 1 table so I put "1" for my formula.



*Open HTML in Google Chrome by going to Settings à More Tools à Developer Tools
  Chrome: https://developers.google.com/web/tools/chrome-devtools/
  IE: https://msdn.microsoft.com/en-us/library/dd565627(v=vs.85).aspx#htmltool
  Firefox: https://developer.mozilla.org/en-US/docs/Tools/Page_Inspector/How_to/Examine_and_edit_HTML

In the end, my full formula looked like this:


=IMPORTHTML("https://www.canada.ca/en/treasury-board-secretariat/services/planned-government-spending/budgets-expenditures/expenditures-program-2016.html","table",1)

You'll see the formula load for a few seconds, and then the table should load.




Connect to Tableau

Open Tableau and connect to Google Sheets. A pop up window will prompt you to sign into your google account, then it should connect to Google Drive automatically. You can also copy and paste the URL of your Google Sheet directly in the search bar.


Canadian Federal Fiscal Budget by Type

No comments:

Post a Comment