Web scraping of unstructured webpages
This PowerBI proof of concept (POC) shows how the content of webpages can be transformed into an interactive report. The sample counts the number of 'Jobs for R-users' postings by city. The POC specifically demonstrates web scraping of unstructured webpages and reporting within PowerBI (i.e., ETL).

The source data comes from http://www.r-users.com/jobs, a job board for companies looking to hire R users. Jobs are listed by date. The purpose of the dashboard is to report the loaded data grouped by city rather than listed by date. The page layout is custom and not tabular (rows and columns). The board shows 25 items per page, across about 20 pages.
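
The regrouping from "by date" to "by city" can be sketched as a simple count. This is a minimal illustration in Python, not the R script used for the POC; the sample records, the field names (posted, title, city) and the helper count_jobs_by_city are all hypothetical.

```python
from collections import Counter

# Hypothetical sample records in posting-date order. The real job board
# fields are not documented here, so these names are assumptions.
jobs = [
    {"posted": "2016-05-03", "title": "Data Scientist", "city": "London"},
    {"posted": "2016-05-02", "title": "R Developer", "city": "Boston"},
    {"posted": "2016-05-02", "title": "Statistician", "city": " London "},
    {"posted": "2016-05-01", "title": "Analyst", "city": ""},
]


def count_jobs_by_city(records):
    """Return (city, count) pairs, largest count first, ties by city name."""
    counts = Counter()
    for record in records:
        # Normalise whitespace and skip postings without a city.
        city = (record.get("city") or "").strip()
        if city:
            counts[city] += 1
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    for city, n in count_jobs_by_city(jobs):
        print(city, n)
```

The sorted pairs are the shape a bar chart of counts by city would consume; postings with no city are dropped rather than counted under an empty label.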

On this site, there is no HTML table to load for processing, so the data had to be scraped page by page following a pattern. An R script cycled through each page to load the data. Once the data was loaded, two reports were chosen as PowerBI tiles. The first tile is a world map showing the cities found in the data. The second tile is a bar chart of job counts by city. The tiles are inherently interactive on the PowerBI platform. The POC sample was published in a way that does not require a PowerBI user account or license. So now I can fill in some meaningful documentation.
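
The page-by-page pattern described above might look roughly like the following. This is a sketch in Python using only the standard library, not the R script the POC actually used. The 'location' class name, the /page/N/ URL pattern and the function names are assumptions made for illustration; the real markup of the job board would need to be inspected first.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

BASE_URL = "http://www.r-users.com/jobs"  # address given in the text above

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class LocationParser(HTMLParser):
    """Collect the text of every element whose class list has 'location'.

    ASSUMPTION: the 'location' class name is hypothetical.
    """

    def __init__(self):
        super().__init__()
        self.locations = []
        self._depth = 0  # > 0 while inside a matching element
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif "location" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Join the text fragments and collapse runs of whitespace.
            text = " ".join(" ".join(self._buf).split())
            if text:
                self.locations.append(text)

    def handle_data(self, data):
        if self._depth:
            self._buf.append(data)


def page_url(page):
    # ASSUMPTION: hypothetical pagination pattern, not confirmed by the source.
    return BASE_URL if page == 1 else f"{BASE_URL}/page/{page}/"


def fetch(url):
    """Download one page and return its HTML as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_locations(max_pages, fetch_page=fetch):
    """Cycle through pages 1..max_pages; stop early at a page with no hits."""
    found = []
    for page in range(1, max_pages + 1):
        parser = LocationParser()
        parser.feed(fetch_page(page_url(page)))
        parser.close()
        if not parser.locations:
            break
        found.extend(parser.locations)
    return found
```

Passing the fetch function as a parameter keeps the loop testable against saved HTML without touching the network. A call such as scrape_locations(20) would cover the roughly 20 pages mentioned above, and the resulting list could then be counted by city and handed to the reporting layer.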

PowerBI is actually easy once your data sources are loaded, so I am not targeting typical dashboards here. I have also been working on a second PowerBI dashboard with timeline data and animation.



