Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Enhance your career with this limited time 50% discount on Fabric and Power BI exams. Ends September 15. Request your voucher.

tharunkumarRTK

Do Not Use Web.BrowserContents (2/2)

In my previous blog I explained bout Web.BrowserContents, in this blog I will exaplin why we should avoid using it 

 

What if the data is not on the web page? 

Let’s say you’re interested in the ‘Index of Consumer Sentiment’ data from the University of Michigan’s portal:

data.sca.isr.umich.edu/get-table.php

 

tharunkumarRTK_0-1754835719669.png

You could manually download the file and import it into Power BI Desktop — but that’s tedious. Every data update would require you to download the latest file and refresh your dataset manually. 

A better approach is to find the direct file URL from the site, usually by inspecting the anchor (<a>) tag hyperlink. Once you have this URL, connecting directly works—until you realize there is a unique file’s ID, month and year with in the URL as query parameter.  

 

https://data.sca.isr.umich.edu/get-table.php?
c=YB&y=2025&
m=6&n=1a&
f=pdf&
k=a01fbcbe6e713ac1a6d67875f5e19fc2

 

It indicates every month the URL parameter value changes. So, you’re back to square one. 

Time to roll up the sleeves and brainstorm a solution

 

Using Web.Contents and Text.FromBinary 

When you use the graphical interface and supply the web page URL, the first navigation window will show you the table it has identified but it will not show the underlying anchor tag URLs. 

 

tharunkumarRTK_1-1754835872078.png

M expression will look similar to this: 

Web.BrowserContents("https://data.sca.isr.umich.edu/tables.php") 

If you go this route, refreshing the dataset from the Power BI Service fails due to two main reasons: 

Dynamic Data Source Limitation 
Web.BrowserContents does not support data refresh from power bi service  

 

Solution 
1. Replace Web.BrowserContents with Web.Contents, and split your web URL into a root URL and a query parameter. 
Web.Contents("https://data.sca.isr.umich.edu", [RelativePath="tables.php#"]) 

tharunkumarRTK_0-1754836194538.png

 

This modification avoids issues with dynamic data sources and browser engine requirements. 

Extract the HTML using Web.Contents(), As we all know the output of this function is binary value, convert this binary to text using Text.FromBinary.  

tharunkumarRTK_1-1754836238525.png

As you can see we are able to extract the underlying HTML, the next task to identify the file url within this code. Here’s where things get tricky. As far as I know, there’s no M function that reads anchor elements directly. 

But we have workarounds! Either we can use text manipulation function like Text.BetweenDelimiter OR we can use Html.Table function. Second option would require HTML knowledge to identify the class and identifier names of different elements in the code. Since we only need one link and to keep things simple, lets solve this using text manipulation functions 

We can use functions like Text.PositionOf to locate the exact file URL. For example, in my scenario, the relative file URL is found between two position markers. As a result, you can dynamically extract the file URL. 

 

 

Text.BetweenDelimiters(
Text.BetweenDelimiters(
Text.FromBinary(
Web.Contents("https://data.sca.isr.umich.edu/tables.php#")),
"The Index of Consumer Sentiment", " </div>"),
" href=", ">")

 

 

 

Comments

This is useful for future reference. Thanks for the writeup!

This is helpful