Microsoft Fabric Community Conference 2025, March 31 - April 2, Las Vegas, Nevada. Use code FABINSIDER for a $400 discount.
Register nowThe Power BI DataViz World Championships are on! With four chances to enter, you could win a spot in the LIVE Grand Finale in Las Vegas. Show off your skills.
Hi all,
Like many others (here, here and here), I have been struggling with the inability of Power BI service to perform a scheduled refresh on a public web source without using a personal gateway.
According to the MS documentation, this is still not possible. (Don't ask me why)
The problem with this is that the M function Web.Page(), which is needed to work with HTML input, requires a gateway to work.
However, I have been playing around a bit lately, and believe that I have found a workaround that is relatively flexible. I'd like to share it here and get some feedback. Maybe some of you have some suggestions on how to improve the code to make it more broadly applicable.
In this example, I am trying to load the table from the following page into Power Query: https://www.w3schools.com/html/html_tables.asp.
let
URL = "https://www.w3schools.com/html/html_tables.asp",
Start = "<table class=""ws-table-all"" id=""customers"">",
End = "</table>",
Source = Table.FromColumns({Lines.FromBinary(Web.Contents(URL))}),
ToList = Table.Column(Source, "Column1"),
RowCount = Table.RowCount(Source),
TableStart = List.PositionOf(ToList, Start),
TableEnd = List.PositionOf(ToList, End),
RemoveTopRows = Table.Skip(Source,TableStart),
RemoveBottomRows = Table.RemoveLastN(RemoveTopRows,RowCount-TableEnd-1),
List = Table.Column(RemoveBottomRows, "Column1"),
ChangeStart = List.ReplaceValue(List, Start, "<data>", Replacer.ReplaceText),
ChangeEnd = List.ReplaceValue(ChangeStart, End, "</data>", Replacer.ReplaceText),
XML = Xml.Tables(Lines.ToText(ChangeEnd)),
Table = XML{0}[Table],
#"Added Custom1" = Table.AddColumn(Table, "data", each if [th] <> null then [th] else [td]),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"th", "td"}),
#"Added Custom" = Table.AddColumn(#"Removed Columns", "Custom", each Table.Transpose([data])),
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"data"}),
#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns1", "Custom", {"Column1", "Column2", "Column3"}, {"Column1", "Column2", "Column3"}),
#"Promoted Headers" = Table.PromoteHeaders(#"Expanded Custom", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Company", type text}, {"Contact", type text}, {"Country", type text}})
in
#"Changed Type"
The key here is to circumvent the use of the Web.Page() function, so here are the steps:
I have tested this strategy with Power BI service, incremental refresh and scheduled refresh and it works like a charm.
If you have any suggestions on how to improve this code, please let me know.
Cheers,
Kirvis
Hi Kirvis,
I'm trying to apply your method to an HTML table I have. I'm getting an error:
Expression.Error: We expected a CountOrCondition value.
Details:
[Table]
This is apparently related to The RemoveTopRows step, am I missing something? Is there something here that needs to change?
The only things I've changed are URL and Start.
URL is pretty straight forward but here's the Start change
Start = "<table class=""table table-bordered table-condensed table-hover table-fixed pre-date-details-list"">",
Hi @kirvis ,
that looks like a very nice solution. Unfortunately I don't have the time currently to work on it.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Hi Kirvis,
This is a great workaround and works very well. Well done!
What if the URL has multiple tables in and is updated say weekly with new tables added?
Curious to see how would that work out as there are many cases going around these days.
Thanks.
For using online data and on premises data like Azure sql server, Share point online and SSAS etc,.
For scheduling refresh web page, it's a good work around.
Please check the below link for the same.
I hope this may help you.
Regards,
Manikumar
Proud to be a Super User!
Hi @manikumar34 ,
Thanks for your response.
I am aware of the blog posts that Chris wrote, but if I'm not mistaken, he addresses a different issue. He does not use the function Web.Page() in his script. He does not have to because the response of the URL he uses is JSON, which Power Query handles very well.
The workaround I mention in my post is specifically about scraping data from web pages. As you can see in the MS documentation about sources I mention in the post, you need a gateway to be able to use scheduled refresh for that.
My goal for this workaround was to explicitly circumvent the need for setting up a gateway.
Hope that explains.
Kirvis
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Check out the February 2025 Power BI update to learn about new features.
User | Count |
---|---|
27 | |
26 | |
21 | |
12 | |
10 |
User | Count |
---|---|
27 | |
25 | |
22 | |
17 | |
13 |