Microsoft Fabric Community Conference 2025, March 31 - April 2, Las Vegas, Nevada. Use code FABINSIDER for a $400 discount.
Register nowThe Power BI DataViz World Championships are on! With four chances to enter, you could win a spot in the LIVE Grand Finale in Las Vegas. Show off your skills.
Hi,
Just seeking some help getting data from multiple files on a HTML web page via Power BI.
Any help could do.
Cheers
Going to need some additional detail to help, are you referring to files linked from a web page? What kinds of files?
HI @Greg_Deckler,
Thank you for attempting to help me out.
The files are Tables in xls format stored on a web pages. I need to get those files straight into power BI without downloading them manually. This is so data refresh can work.
Look forward to your response.
You would use the web connector and if they are true tables on the web page, they should show up in the navigation.
I have tried that, but unfortunately it doesnt work, even once I have expanded or drilled down on the columns. From my understanding there is a need to create a parameter and function, but not sure on the process of doing so.
Try this
let Source = Excel.Workbook(Web.Contents("go.microsoft.com/fwlink/?LinkID=521962"), null, true), financials_Table = Source{[Item="financials",Kind="Table"]}[Data], #"Changed Type" = Table.TransformColumnTypes(financials_Table,{{"Segment", type text}, {"Country", type text}, {"Product", type text}, {"Discount Band", type text}, {"Units Sold", type number}, {"Manufacturing Price", Int64.Type}, {"Sale Price", Int64.Type}, {"Gross Sales", type number}, {"Discounts", type number}, {" Sales", type number}, {"COGS", type number}, {"Profit", type number}, {"Date", type date}, {"Month Number", Int64.Type}, {"Month Name", type text}, {"Year", Int64.Type}}) in #"Changed Type"
I connected to the site using the web connector, then clicked right mouse menu over the document and Selected Excel.
If you have lots of documents you can use functions to loop over the documents. There are several posts that explain how.
e.g.
https://powerbi-pro.com/iterating-over-an-unknown-number-of-pages-in-power-query/
Phil
Hi Phil,
Yes I have to loop over multiple files.
How would I go about doing that?
Hi @prat91,
Here is a good article about how to iterate over multiple pages of web data using Power Query for your reference.
Regards
HI @v-ljerr-msft,
Thank you for the source.
I had a look, however, I need to loop over multipe xls files located on a single web page.
Is the process the same?
Ive have tried the mentioned steps, but still no success.
Cheers
Loading the excel worked fine, but the tricky bit is getting the list of excel links from the website you sent is not easy.
I've done it 2 ways one is treating the web page as text instead of html then stripping out the xls links.
I also put an r script version that uses a html scraping library.
Once you have the list you can use a function to process into a list.
I did this by doing a single excel weblink, then using the Create Function option from the Queries List.
The paramaterise the link by putting a name in the source brackets Source = (weblink) and using this name in place of the excel weblink.
Finally in the list you add new column using invoke function.
It's a bit slow but does work. I've filtered to the top 4 files, so remove this for all excels.
fLoadExcelRaw
let Source = (weblink) => let Source = Excel.Workbook(Web.Contents(weblink), null, true), Data2 = Source{[Name="Data1"]}[Data] in Data2 in Source
This is the R scraping version
R Code
library(rvest) page=read_html("WEBSITE") data <- as.data.frame(html_attr(html_nodes(page, "a"), "href"))
M Query
let Source = Table.FromColumns({Lines.FromBinary(Web.Contents("WEBSITE"))}), #"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Column1], "Time Series Spreadsheet") and Text.Contains([Column1], "xls")), #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows", "Column1", Splitter.SplitTextByEachDelimiter({"a href=""/"}, QuoteStyle.None, false), {"Column1.1", "Column1.2"}), #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}}), #"Split Column by Delimiter1" = Table.SplitColumn(#"Changed Type", "Column1.2", Splitter.SplitTextByEachDelimiter({""""}, QuoteStyle.None, false), {"Column1.2.1", "Column1.2.2"}), #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Column1.2.1", type text}, {"Column1.2.2", type text}}), #"Replaced Value" = Table.ReplaceValue(#"Changed Type1","WEBSITE","WEBSITE",Replacer.ReplaceText,{"Column1.2.1"}), #"Renamed Columns" = Table.RenameColumns(#"Replaced Value",{{"Column1.2.1", "ExcelLink"}}), #"Removed Columns" = Table.RemoveColumns(#"Renamed Columns",{"Column1.1", "Column1.2.2"}), #"Kept First Rows" = Table.FirstN(#"Removed Columns",4), #"Invoked Custom Function" = Table.AddColumn(#"Kept First Rows", "ExcelData", each fLoadExcelRaw([ExcelLink])), #"Expanded ExcelData" = Table.ExpandTableColumn(#"Invoked Custom Function", "ExcelData", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22", "Column23", "Column24", "Column25", "Column26", "Column27", "Column28", "Column29", "Column30", "Column31", "Column32", "Column33", "Column34", "Column35", "Column36", "Column37", "Column38", "Column39", "Column40", "Column41", "Column42", "Column43", "Column44", "Column45", "Column46", "Column47", "Column48", "Column49", "Column50", "Column51", "Column52", "Column53", "Column54", "Column55", "Column56", "Column57", "Column58", "Column59", "Column60", "Column61", "Column62", "Column63", "Column64", "Column65", "Column66", "Column67", "Column68", "Column69", "Column70", "Column71", "Column72", "Column73", "Column74", "Column75", "Column76", "Column77", "Column78", "Column79", "Column80", "Column81", "Column82", "Column83", "Column84", "Column85", "Column86", "Column87", "Column88", "Column89", "Column90", "Column91", "Column92", "Column93", "Column94", "Column95", "Column96", "Column97", "Column98", "Column99", "Column100", "Column101", "Column102", "Column103", "Column104", "Column105", "Column106", "Column107", "Column108", "Column109", "Column110", "Column111", "Column112", "Column113", "Column114", "Column115"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22", "Column23", "Column24", "Column25", "Column26", "Column27", "Column28", "Column29", "Column30", "Column31", "Column32", "Column33", "Column34", "Column35", "Column36", "Column37", "Column38", "Column39", "Column40", "Column41", "Column42", "Column43", "Column44", "Column45", "Column46", "Column47", "Column48", "Column49", "Column50", "Column51", "Column52", "Column53", "Column54", "Column55", "Column56", "Column57", "Column58", "Column59", "Column60", "Column61", "Column62", "Column63", "Column64", "Column65", "Column66", "Column67", "Column68", "Column69", "Column70", "Column71", "Column72", "Column73", "Column74", "Column75", "Column76", "Column77", "Column78", "Column79", "Column80", "Column81", "Column82", "Column83", "Column84", "Column85", "Column86", "Column87", "Column88", "Column89", "Column90", "Column91", "Column92", "Column93", "Column94", "Column95", "Column96", "Column97", "Column98", "Column99", "Column100", "Column101", "Column102", "Column103", "Column104", "Column105", "Column106", "Column107", "Column108", "Column109", "Column110", "Column111", "Column112", "Column113", "Column114", "Column115"}) in #"Expanded ExcelData"
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Check out the February 2025 Power BI update to learn about new features.
User | Count |
---|---|
87 | |
81 | |
53 | |
38 | |
35 |