Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Join us for an expert-led overview of the tools and concepts you'll need to become a Certified Power BI Data Analyst and pass exam PL-300. Register now.

Reply
bleow
Frequent Visitor

Retrieve most updated excel file from a website without API support

I need to pull the latest excel data from https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/datasets/si...

bleow_2-1625756917215.png

 

Clicking the topmost button (red arrowed) gives me this link: https://www.ons.gov.uk/file?uri=%2femploymentandlabourmarket%2fpeopleinwork%2femploymentandemployeet...

 

But I cannot just use the above url because I need the latest data, so if 2021 data comes out for example, I will need to pull the 2021 data and not the 2020final1.xlsx data (as hardcoded into the above url)

 

I know that every dataset url starts with "/file?uri=%" and ends with ".xls" or ".xlsx", so I did the following:

1. Input the website url and obtain the html code

2. Text.Split the html code by "/file?uri=%"

3. Use Text.BeforeDelimiter to remove text after .xls

4. Add back the /file?=uri=% and .xls to "restore" the url

5. Select the first url (which will have the latest data)

bleow_1-1625756631719.png

Advanced editor code as below:

let
    Source = Web.BrowserContents("https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/datasets/sicknessabsenceinthelabourmarket"),
    #"Split by /file?uri=%" = Text.Split(Source, "/file?uri=%"),
    #"Converted to Table" = Table.FromList(#"Split by /file?uri=%", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Discard text after .xls" = Table.TransformColumns(#"Converted to Table", {{"Column1", each Text.BeforeDelimiter(_, ".xls"), type text}}),
    #"Removed Top Row" = Table.Skip(#"Discard text after .xls",1),
    #"Added back www.ons.gov.uk[...]" = Table.ReplaceValue(#"Removed Top Row","2femploymentandlabourmarket","https://www.ons.gov.uk/file?uri=%2femploymentandlabourmarket",Replacer.ReplaceText,{"Column1"}),
    #"Added .xlsx" = Table.AddColumn(#"Added back www.ons.gov.uk[...]", ".xlsx", each ".xlsx"),
    #"Added website_url" = Table.AddColumn(#"Added .xlsx", "website_url", each [Column1]&[#".xlsx"]),
    #"Kept First Row" = Table.FirstN(#"Added website_url",1),
    #"Kept website_url" = Table.SelectColumns(#"Kept First Row",{"website_url"}),
    #"Convert table into text" = Table.ToList(#"Kept website_url"){0}
in
    #"Convert table into text"

 

Is there a better way of parsing html code in Power BI?

I want to read the entire text and extract all strings that start with /file and end with .xls or .xlsx and store the results in a list. Something similar can be achieved with Python and Regex, but how would I do this in Power Query/DAX?

 

Note: I checked their API, and it does not have the above dataset that I need. https://api.beta.ons.gov.uk/v1/datasets

 

 

 

 

2 REPLIES 2
v-yingjl
Community Support
Community Support

Hi @bleow ,

To get the contents in the download file link from web link, you can try:

1. Right click the download button, choose 'Inspect'

inspect.png

 

2. In the Inspect page, find the 'real' link of this dataset:

link.png

 

3. Copy this link and use the Web connector in power bi to connect it, you can get the file contents:

web.png

 

For more details, you can refer this article: Tips to download files from webpages in Power Query and Power BI 

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thanks for the reply. Is there any way to do this url retrieval programmatically? My client wants as much automation as possible and doesn't want to manually check the website for dataset updates.

 

And in that regard, I do already have a script in the original post that does this, albeit in a hacked-together manner and I'm wondering if there's a better way to do it, or if say python/pandas integration is possible for my use case.

Helpful resources

Announcements
Join our Fabric User Panel

Join our Fabric User Panel

This is your chance to engage directly with the engineering team behind Fabric and Power BI. Share your experiences and shape the future.

June 2025 Power BI Update Carousel

Power BI Monthly Update - June 2025

Check out the June 2025 Power BI update to learn about new features.

June 2025 community update carousel

Fabric Community Update - June 2025

Find out what's new and trending in the Fabric community.