March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
I need to pull the latest excel data from https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/datasets/si...
Clicking the topmost button (red arrowed) gives me this link: https://www.ons.gov.uk/file?uri=%2femploymentandlabourmarket%2fpeopleinwork%2femploymentandemployeet...
But I cannot just use the above url because I need the latest data, so if 2021 data comes out for example, I will need to pull the 2021 data and not the 2020final1.xlsx data (as hardcoded into the above url)
I know that every dataset url starts with "/file?uri=%" and ends with ".xls" or ".xlsx", so I did the following:
1. Input the website url and obtain the html code
2. Text.Split the html code by "/file?uri=%"
3. Use Text.BeforeDelimiter to remove text after .xls
4. Add back the /file?=uri=% and .xls to "restore" the url
5. Select the first url (which will have the latest data)
Advanced editor code as below:
let
Source = Web.BrowserContents("https://www.ons.gov.uk/employmentandlabourmarket/peopleinwork/employmentandemployeetypes/datasets/sicknessabsenceinthelabourmarket"),
#"Split by /file?uri=%" = Text.Split(Source, "/file?uri=%"),
#"Converted to Table" = Table.FromList(#"Split by /file?uri=%", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
#"Discard text after .xls" = Table.TransformColumns(#"Converted to Table", {{"Column1", each Text.BeforeDelimiter(_, ".xls"), type text}}),
#"Removed Top Row" = Table.Skip(#"Discard text after .xls",1),
#"Added back www.ons.gov.uk[...]" = Table.ReplaceValue(#"Removed Top Row","2femploymentandlabourmarket","https://www.ons.gov.uk/file?uri=%2femploymentandlabourmarket",Replacer.ReplaceText,{"Column1"}),
#"Added .xlsx" = Table.AddColumn(#"Added back www.ons.gov.uk[...]", ".xlsx", each ".xlsx"),
#"Added website_url" = Table.AddColumn(#"Added .xlsx", "website_url", each [Column1]&[#".xlsx"]),
#"Kept First Row" = Table.FirstN(#"Added website_url",1),
#"Kept website_url" = Table.SelectColumns(#"Kept First Row",{"website_url"}),
#"Convert table into text" = Table.ToList(#"Kept website_url"){0}
in
#"Convert table into text"
Is there a better way of parsing html code in Power BI?
I want to read the entire text and extract all strings that start with /file and end with .xls or .xlsx and store the results in a list. Something similar can be achieved with Python and Regex, but how would I do this in Power Query/DAX?
Note: I checked their API, and it does not have the above dataset that I need. https://api.beta.ons.gov.uk/v1/datasets
Hi @bleow ,
To get the contents in the download file link from web link, you can try:
1. Right click the download button, choose 'Inspect'
2. In the Inspect page, find the 'real' link of this dataset:
3. Copy this link and use the Web connector in power bi to connect it, you can get the file contents:
For more details, you can refer this article: Tips to download files from webpages in Power Query and Power BI
Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Thanks for the reply. Is there any way to do this url retrieval programmatically? My client wants as much automation as possible and doesn't want to manually check the website for dataset updates.
And in that regard, I do already have a script in the original post that does this, albeit in a hacked-together manner and I'm wondering if there's a better way to do it, or if say python/pandas integration is possible for my use case.
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.
User | Count |
---|---|
114 | |
76 | |
57 | |
52 | |
44 |
User | Count |
---|---|
164 | |
116 | |
63 | |
57 | |
50 |