March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount! Early bird discount ends December 31.
Register NowBe one of the first to start using Fabric Databases. View on-demand sessions with database experts and the Microsoft product team to learn just how easy it is to get started. Watch now
Hi all,
I am trying to automate a process that involves;
1 - File/s download from given fixed URL,
2 - Unzip contents,
3 - Query contained CSV file/s.
The URL remains fixed, but files available for download are updated at regular frequency and so do not parmeterize easily using - for example - steps discussed here by Chris Webb and others. Ideally I would like to have a table of downloaded file names to avoid downloading the same file multiple times. Having a local copy of the files downloaded would be an advantage in this case.
If zip files were downloaded to a local directory, there seem to be a few different approaches to unzipping contents using either R script here from @prathy or a Power Query M language approach discussed here at Mark White's sql10 blogspot.
The last step has been completed using power query within Power BI, but currently relies on a manual process to download and unzip contents to a local directory.
I would be very interested to know anyone's experience with this.
Questions
Is the R script or the Power Query M the best approach for now given the rate at which Power BI data connectors are being developed ?
Is a seamless download , unzip and query all within Power BI too far away ?
Is the URL a folder that contains the zipped files?
If you open that URL as a web page in Power Query then drill down through the tables, you can get a list of ZIP files:
let Source = Web.Page(Web.Contents("http://www.nemweb.com.au/REPORTS/CURRENT/Daily_Reports/")), Data0 = Source{0}[Data], Children = Data0{0}[Children], Children1 = Children{1}[Children], Children2 = Children1{2}[Children], #"Removed Other Columns" = Table.SelectColumns(Children2,{"Children"}), #"Expanded Children" = Table.ExpandTableColumn(#"Removed Other Columns", "Children", {"Text"}, {"Children.Text"}), #"Filtered Rows" = Table.SelectRows(#"Expanded Children", each ([Children.Text] <> null and [Children.Text] <> "[To Parent Directory]")) in #"Filtered Rows"
You could save that as a CSV each day with the R script you mention, then join with the result above the following day to find any new files, and pass those as parameters to a function to Unzip-process-save as CSV etc.
Is that the sort of thing you're after?
Yes @Anonymous - your suggested process would seem to fit, however I am concerned about the maintainability and portability given the number of steps and reliance on coming in and out of R scripts.
One of the original intents of the question was to draw an indication from the Power BI / Power Query development team if this something that is on the radar.
As public data sets are becoming increasing large and available on line as zip files, the ability to connect to a web source, download, unzip and then process without specialist coding in Power BI would be a distinct product advantage. There appear to be many pieces of the puzzle already, so joining the dots would be a fantastic improvement !
Maybe this could go to the ideas forum / group ??
I too would be intrigued to see what feedback you get from the Dev team on this. There are at least 3 entries on this already in the ideas forum - see https://ideas.powerbi.com/search?filter=ideas&query=zip%20file
March 31 - April 2, 2025, in Las Vegas, Nevada. Use code MSCUST for a $150 discount!
Arun Ulag shares exciting details about the Microsoft Fabric Conference 2025, which will be held in Las Vegas, NV.