Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
pat_energetics
Advocate II
Advocate II

Download and unzip multiple files from URL then query csv file/s within

Hi all,

 

I am trying to automate a process that involves;

1 - File/s download from given fixed URL,

2 - Unzip contents,

3 - Query contained CSV file/s.

 

The URL remains fixed, but files available for download are updated at regular frequency and so do not parmeterize easily using - for example - steps discussed here by Chris Webb and others. Ideally I would like to have a table of downloaded file names to avoid downloading the same file multiple times. Having a local copy of the files downloaded would be an advantage in this case.

 

If zip files were downloaded to a local directory, there seem to be a few different approaches to unzipping contents using either R script here from @prathy or a Power Query M language approach discussed here at Mark White's sql10 blogspot.

 

The last step has been completed using power query within Power BI, but currently relies on a manual process to download and unzip contents to a local directory.

 

I would be very interested to know anyone's experience with this.

 

Questions

Is the R script or the Power Query M the best approach for now given the rate at which Power BI data connectors are being developed ?

Is a seamless download , unzip and query all within Power BI too far away ?

5 REPLIES 5

Is the URL a folder that contains the zipped files?

Yes - an example of one URL is here.

Anonymous
Not applicable

If you open that URL as a web page in Power Query then drill down through the tables, you can get a list of ZIP files:

let
    Source = Web.Page(Web.Contents("http://www.nemweb.com.au/REPORTS/CURRENT/Daily_Reports/")),
    Data0 = Source{0}[Data],
    Children = Data0{0}[Children],
    Children1 = Children{1}[Children],
    Children2 = Children1{2}[Children],
    #"Removed Other Columns" = Table.SelectColumns(Children2,{"Children"}),
    #"Expanded Children" = Table.ExpandTableColumn(#"Removed Other Columns", "Children", {"Text"}, {"Children.Text"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded Children", each ([Children.Text] <> null and [Children.Text] <> "[To Parent Directory]"))
in
    #"Filtered Rows"

You could save that as a CSV each day with the R script you mention, then join with the result above the following day to find any new files, and pass those as parameters to a function to Unzip-process-save as CSV etc.

 

Is that the sort of thing you're after?

Yes @Anonymous - your suggested process would seem to fit, however I am concerned about the maintainability and portability given the number of steps and reliance on coming in and out of R scripts.

 

One of the original intents of the question was to draw an indication from the Power BI  / Power Query development team if this something that is on the radar. 

 

As public data sets are becoming increasing large and available on line as zip files, the ability to connect to a web source, download, unzip and then process without specialist coding in Power BI would be a distinct product advantage. There appear to be many pieces of the puzzle already, so joining the dots would be a fantastic improvement !

 

Maybe this could go to the ideas forum / group ??

Anonymous
Not applicable

I too would be intrigued to see what feedback you get from the Dev team on this.  There are at least 3 entries on this already in the ideas forum - see https://ideas.powerbi.com/search?filter=ideas&query=zip%20file

 

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

Top Solution Authors
Top Kudoed Authors