Reply
Alirezam
Helper V
Helper V
Partially syndicated - Outbound

Get the csv zipped file from a URL

Hi,

 

I came across a website providing the zipped csv file wanna get the csv into the Power Bi. Any indication? I need to get only the file in the top first row every day.

dataset.PNG

1 ACCEPTED SOLUTION
lbendlin
Super User
Super User

Syndicated - Outbound

Inspecting the HTML shows that these ZIP files are coded by day

lbendlin_0-1599481242624.png

 

So you could either assume that "yesterday's file"  is available when you poll it, or run a screen scraper.

 

Assumptions are boring so lets go with the second approach.  Power Query is reasonably adept at reading tables in HTML pages and actually works well with your URL.

 

let
    Source = Web.Page(Web.Contents("https://www.emi.ea.govt.nz/Wholesale/Datasets/RealTimePrices/2020")),
    Data0 = Source{0}[Data],
    ZipFile = Web.Contents("https://www.emi.ea.govt.nz/Wholesale/Datasets/RealTimePrices/2020/" & Data0[Name]{0}),
    ZipData = Unzip(ZipFile),
    #"Imported CSV" = Csv.Document(ZipData,[Delimiter=",", Columns=9, Encoding=1252, QuoteStyle=QuoteStyle.None])
in
    #"Imported CSV"

 

The Unzip function can be whatever your preference is, if you want to take the one from my blog you need to amend it slightly.

 

// expects binary of the ZIP file, only extracts the first data file after getting its size from the central directory
// https://en.wikipedia.org/wiki/Zip_(file_format)#Structure
(ZIPFile) =>
let
    //read the entire ZIP file into memory - we'll use it often so this is worth it
    Source = Binary.Buffer(ZIPFile),
    // get the full size of the ZIP file
    Size = Binary.Length(Source),
    //Find the start of the central directory at the sixth to last byte
    Directory = BinaryFormat.Record([ 
                    MiscHeader=BinaryFormat.Binary(Size-6), 
                    Start=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian)
            ]) ,
    Start = Directory(Source)[Start],
    //find the first entry in the directory and get the compressed file size
    FirstDirectoryEntry = BinaryFormat.Record([ 
                    MiscHeader=BinaryFormat.Binary(Start+20), 
                    FileSize=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
                    UnCompressedFileSize=BinaryFormat.Binary(4),
                    FileNameLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
                    ExtrasLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
            ]) ,
    //figure out where the raw data starts            
    Offset = 30+FirstDirectoryEntry(Source)[FileNameLen]+FirstDirectoryEntry(Source)[ExtrasLen],
    Compressed = FirstDirectoryEntry(Source)[FileSize]+1,
    //get the raw data of the compressed file
    Raw = BinaryFormat.Record([
                    Header=BinaryFormat.Binary(Offset), 
                    Data=BinaryFormat.Binary(Compressed)
            ]) 
    // unzip it
in 
    Binary.Decompress(Raw(Source)[Data], Compression.Deflate)

 

And here's the result:  (note the column headers are not part of the data, you need to supply that yourself).

 

lbendlin_1-1599482764721.png

 

 

View solution in original post

5 REPLIES 5
lbendlin
Super User
Super User

Syndicated - Outbound

Inspecting the HTML shows that these ZIP files are coded by day

lbendlin_0-1599481242624.png

 

So you could either assume that "yesterday's file"  is available when you poll it, or run a screen scraper.

 

Assumptions are boring so lets go with the second approach.  Power Query is reasonably adept at reading tables in HTML pages and actually works well with your URL.

 

let
    Source = Web.Page(Web.Contents("https://www.emi.ea.govt.nz/Wholesale/Datasets/RealTimePrices/2020")),
    Data0 = Source{0}[Data],
    ZipFile = Web.Contents("https://www.emi.ea.govt.nz/Wholesale/Datasets/RealTimePrices/2020/" & Data0[Name]{0}),
    ZipData = Unzip(ZipFile),
    #"Imported CSV" = Csv.Document(ZipData,[Delimiter=",", Columns=9, Encoding=1252, QuoteStyle=QuoteStyle.None])
in
    #"Imported CSV"

 

The Unzip function can be whatever your preference is, if you want to take the one from my blog you need to amend it slightly.

 

// expects binary of the ZIP file, only extracts the first data file after getting its size from the central directory
// https://en.wikipedia.org/wiki/Zip_(file_format)#Structure
(ZIPFile) =>
let
    //read the entire ZIP file into memory - we'll use it often so this is worth it
    Source = Binary.Buffer(ZIPFile),
    // get the full size of the ZIP file
    Size = Binary.Length(Source),
    //Find the start of the central directory at the sixth to last byte
    Directory = BinaryFormat.Record([ 
                    MiscHeader=BinaryFormat.Binary(Size-6), 
                    Start=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian)
            ]) ,
    Start = Directory(Source)[Start],
    //find the first entry in the directory and get the compressed file size
    FirstDirectoryEntry = BinaryFormat.Record([ 
                    MiscHeader=BinaryFormat.Binary(Start+20), 
                    FileSize=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
                    UnCompressedFileSize=BinaryFormat.Binary(4),
                    FileNameLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
                    ExtrasLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
            ]) ,
    //figure out where the raw data starts            
    Offset = 30+FirstDirectoryEntry(Source)[FileNameLen]+FirstDirectoryEntry(Source)[ExtrasLen],
    Compressed = FirstDirectoryEntry(Source)[FileSize]+1,
    //get the raw data of the compressed file
    Raw = BinaryFormat.Record([
                    Header=BinaryFormat.Binary(Offset), 
                    Data=BinaryFormat.Binary(Compressed)
            ]) 
    // unzip it
in 
    Binary.Decompress(Raw(Source)[Data], Compression.Deflate)

 

And here's the result:  (note the column headers are not part of the data, you need to supply that yourself).

 

lbendlin_1-1599482764721.png

 

 

Syndicated - Outbound

Thanks for your great help. I was away so sorry for belated reply. I applied the code you sent (the first short part) in the blank query and I got few errors as attached. I could not 1.PNG2.PNG3.PNGget through. and what I see is the list of zipped file names not their contents:

 

Syndicated - Outbound

you will need the Unzip function as well - that's the code in the second listing.

Greg_Deckler
Super User
Super User

Syndicated - Outbound

@Alirezam Check out this blog article by @lbendlin , it seems to be exactly what you are looking for:

https://community.powerbi.com/t5/Community-Blog/Working-With-Zip-Files-in-Power-Query/ba-p/1190186

 



Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...
amitchandak
Super User
Super User

Syndicated - Outbound

@Alirezam , check if these can help

https://www.youtube.com/watch?v=OzQ44gwi5Kw

https://www.youtube.com/watch?v=1yDYzxtYYZ8

Full Power BI Video 20 Hours YouTube
Microsoft Fabric Series 60+ Videos YouTube
Microsoft Fabric Hindi End to End YouTube
avatar user

Helpful resources

Announcements
Las Vegas 2025

Join us at the Microsoft Fabric Community Conference

March 31 - April 2, 2025, in Las Vegas, Nevada. Use code FABINSIDER for a $400 discount!

FebPBI_Carousel

Power BI Monthly Update - February 2025

Check out the February 2025 Power BI update to learn about new features.

March2025 Carousel

Fabric Community Update - March 2025

Find out what's new and trending in the Fabric community.

Top Solution Authors (Last Month)
Top Kudoed Authors (Last Month)