Get the csv zipped file from a URL

Alirezam · ‎09-06-2020

Hi,

I came across a website providing the zipped csv file wanna get the csv into the Power Bi. Any indication? I need to get only the file in the top first row every day.

lbendlin · ‎09-07-2020

Inspecting the HTML shows that these ZIP files are coded by day

So you could either assume that "yesterday's file" is available when you poll it, or run a screen scraper.

Assumptions are boring so lets go with the second approach. Power Query is reasonably adept at reading tables in HTML pages and actually works well with your URL.

let
    Source = Web.Page(Web.Contents("https://www.emi.ea.govt.nz/Wholesale/Datasets/RealTimePrices/2020")),
    Data0 = Source{0}[Data],
    ZipFile = Web.Contents("https://www.emi.ea.govt.nz/Wholesale/Datasets/RealTimePrices/2020/" & Data0[Name]{0}),
    ZipData = Unzip(ZipFile),
    #"Imported CSV" = Csv.Document(ZipData,[Delimiter=",", Columns=9, Encoding=1252, QuoteStyle=QuoteStyle.None])
in
    #"Imported CSV"

The Unzip function can be whatever your preference is, if you want to take the one from my blog you need to amend it slightly.

// expects binary of the ZIP file, only extracts the first data file after getting its size from the central directory
// https://en.wikipedia.org/wiki/Zip_(file_format)#Structure
(ZIPFile) =>
let
    //read the entire ZIP file into memory - we'll use it often so this is worth it
    Source = Binary.Buffer(ZIPFile),
    // get the full size of the ZIP file
    Size = Binary.Length(Source),
    //Find the start of the central directory at the sixth to last byte
    Directory = BinaryFormat.Record([ 
                    MiscHeader=BinaryFormat.Binary(Size-6), 
                    Start=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian)
            ]) ,
    Start = Directory(Source)[Start],
    //find the first entry in the directory and get the compressed file size
    FirstDirectoryEntry = BinaryFormat.Record([ 
                    MiscHeader=BinaryFormat.Binary(Start+20), 
                    FileSize=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
                    UnCompressedFileSize=BinaryFormat.Binary(4),
                    FileNameLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
                    ExtrasLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
            ]) ,
    //figure out where the raw data starts            
    Offset = 30+FirstDirectoryEntry(Source)[FileNameLen]+FirstDirectoryEntry(Source)[ExtrasLen],
    Compressed = FirstDirectoryEntry(Source)[FileSize]+1,
    //get the raw data of the compressed file
    Raw = BinaryFormat.Record([
                    Header=BinaryFormat.Binary(Offset), 
                    Data=BinaryFormat.Binary(Compressed)
            ]) 
    // unzip it
in 
    Binary.Decompress(Raw(Source)[Data], Compression.Deflate)

And here's the result: (note the column headers are not part of the data, you need to supply that yourself).

View solution in original post

lbendlin · ‎09-07-2020

Inspecting the HTML shows that these ZIP files are coded by day

So you could either assume that "yesterday's file" is available when you poll it, or run a screen scraper.

Assumptions are boring so lets go with the second approach. Power Query is reasonably adept at reading tables in HTML pages and actually works well with your URL.

let
    Source = Web.Page(Web.Contents("https://www.emi.ea.govt.nz/Wholesale/Datasets/RealTimePrices/2020")),
    Data0 = Source{0}[Data],
    ZipFile = Web.Contents("https://www.emi.ea.govt.nz/Wholesale/Datasets/RealTimePrices/2020/" & Data0[Name]{0}),
    ZipData = Unzip(ZipFile),
    #"Imported CSV" = Csv.Document(ZipData,[Delimiter=",", Columns=9, Encoding=1252, QuoteStyle=QuoteStyle.None])
in
    #"Imported CSV"

The Unzip function can be whatever your preference is, if you want to take the one from my blog you need to amend it slightly.

// expects binary of the ZIP file, only extracts the first data file after getting its size from the central directory
// https://en.wikipedia.org/wiki/Zip_(file_format)#Structure
(ZIPFile) =>
let
    //read the entire ZIP file into memory - we'll use it often so this is worth it
    Source = Binary.Buffer(ZIPFile),
    // get the full size of the ZIP file
    Size = Binary.Length(Source),
    //Find the start of the central directory at the sixth to last byte
    Directory = BinaryFormat.Record([ 
                    MiscHeader=BinaryFormat.Binary(Size-6), 
                    Start=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian)
            ]) ,
    Start = Directory(Source)[Start],
    //find the first entry in the directory and get the compressed file size
    FirstDirectoryEntry = BinaryFormat.Record([ 
                    MiscHeader=BinaryFormat.Binary(Start+20), 
                    FileSize=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
                    UnCompressedFileSize=BinaryFormat.Binary(4),
                    FileNameLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
                    ExtrasLen=BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
            ]) ,
    //figure out where the raw data starts            
    Offset = 30+FirstDirectoryEntry(Source)[FileNameLen]+FirstDirectoryEntry(Source)[ExtrasLen],
    Compressed = FirstDirectoryEntry(Source)[FileSize]+1,
    //get the raw data of the compressed file
    Raw = BinaryFormat.Record([
                    Header=BinaryFormat.Binary(Offset), 
                    Data=BinaryFormat.Binary(Compressed)
            ]) 
    // unzip it
in 
    Binary.Decompress(Raw(Source)[Data], Compression.Deflate)

And here's the result: (note the column headers are not part of the data, you need to supply that yourself).

Alirezam · ‎09-28-2020

Thanks for your great help. I was away so sorry for belated reply. I applied the code you sent (the first short part) in the blank query and I got few errors as attached. I could not get through. and what I see is the list of zipped file names not their contents:

lbendlin · ‎09-28-2020

you will need the Unzip function as well - that's the code in the second listing.

Greg_Deckler · ‎09-06-2020

@Alirezam Check out this blog article by @lbendlin , it seems to be exactly what you are looking for:

https://community.powerbi.com/t5/Community-Blog/Working-With-Zip-Files-in-Power-Query/ba-p/1190186

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: Power BI Cookbook Third Edition (Color)

DAX is easy, CALCULATE makes DAX hard...

amitchandak · ‎09-06-2020

@Alirezam , check if these can help

https://www.youtube.com/watch?v=OzQ44gwi5Kw

https://www.youtube.com/watch?v=1yDYzxtYYZ8

Full Power BI Video 20 Hours YouTube
Microsoft Fabric Series 60+ Videos YouTube
Microsoft Fabric Hindi End to End YouTube

Get the csv zipped file from a URL