jerald
Frequent Visitor

Decompress and load multiple .gz files from multiple folders

I was able to decompress a single .gz file containing a single CSV by following Chris Webb's blog.
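For reference, that single-file approach looks roughly like this (the path and the CSV options here are placeholders, not the exact code from the blog):

let
    // decompress one .gz file and parse the CSV inside it
    Source = Binary.Decompress(File.Contents("C:\data\file.csv.gz"), Compression.GZip),
    Csv = Csv.Document(Source)
in
    Csv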

 

I've also seen these blogs, but was not able to successfully load and decompress the files:

 

http://sql10.blogspot.sg/2016/06/reading-zip-files-in-powerquery-m.html
http://www.excelandpowerbi.com/?p=155

 

 

Basically, my folder structure is like this:

 

Parent_folder

  • Folder_A
    • Folder_A_1
      • file.gz
    • Folder_A_2
      • file.gz
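For context, picking up every .gz under that tree would look roughly like this (the root path is a placeholder), since Folder.Files also lists files from nested subfolders:

let
    // Folder.Files returns files from Parent_folder and every subfolder beneath it
    Source = Folder.Files("C:\Parent_folder"),
    // keep only the .gz files
    GzOnly = Table.SelectRows(Source, each [Extension] = ".gz")
in
    GzOnly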

 

 

1 ACCEPTED SOLUTION
hugoberry
Responsive Resident

It would be useful to get a peek at one of the files that you are trying to unzip.

 

Meanwhile, you can check that you are using the right compression on the files. The MSDN article mentions two compression mechanisms. Have you tried both?

Compression.GZip
Compression.Deflate

 

Chris in his blog mostly refers to GZip, whilst Ken and Mark both refer to Deflate.

One more thing to mention: GZip compresses only a single file, while ZIP archives can embed entire folder structures. Which is your case?
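A quick sketch for checking which option your files need (the path is a placeholder):

let
    // placeholder path; swap in Compression.Deflate if GZip errors out
    Raw = File.Contents("C:\data\sample.gz"),
    Unzipped = Binary.Decompress(Raw, Compression.GZip),
    AsLines = Lines.FromBinary(Unzipped)
in
    AsLines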


11 REPLIES
Ansari010
New Member

An R script is another option for loading a .gz file into Power BI Desktop. Below is a code snippet.

 

# install data.table once; the R.utils package may also be needed for reading .gz files
install.packages("data.table")
library(data.table)
data <- fread(file = "C:/Test/Employees.csv.gz")
data

See more details using this link

I know this is an old post, but it might help someone looking for other options.


Thank you so much, @hugoberry! I just followed the instructions in one of the comments on Chris's blog, then invoked the function.


// zip is the full path to the .gz file (not the binary Content column)
(zip) =>
let
    Source = Binary.Decompress(File.Contents(zip), Compression.GZip),
    #"Imported" = Csv.Document(Source, [Delimiter=":", Columns=2, Encoding=1252])
in
    #"Imported"

 

Hi Jerald,

 

When invoking the function, did you select Content as your zip field? I'm doing that and getting an error that reads:
An error occurred in the "" query. Expression.Error: We cannot convert a value of type Binary to type Text. Details: Value=Binary Type=Type
Did you run into this? Any suggestions on how to get around it?

 

Thank you,

Katie

Anonymous
Not applicable

Same issue as Katie.

 

I have this function that works:

 

(zip) =>
let
    Source = Binary.Decompress(File.Contents(zip), Compression.GZip),
    #"Imported" = Csv.Document(Source, [Delimiter="|", Columns=30, Encoding=1252])
in
    #"Imported"

Tested it here to make sure. I get 30 columns of text extracted.

 

let
    Source = UnpackGzip("C:\Users\Michael\CPU-1\disk\var\log\app.20180418002723.log.gz")
in
    Source

 

Great.

 

But when I try to use it on a batch of files like this, I get an error on the Table.AddColumn step saying it can't convert Binary to Text. There are 120 .gz files in the folder.

 

let
    path = "C:\Users\Michael\CPU-1\disk\var\log\",
    Source = Folder.Files(path), 
    #"Filtered Rows to only .gz" = Table.SelectRows(Source, each ([Extension] = ".gz")),
    #"Added Custom UnzipContents" = Table.AddColumn(#"Filtered Rows to only .gz", "Custom", each UnpackGzip([Content])),
    #"Expanded Attributes" = Table.ExpandRecordColumn(#"Added Custom UnzipContents", "Attributes", {"Content Type", "Kind", "Size"}, {"Attributes.Content Type", "Attributes.Kind", "Attributes.Size"})
in
    #"Expanded Attributes"

 

Michael

Katie and Michael,

 

Did you ever get this approach to work? I too am attempting to use the provided function and am getting the error about not being able to convert Binary into Text.

 

Tom

Anonymous
Not applicable

First, create a function; I called it fnUnpackGzipLinesFromBinary:
 
(zip) =>
let
    Source = Binary.Decompress(File.Contents(zip), Compression.GZip),
    #"Imported" = Table.FromColumns({Lines.FromBinary(Source, null, null, 1252)})
in
    #"Imported"
Next, create this query based on a folder source that points to your folder and runs the unzip function on each row it finds.
The sorting is unnecessary in the code; however, I apply a filter to see only files that have a .gz extension, to avoid errors.
let
    path = "C:\Users\Michael\CPU-1\disk\var\log\",
    Source = Folder.Files(path),
    #"Expanded Attributes1" = Table.ExpandRecordColumn(Source, "Attributes", {"Size"}, {"Attributes.Size"}),
    #"Sorted Rows" = Table.Sort(#"Expanded Attributes1", {{"Attributes.Size", Order.Ascending}}),
    #"Added Custom" = Table.AddColumn(#"Sorted Rows", "LogFile", each Text.Start([Name], 3)),
    FilteredGzRows = Table.SelectRows(#"Added Custom", each ([Extension] = ".gz")),
    UnpackGzips = Table.AddColumn(FilteredGzRows, "Custom", each fnUnpackGzipLinesFromBinary([Folder Path] & [Name]))
in
    UnpackGzips

@Anonymous thanks for the code. I realise where I had been going wrong: I had been trying to call fnUnpackGzipLinesFromBinary on the Content column, rather than on the file path itself. I used a slightly different function that will parse the CSV.

 

For anyone looking for this in the future, create a function "fnUnpackGzipLinesFromBinary":

 

 

(zip) =>
let
    Source = Binary.Decompress(File.Contents(zip), Compression.GZip),
    #"Imported" = Csv.Document(Source, [Delimiter=",", Columns=28, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Imported", [PromoteAllScalars=true])
in
    #"Promoted Headers"

 

Then call the function from your load scripts:

let
    Source = Folder.Files("C:\Users\user\files"),
    #"Filtered Rows" = Table.SelectRows(Source, each ([Extension] = ".gz")),
    UnpackGzips = Table.AddColumn(#"Filtered Rows", "Custom", each fnUnpackGzipLinesFromBinary([Folder Path]&[Name])),
    #"Expanded Custom" = Table.ExpandTableColumn(UnpackGzips, ...insert column list here...)
in
 #"Expanded Custom"

 

Anonymous
Not applicable

So I was able to make this work for me.  

 

let
    Source = Folder.Files("\\blahblah.com\Canned\DNIS_Number\MonthlyGZ"),
    UnpackGzips = Table.AddColumn(Source, "Custom", each fnUnpackGzipLinesFromBinary([Folder Path] & [Name]))
in
    UnpackGzips

It works great from my laptop: I refreshed 2 .gz files from the folder path I listed. But when I published the .pbix report to my workspace and went to refresh, I got an error. I go to the settings and I see this:

(screenshot of the dataset settings)

 

I guess that since I'm using the function fnUnpackGzipLinesFromBinary to build the path from the folder name and file name, the data source is dynamic and Power BI cannot refresh it in the service. I can refresh all day from Power BI Desktop, but not once it's published to a workspace.


 

 


From "Data refresh in Power BI" on Microsoft Learn: "A dynamic data source is a data source in which some or all of the information required to connect can't be determined until Power Query runs its query, because the data is generated in code or returned from another data source. Examples include: the instance name and database of a SQL Server database; the path of a CSV file; or the URL of a web service."

If you or anyone out there has a good workaround for this, please let me know!
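One workaround worth trying (a sketch, not something confirmed in this thread): skip File.Contents inside the function and decompress the Content column that Folder.Files already returns, so the only data source Power BI sees is the static folder path. The function name fnUnpackGzipFromContent below is made up.

// hypothetical variant that takes the binary Content column instead of a file path
(gzBinary as binary) =>
let
    Source = Binary.Decompress(gzBinary, Compression.GZip),
    Imported = Table.FromColumns({Lines.FromBinary(Source, null, null, 1252)})
in
    Imported

// then, in the folder query, pass [Content] instead of [Folder Path]&[Name]:
//     UnpackGzips = Table.AddColumn(Source, "Custom", each fnUnpackGzipFromContent([Content]))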

I have tried these code samples and they did not work for me. This is what I ended up with:

Any idea?

 

(screenshot attached)

 

Anonymous
Not applicable

You need to go to the last column, called Custom, that was created by the UnpackGzips step in order to combine the files. Custom is where the function is called, and it will unpack the gzip files.
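In M terms, that combine step is just an expand on the Custom column, roughly like this (a continuation of the earlier folder query; the column name "Column1" is an assumption based on what Table.FromColumns/Lines.FromBinary produces):

    // each cell of Custom holds a table of unpacked text lines; expand them into the outer table
    #"Expanded Custom" = Table.ExpandTableColumn(UnpackGzips, "Custom", {"Column1"})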


 
