Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more

Reply
TotalAnonymous
Helper III
Helper III

Unzipping File - Power Query

Dear Community,

I hope you’re doing well. I have a question regarding dealing with zip file in Power Query.

We used M Code from (Mark White's blog) Reading Zip files in PowerQuery / M  to deal with zip file by unzipping it. As you can see in the image below, most of zip files are unzipped properly. Based on the information from “Content” field, we know that the file is valid (the size is visible). On top of that, the “TXT” file is treated as CSV.

TotalAnonymous_0-1763430450829.png

 

However, when we put some new zip files, we can’t see size information and the TXT file is not treated as CSV.

TotalAnonymous_1-1763430490000.png

 


Do you know what cause this issue and how to deal with this? FYI, I checked all those problematic zip files and the TXT file inside are not corrupted or something. I also attached my M Code in the TXT file below, please kindly find the attachment.
https://drive.google.com/file/d/1almJuZhsXxTy3pCzeeZ8mh7jiDP4GI86/view?usp=sharing

If you can't open that, I can provide the M Code here:

------
let
// Load the ZIP file from SharePoint
Source = Source,
#"UniqueZipFiles" = Table.Distinct(Source, {"Name"}),
// Define the custom function to unzip a file
UnzipFile = (ZIPFile) =>
let
Header = BinaryFormat.Record([
MiscHeader = BinaryFormat.Binary(14),
BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
FileSize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
FileNameLen = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
ExtrasLen = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
]),
HeaderChoice = BinaryFormat.Choice(
BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
each if _ <> 67324752 // not the IsValid number? then return a dummy formatter
then BinaryFormat.Record([IsValid = false, Filename = null, Content = null])
else BinaryFormat.Choice(
BinaryFormat.Binary(26), // Header payload - 14+4+4+2+2
each BinaryFormat.Record([
IsValid = true,
Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
Extras = BinaryFormat.Text(Header(_)[ExtrasLen]),
Content = BinaryFormat.Transform(
BinaryFormat.Binary(Header(_)[BinarySize]),
(x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
)
]),
type binary // enable streaming
)
),
ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
Entries = List.Transform(
List.RemoveLastN(ZipFormat(ZIPFile), 1),
(e) => [FileName = e[Filename], Content = e[Content]]
)
in
Table.FromRecords(Entries),

// Apply the custom function to each ZIP file and filter the results
ExtractedTables = Table.AddColumn(UniqueZipFiles, "ExtractedContent", each UnzipFile([Content])),
RemovedContentColumn = Table.RemoveColumns(ExtractedTables, {"Content"}),
ExpandedTables = Table.ExpandTableColumn(RemovedContentColumn, "ExtractedContent", {"FileName", "Content"}),
FilteredTables = Table.SelectRows(ExpandedTables, each Text.Contains([FileName], "_SELLOUT_")),
// Keep only the "FileName" and "Content" columns
FinalTable = Table.SelectColumns(FilteredTables, {"FileName", "Content"})
in
FinalTable

------

Really appreciate your support on this. Thank you!


1 ACCEPTED SOLUTION
MarkLaf
Super User
Super User

Probably the issue you are running into is that the function from that blog relies on the local file header, but often zips in the wild don't properly populate the local file header and you instead need to use the central directory file header.

 

An updated function was posted a while ago here: https://community.fabric.microsoft.com/t5/Power-Query/How-to-connect-Azure-DevOps-REST-API-in-to-pow...

 

This uses the CDFH and in my experience works much more consistently than the original you referenced.

View solution in original post

6 REPLIES 6
MarkLaf
Super User
Super User

Probably the issue you are running into is that the function from that blog relies on the local file header, but often zips in the wild don't properly populate the local file header and you instead need to use the central directory file header.

 

An updated function was posted a while ago here: https://community.fabric.microsoft.com/t5/Power-Query/How-to-connect-Azure-DevOps-REST-API-in-to-pow...

 

This uses the CDFH and in my experience works much more consistently than the original you referenced.

Hi Mark, thanks for your reply. May I know if you adjust that unzip function based on your need or use it as it is? 

Actually in my case, I'm not using DevOps. I'm using SharePoint folder. 

Thank you

I haven't needed to modify in any meaningful way for it to work with various zip files that have nothing to do with DevOps.

Thank you, Mark! It works! Have already accept your comment as a solution 🙂

PhilipTreacy
Super User
Super User

@TotalAnonymous 

 

Can you confirm that all the ZIP files are encoded in the same way and are the same format?

 

ZIP files can have different compression methods, have different headers, different meta data etc.

 

Basically any 2 ZIP files can be completely different.

 

Regards

 

Phil

 



Did I answer your question? Then please mark my post as the solution.
If I helped you, click on the Thumbs Up to give Kudos.


Blog :: YouTube Channel :: Connect on Linkedin


Proud to be a Super User!


Hi Phil,
Thanks for your respond!

May I know how can I check the encoding of the all zip files?

May I know that those can affect to the behavior of Unzip M Code?

In those problematic Zips, we only have 1 TXT file inside it. While in the other zips that work properly, we have more than 1 TXT files.

Thank you!

Helpful resources

Announcements
Power BI DataViz World Championships

Power BI Dataviz World Championships

The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!

December 2025 Power BI Update Carousel

Power BI Monthly Update - December 2025

Check out the December 2025 Power BI Holiday Recap!

FabCon Atlanta 2026 carousel

FabCon Atlanta 2026

Join us at FabCon Atlanta, March 16-20, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.

Top Solution Authors