Join us at FabCon Atlanta from March 16 - 20, 2026, for the ultimate Fabric, Power BI, AI and SQL community-led event. Save $200 with code FABCOMM.
Register now!The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now! Learn more
Dear Community,
I hope you’re doing well. I have a question regarding dealing with zip file in Power Query.
We used M Code from (Mark White's blog) Reading Zip files in PowerQuery / M to deal with zip file by unzipping it. As you can see in the image below, most of zip files are unzipped properly. Based on the information from “Content” field, we know that the file is valid (the size is visible). On top of that, the “TXT” file is treated as CSV.
However, when we put some new zip files, we can’t see size information and the TXT file is not treated as CSV.
Do you know what cause this issue and how to deal with this? FYI, I checked all those problematic zip files and the TXT file inside are not corrupted or something. I also attached my M Code in the TXT file below, please kindly find the attachment.
https://drive.google.com/file/d/1almJuZhsXxTy3pCzeeZ8mh7jiDP4GI86/view?usp=sharing
If you can't open that, I can provide the M Code here:
------
let
// Load the ZIP file from SharePoint
Source = Source,
#"UniqueZipFiles" = Table.Distinct(Source, {"Name"}),
// Define the custom function to unzip a file
UnzipFile = (ZIPFile) =>
let
Header = BinaryFormat.Record([
MiscHeader = BinaryFormat.Binary(14),
BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
FileSize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
FileNameLen = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
ExtrasLen = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
]),
HeaderChoice = BinaryFormat.Choice(
BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
each if _ <> 67324752 // not the IsValid number? then return a dummy formatter
then BinaryFormat.Record([IsValid = false, Filename = null, Content = null])
else BinaryFormat.Choice(
BinaryFormat.Binary(26), // Header payload - 14+4+4+2+2
each BinaryFormat.Record([
IsValid = true,
Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
Extras = BinaryFormat.Text(Header(_)[ExtrasLen]),
Content = BinaryFormat.Transform(
BinaryFormat.Binary(Header(_)[BinarySize]),
(x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
)
]),
type binary // enable streaming
)
),
ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
Entries = List.Transform(
List.RemoveLastN(ZipFormat(ZIPFile), 1),
(e) => [FileName = e[Filename], Content = e[Content]]
)
in
Table.FromRecords(Entries),
// Apply the custom function to each ZIP file and filter the results
ExtractedTables = Table.AddColumn(UniqueZipFiles, "ExtractedContent", each UnzipFile([Content])),
RemovedContentColumn = Table.RemoveColumns(ExtractedTables, {"Content"}),
ExpandedTables = Table.ExpandTableColumn(RemovedContentColumn, "ExtractedContent", {"FileName", "Content"}),
FilteredTables = Table.SelectRows(ExpandedTables, each Text.Contains([FileName], "_SELLOUT_")),
// Keep only the "FileName" and "Content" columns
FinalTable = Table.SelectColumns(FilteredTables, {"FileName", "Content"})
in
FinalTable
------
Really appreciate your support on this. Thank you!
Solved! Go to Solution.
Probably the issue you are running into is that the function from that blog relies on the local file header, but often zips in the wild don't properly populate the local file header and you instead need to use the central directory file header.
An updated function was posted a while ago here: https://community.fabric.microsoft.com/t5/Power-Query/How-to-connect-Azure-DevOps-REST-API-in-to-pow...
This uses the CDFH and in my experience works much more consistently than the original you referenced.
Probably the issue you are running into is that the function from that blog relies on the local file header, but often zips in the wild don't properly populate the local file header and you instead need to use the central directory file header.
An updated function was posted a while ago here: https://community.fabric.microsoft.com/t5/Power-Query/How-to-connect-Azure-DevOps-REST-API-in-to-pow...
This uses the CDFH and in my experience works much more consistently than the original you referenced.
Hi Mark, thanks for your reply. May I know if you adjust that unzip function based on your need or use it as it is?
Actually in my case, I'm not using DevOps. I'm using SharePoint folder.
Thank you
I haven't needed to modify in any meaningful way for it to work with various zip files that have nothing to do with DevOps.
Thank you, Mark! It works! Have already accept your comment as a solution 🙂
Can you confirm that all the ZIP files are encoded in the same way and are the same format?
ZIP files can have different compression methods, have different headers, different meta data etc.
Basically any 2 ZIP files can be completely different.
Regards
Phil
Proud to be a Super User!
Hi Phil,
Thanks for your respond!
May I know how can I check the encoding of the all zip files?
May I know that those can affect to the behavior of Unzip M Code?
In those problematic Zips, we only have 1 TXT file inside it. While in the other zips that work properly, we have more than 1 TXT files.
Thank you!
The Power BI Data Visualization World Championships is back! Get ahead of the game and start preparing now!
| User | Count |
|---|---|
| 19 | |
| 9 | |
| 8 | |
| 7 | |
| 6 |