Hi all
I am looking to extract data from multiple ZIP files, all stored in the same folder. Researching online, I found Mark White's function, which should enable extracting csv or txt files from any zip: Reading Zip files in PowerQuery / M (sql10.blogspot.com)
Upon testing with my files, for some reason I only get one file returned from the function, and I am also unable to extract its content.
The zip files have a folder structure, meaning the actual txt files are nested within folders. But to my knowledge that should not cause an issue for the function.
Anyone have an idea?
For reference I've linked one of the ZIPs: Link to the ZIP
And below you can find the function:
(ZIPFile) =>
let
Header = BinaryFormat.Record([
MiscHeader = BinaryFormat.Binary(14),
BinarySize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
FileSize = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
FileNameLen= BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
ExtrasLen = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian)
]),
HeaderChoice = BinaryFormat.Choice(
BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
each if _ <> 67324752 // not the IsValid number? then return a dummy formatter
then BinaryFormat.Record([IsValid = false, Filename=null, Content=null])
else BinaryFormat.Choice(
BinaryFormat.Binary(26), // Header payload - 14+4+4+2+2
each BinaryFormat.Record([
IsValid = true,
Filename = BinaryFormat.Text(Header(_)[FileNameLen]),
Extras = BinaryFormat.Text(Header(_)[ExtrasLen]),
Content = BinaryFormat.Transform(
BinaryFormat.Binary(Header(_)[BinarySize]),
(x) => try Binary.Buffer(Binary.Decompress(x, Compression.Deflate)) otherwise null
)
]),
type binary // enable streaming
)
),
ZipFormat = BinaryFormat.List(HeaderChoice, each _[IsValid] = true),
Entries = List.Transform(
List.RemoveLastN( ZipFormat(ZIPFile), 1),
(e) => [FileName = e[Filename], Content = e[Content] ]
)
in
Table.FromRecords(Entries)
Hi @Data-Crunch
The below post by @artemus contains a function that appears to work with ZIP files containing multiple files:
It returns a list of records, each of which contains fields FileName and Content.
I have pasted the function here for convenience:
// Unzip
(ZIPFile) =>
let
ushort = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger16, ByteOrder.LittleEndian),
uint = BinaryFormat.ByteOrder(BinaryFormat.UnsignedInteger32, ByteOrder.LittleEndian),
EDOCfn = BinaryFormat.Record([
ZipContent = BinaryFormat.Binary(Binary.Length(ZIPFile) - 22),
Magic = uint,
DiskNum = ushort,
CDirectoryDiskId = ushort,
CDirectoryRecordCountOnDisk = ushort,
CDirectoryRecordCount = ushort,
SizeOfCentralDirectory = uint,
CentralDirectoryOffset = uint,
CommentLength = ushort
]),
EDOC = EDOCfn(ZIPFile),
BeforeCentralDirectory = BinaryFormat.Binary(EDOC[CentralDirectoryOffset]),
CentralDirectory = BinaryFormat.Length(BinaryFormat.Record(
[
ZipContent = BeforeCentralDirectory,
Items = BinaryFormat.List(BinaryFormat.Record(
[
Magic = uint,
CurrentVersion = ushort,
MinVersion = ushort,
Flags = ushort,
CompressionMethod = ushort,
FileModificationTime = ushort,
FileModificationDate = ushort,
CRC32 = uint,
BinarySize = uint,
FileSize = uint,
FileInfo = BinaryFormat.Choice(
BinaryFormat.Record(
[
Len = ushort,
FieldsLen = ushort,
FileCommentLength = ushort,
Disk = ushort,
InternalFileAttr = ushort,
ExternalAttr = uint,
PosOfFileHeader = uint
]),
(fileInfo) => BinaryFormat.Record(
[
FileName = BinaryFormat.Text(fileInfo[Len], TextEncoding.Ascii),
Fields = BinaryFormat.Binary(fileInfo[FieldsLen]),
FileComment = BinaryFormat.Text(fileInfo[FileCommentLength], TextEncoding.Ascii),
Disk = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[Disk]),
InternalFileAttr = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[InternalFileAttr]),
ExternalAttr = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[ExternalAttr]),
PosOfFileHeader = BinaryFormat.Transform(BinaryFormat.Null, each fileInfo[PosOfFileHeader])
])
)
]),
EDOC[CDirectoryRecordCount]
)
]),
EDOC[CentralDirectoryOffset] + EDOC[SizeOfCentralDirectory]),
Contents = List.Transform(
CentralDirectory(ZIPFile)[Items],
(cdEntry) =>
let
ZipEntry = BinaryFormat.Record(
[
PreviousData = BinaryFormat.Binary(cdEntry[FileInfo][PosOfFileHeader]),
Magic = uint,
ZipVersion = ushort,
ZipFlags = ushort,
CompressionMethod = ushort,
FileModificationTime = ushort,
FileModificationDate = ushort,
CRC32 = uint,
BinarySize = uint,
FileSize = uint,
FileName = BinaryFormat.Choice(
BinaryFormat.Record(
[
Len = ushort,
FieldsLen = ushort
]),
(fileInfo) => BinaryFormat.Record(
[
FileName = BinaryFormat.Text(fileInfo[Len], TextEncoding.Ascii),
Fields = BinaryFormat.Binary(fileInfo[FieldsLen])
])
),
FileContent = BinaryFormat.Transform(
BinaryFormat.Binary(cdEntry[BinarySize]),
each Binary.Decompress(_, Compression.Deflate)
)
])(ZIPFile)
in
[FileName=ZipEntry[FileName][FileName], Content=ZipEntry[FileContent]]
)
in
Contents
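Since the function returns a list of records rather than a table, here is a rough sketch of how it might be applied to every ZIP file in a folder. It assumes the function above is saved as a query named Unzip. The folder path, the new column names (Entries, EntryName, EntryContent, Data), and the comma delimiter with UTF-8 encoding are all placeholders for illustration, so adjust them to your files.

```powerquery
let
    // Hypothetical folder path: replace with the folder that holds your ZIP files
    Source = Folder.Files("C:\Data\Zips"),
    // Keep only the ZIP archives
    ZipFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".zip"),
    // Assumes the function above is saved as a query named Unzip
    WithEntries = Table.AddColumn(ZipFiles, "Entries", each Unzip([Content])),
    // One row per archive entry, then split each record into columns
    ExpandedList = Table.ExpandListColumn(WithEntries, "Entries"),
    ExpandedRecords = Table.ExpandRecordColumn(
        ExpandedList, "Entries",
        {"FileName", "Content"},
        {"EntryName", "EntryContent"}
    ),
    // Keep only .txt and .csv entries; this also drops folder entries, whose names end in "/"
    TextEntries = Table.SelectRows(
        ExpandedRecords,
        each Text.EndsWith(Text.Lower([EntryName]), ".txt")
            or Text.EndsWith(Text.Lower([EntryName]), ".csv")
    ),
    // Delimiter and encoding are assumptions: adjust to match your files
    Parsed = Table.AddColumn(
        TextEntries, "Data",
        each Csv.Document([EntryContent], [Delimiter = ",", Encoding = 65001])
    )
in
    Parsed
```

The entries are renamed on expansion because Folder.Files already has a Content column. EntryName holds the full path inside the archive, so files nested in folders can be told apart by their path.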
This one worked like a charm! Thanks!